Skip to content

Conversation

@jammychiou1
Copy link
Contributor

  • The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency.
  • The bound of f1'' is correct but the proof was misleading. The new proof should be clearer.

@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 0b47c08 to 9931203 Compare December 1, 2025 03:34
@jammychiou1
Copy link
Contributor Author

I did consider addressing #654 also in this PR, but I haven't come up with a great idea for that yet...

- The floor() in floor((f + 127) >> 7) was somewhat unecessary as
  the usual semantic for the right-shift operator (>>) has integer
  output anyway. Seeing as the right-shift operator is not used in other
  explanation comments, we decided to rewrite it as division by 2^7 for
  better consistency.
- The bound of f1'' is correct but the proof was misleading. The new
  proof should be clearer.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Only the AVX2 one is updated currently. The NEON one will also be
updated if this looks good.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 0be3a84 to 326d731 Compare December 1, 2025 07:21
Comment on lines 77 to +85
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
* 1 / 4092.
* 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to
*
* round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon))
* = round(1 / 2 - epsilon') = 0,
*
* where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both
* tiny but positive numbers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanno-becker May I have your review on this explanation? If this looks good, I plan to also apply this to the aarch64 implementation and resolve #654. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jammychiou1 Apologies for the slow reply.

I think if we want to fix #654 we should provide a general explanation.

Here's an attempt:

The approximation error for `f1' / B ≈ f1' * 1025 / 2^22` is `f1' * (1025/2^22 - 1/B)`. 
For `eps := 1025/2^22 - 1/B` we have `eps = 1/4290772992 ~ 2^{-31.99} < 2^{-31}`. 
Hence `|f1'| * eps < 2^{-15}`. Given `B = 4092 ~ 2^12`, we thus have `|f1'| * eps < 1/B`. 
On the other hand, `1/B` is the spacing between the integral multiples of `1/B`, which
includes all rounding boundaries `n + 1/2` (since `B` is even). Hence, if `f1' / B` is not 
of the form `n + 1/2`, then moving from `f1' / B` to `f1' * 1025 / 2^22` does not cross 
a rounding boundary, and hence `round(f1' / B) = round(f1' * 1025 / 2^22)`, and it doesn't
matter on either side which version of rounding one uses. 
If `f1' / B` _is_ of the form `n + 1/2`, then `f1' * 1025 / 2^22` is slightly below it 
(and _not_ a multiple of `n + 1/2`), hence `round-(f1' / B) = round(f1' * 1025 / 2^22)`; 
where the round-down on the LHS is essential, and on the RHS the type of rounding 
again does not matter.

Comment on lines -79 to +86
* 1 / 1488.
* 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to
*
* round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon))
* = round(1 / 2 - epsilon') = 0,
*
* where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both
* tiny but positive numbers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants