-
Notifications
You must be signed in to change notification settings - Fork 26
Fix some minor details in comments for AVX2 decompose #762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
jammychiou1
commented
Dec 1, 2025
- The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency.
- The bound of f1'' is correct but the proof was misleading. The new proof should be clearer.
0b47c08 to
9931203
Compare
|
I did consider addressing #654 also in this PR, but I haven't come up with a great idea for that yet... |
- The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency. - The bound of f1'' is correct but the proof was misleading. The new proof should be clearer. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Only the AVX2 one is updated currently. The NEON one will also be updated if this looks good. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
0be3a84 to
326d731
Compare
| * Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact | ||
| * for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲ | ||
| * 1 / 4092. | ||
| * 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to | ||
| * | ||
| * round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon)) | ||
| * = round(1 / 2 - epsilon') = 0, | ||
| * | ||
| * where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both | ||
| * tiny but positive numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanno-becker May I have your review on this explanation? If this looks good, I plan to also apply this to the aarch64 implementation and resolve #654. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jammychiou1 Apologies for the slow reply.
I think if we want to fix #654 we should provide a general explanation.
Here's an attempt:
The approximation error for `f1' / B ≈ f1' * 1025 / 2^22` is `f1' * (1025/2^22 - 1/B)`.
For `eps := 1025/2^22 - 1/B` we have `eps = 1/4290772992 ~ 2^{-31.99} < 2^{-31}`.
Hence `|f1'| * eps < 2^{-15}`. Given `B = 4092 ~ 2^12`, we thus have `|f1'| * eps < 1/B`.
On the other hand, `1/B` is the spacing between the integral multiples of `1/B`, which
includes all rounding boundaries `n + 1/2` (since `B` is even). Hence, if `f1' / B` is not
of the form `n + 1/2`, then moving from `f1' / B` to `f1' * 1025 / 2^22` does not cross
a rounding boundary, and hence `round(f1' / B) = round(f1' * 1025 / 2^22)`, and it doesn't
matter on either side which version of rounding one uses.
If `f1' / B` _is_ of the form `n + 1/2`, then `f1' * 1025 / 2^22` is slightly below it
(and _not_ a multiple of `n + 1/2`), hence `round-(f1' / B) = round(f1' * 1025 / 2^22)`;
where the round-down on the LHS is essential, and on the RHS the type of rounding
again does not matter.
| * 1 / 1488. | ||
| * 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to | ||
| * | ||
| * round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon)) | ||
| * = round(1 / 2 - epsilon') = 0, | ||
| * | ||
| * where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both | ||
| * tiny but positive numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here