Skip to content

Commit 326d731

Browse files
committed
[DRAFT] Better explanation for round() vs round-() in decompose
Only the AVX2 one is updated currently. The NEON one will also be updated if this looks good. Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
1 parent 3562064 commit 326d731

File tree

4 files changed

+32
-4
lines changed

4 files changed

+32
-4
lines changed

dev/x86_64/src/poly_decompose_32_avx2.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,10 +72,17 @@ void mld_poly_decompose_32_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
7272
* _mm256_mulhi_epu16() below.
7373
*/
7474

75+
/* check-magic: 2046 == intdiv(4092, 2) */
7576
/*
7677
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
7778
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
78-
* 1 / 4092.
79+
* 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to
80+
*
81+
* round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon))
82+
* = round(1 / 2 - epsilon') = 0,
83+
*
84+
* where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both
85+
* tiny but positive numbers.
7986
*
8087
* round(f1' * 1025 / 2^22) is in turn computed in 2 steps as
8188
* round(floor(f1' * 1025 / 2^16) / 2^6). The mulhi computes f1'' =

dev/x86_64/src/poly_decompose_88_avx2.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,10 +73,17 @@ void mld_poly_decompose_88_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
7373
* _mm256_mulhi_epu16() below.
7474
*/
7575

76+
/* check-magic: 744 == intdiv(1488, 2) */
7677
/*
7778
* Compute f1 = round-(f1' / B) ≈ round(f1' * 11275 / 2^24). This is exact
7879
* for 0 <= f1' < 2^16. Note that half is rounded down since 11275 / 2^24 ≲
79-
* 1 / 1488.
80+
* 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to
81+
*
82+
* round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon))
83+
* = round(1 / 2 - epsilon') = 0,
84+
*
85+
* where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both
86+
* tiny but positive numbers.
8087
*
8188
* round(f1' * 11275 / 2^24) is in turn computed in 2 steps as
8289
* round(floor(f1' * 11275 / 2^16) / 2^8). The mulhi computes f1'' =

mldsa/src/native/x86_64/src/poly_decompose_32_avx2.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,10 +72,17 @@ void mld_poly_decompose_32_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
7272
* _mm256_mulhi_epu16() below.
7373
*/
7474

75+
/* check-magic: 2046 == intdiv(4092, 2) */
7576
/*
7677
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
7778
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
78-
* 1 / 4092.
79+
* 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to
80+
*
81+
* round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon))
82+
* = round(1 / 2 - epsilon') = 0,
83+
*
84+
* where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both
85+
* tiny but positive numbers.
7986
*
8087
* round(f1' * 1025 / 2^22) is in turn computed in 2 steps as
8188
* round(floor(f1' * 1025 / 2^16) / 2^6). The mulhi computes f1'' =

mldsa/src/native/x86_64/src/poly_decompose_88_avx2.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,10 +73,17 @@ void mld_poly_decompose_88_avx2(__m256i *a1, __m256i *a0, const __m256i *a)
7373
* _mm256_mulhi_epu16() below.
7474
*/
7575

76+
/* check-magic: 744 == intdiv(1488, 2) */
7677
/*
7778
* Compute f1 = round-(f1' / B) ≈ round(f1' * 11275 / 2^24). This is exact
7879
* for 0 <= f1' < 2^16. Note that half is rounded down since 11275 / 2^24 ≲
79-
* 1 / 1488.
80+
* 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to
81+
*
82+
* round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon))
83+
* = round(1 / 2 - epsilon') = 0,
84+
*
85+
* where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both
86+
* tiny but positive numbers.
8087
*
8188
* round(f1' * 11275 / 2^24) is in turn computed in 2 steps as
8289
* round(floor(f1' * 11275 / 2^16) / 2^8). The mulhi computes f1'' =

0 commit comments

Comments
 (0)