Skip to content

Conversation

@quaternic
Copy link
Contributor

@quaternic quaternic commented Dec 4, 2025

Feature gate: #![feature(uint_gather_scatter_bits)]
Tracking issue: #149069
Accepted ACP: rust-lang/libs-team#695

Implements the methods using the parallel suffix strategy mentioned in the ACP discussion. The referenced source material provides C implementations, though this PR makes improvements over those, cutting the instruction count by a third:
https://rust.godbolt.org/z/rn5naYnK4 (this PR)
https://c.godbolt.org/z/WzYd5WbsY (Hacker's delight)

This was initially based on the code for gather_bits that @okaneco provided in rust-lang/libs-team#695 (comment) . I wanted to understand how it worked, and later on noticed some opportunities for improvement, which eventually led to this PR.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 4, 2025
@rust-log-analyzer

This comment has been minimized.

@quaternic quaternic force-pushed the gather-scatter-bits-opt branch from 9549004 to 2c752af Compare December 4, 2025 23:05
@rust-log-analyzer

This comment has been minimized.

@quaternic quaternic force-pushed the gather-scatter-bits-opt branch from 2c752af to ac7e6c6 Compare December 4, 2025 23:49
@quaternic
Copy link
Contributor Author

quaternic commented Dec 5, 2025

Do take this with a grain of salt, since I authored the benchmarks with this in mind. In particular, since the new implementation doesn't have any input-dependent control-flow, it is easily vectorized which all of these benchmarks allow for.

Benchmarked locally on an Intel Core i7 920 @ 2.67GHz (from ~2009)

old (ns/iter) new (ns/iter) old/new
num::int_bits::u8::constant::gather_bits 134 193 0.69
num::int_bits::u8::constant::scatter_bits 133 143 0.93
num::int_bits::u8::invariant::gather_bits 8723 193 45.11
num::int_bits::u8::invariant::scatter_bits 10827 187 57.85
num::int_bits::u8::variable::gather_bits 17938 734 24.43
num::int_bits::u8::variable::scatter_bits 19809 838 23.64
num::int_bits::u16::constant::gather_bits 278 290 0.96
num::int_bits::u16::constant::scatter_bits 279 198 1.41
num::int_bits::u16::invariant::gather_bits 9401 231 40.66
num::int_bits::u16::invariant::scatter_bits 9480 233 40.71
num::int_bits::u16::variable::gather_bits 16398 932 17.59
num::int_bits::u16::variable::scatter_bits 14793 1083 13.66
num::int_bits::u32::constant::gather_bits 528 373 1.42
num::int_bits::u32::constant::scatter_bits 520 301 1.73
num::int_bits::u32::invariant::gather_bits 7699 284 27.11
num::int_bits::u32::invariant::scatter_bits 6670 295 22.6
num::int_bits::u32::variable::gather_bits 9993 1394 7.17
num::int_bits::u32::variable::scatter_bits 9051 1620 5.59
num::int_bits::u64::constant::gather_bits 1008 387 2.6
num::int_bits::u64::constant::scatter_bits 1015 377 2.69
num::int_bits::u64::invariant::gather_bits 7892 347 22.78
num::int_bits::u64::invariant::scatter_bits 6730 362 18.58
num::int_bits::u64::variable::gather_bits 8930 1930 4.63
num::int_bits::u64::variable::scatter_bits 7981 2238 3.57
num::int_bits::u128::constant::gather_bits 15690 721 21.77
num::int_bits::u128::constant::scatter_bits 11374 655 17.35
num::int_bits::u128::invariant::gather_bits 16542 856 19.33
num::int_bits::u128::invariant::scatter_bits 13316 864 15.4
num::int_bits::u128::variable::gather_bits 16984 4403 3.86
num::int_bits::u128::variable::scatter_bits 13721 4674 2.94

@quaternic quaternic marked this pull request as ready for review December 5, 2025 11:08
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 5, 2025
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 5, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 5, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Copy link
Contributor

@okaneco okaneco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

I noticed spot-checking some random constant masks that this implementation and the HD implementation flip-flop on which has more instructions, but it's only a small difference count. On simpler masks, they seem to optimize similarly.

The dynamic mask output is a drastic improvement with this by a quarter/third reduction of instructions for gather and scatter.

alive2 showing that this implementation and the current implementation appear to be equivalent functions
gather - https://alive2.llvm.org/ce/z/wKxY2Z
scatter - https://alive2.llvm.org/ce/z/_98DbY
scratchpad for the LLVM IR output - https://rust.godbolt.org/z/brbePx44T

View changes since this review

//! ........ABCDEFGH
//! ```
macro_rules! uint_impl {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintainability/readability: There are quite a few single character variable names in these functions: m, n, k, j, q, x.

If possible, it might be good to give the variables/counters a slightly more descriptive name or leave a comment above them for their function if it doesn't reduce readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'm thinking at least

k -> stage
m -> mask

n is so heavily referenced in the comments that it probably needs to stay a single letter, but could be w for width.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants