
@TechnoPorg

This reworks the From<Vec<T>> conversions for SplitVecs with the Doubling growth strategy so that they no longer clone the input values, which should improve both performance and usability.

In practice, I'm not sure how effective this is. According to the benchmark I've added here, performance stays roughly the same across the board: two cases speed up, while the other two slow down. I have limited experience with micro-optimization of this sort, so any input you have would be appreciated.

[Violin plot: faster-doubling-from-vec branch]

[Violin plot: main]
@orxfun
Owner

orxfun commented Oct 13, 2025

Hi @TechnoPorg .

Thanks a lot for the PR! Regardless of the impact on performance, it is awesome that your PR relaxes the T: Clone requirement for the From implementation.

I have a couple of small suggestions. It would be great if you could take a look at them before merging.

1. Extending the test

It would be great to extend your test with some edge case numbers. This is the current test you've added.

#[test]
fn from_same_as_push() {
    for len in 0..135 {
        let vec: Vec<_> = (0..len).collect();
        let split_vec_from: SplitVec<_, Doubling> = vec.clone().into();
        let mut split_vec_manual = SplitVec::new();
        for item in vec {
            split_vec_manual.push(item);
        }
        assert_eq!(split_vec_from, split_vec_manual);
    }
}

It would be great if you could simply extend this test with different lengths. As you know, the fragment lengths are 4, 8, 16, etc., so testing lengths 11, 12, and 13 would be nice. Lengths 0 and 1 are also special cases. Feel free to add other magic numbers :)

use test_case::test_matrix; // `#[test_matrix]` comes from the test_case crate

#[test_matrix([0, 1, 4, 5, 11, 12, 13, 135])]
fn from_same_as_push(len: usize) {
    let vec: Vec<_> = (0..len).map(|x| x.to_string()).collect();
    let split_vec_from: SplitVec<_, Doubling> = vec.clone().into();
    let mut split_vec_manual = SplitVec::new();
    for item in vec {
        split_vec_manual.push(item);
    }
    assert_eq!(split_vec_from, split_vec_manual);
}

2. Benchmark

I wouldn't worry much if the performance did not change at all, since removing T: Clone is already a very nice contribution. Nevertheless, it would be interesting to see the impact. I believe the problem you observe might be related to the benchmark setup.

Consider, for instance, the benchmarked function doubling_from_std_vec:

fn doubling_from_std_vec<T: Clone, F: Fn(usize) -> T>(n: usize, value: F) -> SplitVec<T, Doubling> {
    // part A
    let mut std_vec = Vec::new();
    for i in 0..n {
        std_vec.push(value(i));
    }
    // part B
    SplitVec::from(std_vec)
}

Here, we only want to measure part B: SplitVec::from(std_vec). However, we first have to build std_vec in part A. Note that part A might take a similar amount of time as part B, if not more, which must be adding significant noise to the benchmark results.

I don't think we can completely avoid part A.

Maybe, we might at least mitigate the impact. For instance, how about the following:

fn doubling_from_std_vec<T: Clone>(vec: &Vec<T>) -> SplitVec<T, Doubling> {
    let std_vec = vec.clone();
    SplitVec::from(std_vec)
}

We could then build the vec for each treatment n only once, and pass it to each of the benchmarked methods as a reference. The vec.clone() call within the benchmarked functions should be faster than building the vec from scratch, which would hopefully allow us to measure the from call more accurately.
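The setup above can be sketched in plain std Rust. Here, time_conversion and its convert parameter are hypothetical names, and convert stands in for SplitVec::from; only the clone-plus-conversion work is timed, while the input vector is built once outside the timed region:

```rust
use std::time::Instant;

// Sketch of the suggested benchmark setup: the input is built once
// outside the timed region; only `to_vec` (a clone) plus the conversion
// happen inside it. `convert` is a hypothetical stand-in for
// `SplitVec::from`.
fn time_conversion<T: Clone, R>(input: &[T], convert: impl Fn(Vec<T>) -> R) -> (R, u128) {
    let start = Instant::now();
    // Cloning a contiguous Vec is one allocation plus a memcpy-like copy,
    // much cheaper than rebuilding it element by element with `push`.
    let out = convert(input.to_vec());
    (out, start.elapsed().as_nanos())
}

fn main() {
    let vec: Vec<u64> = (0..16_384).collect();
    // In the real benchmark, the closure would call `SplitVec::from(v)`.
    let (converted, nanos) = time_conversion(&vec, |v| v);
    assert_eq!(converted.len(), 16_384);
    println!("conversion took {} ns", nanos);
}
```

A Criterion benchmark would follow the same shape, with the clone inside b.iter and the construction of vec outside it.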

@TechnoPorg
Author

Thanks for the review! I've adjusted the benchmark to construct the Vec outside of the function being benchmarked, which has made the numbers more consistent.

I'm not entirely sure what is meant by point 1: since the test iterates over all vector lengths from 0 to 135, it already passes through 11, 12, 13, and more.

Updated benchmarks

This PR

from/doubling_from_std_vec/n=1024,elem-type=[u64;16]
time: [483.05 ns 486.22 ns 490.13 ns]
from/linear_from_std_vec/n=1024,elem-type=[u64;16]
time: [174.99 ns 176.53 ns 178.49 ns]
from/recursive_from_std_vec/n=1024,elem-type=[u64;16]
time: [172.85 ns 175.67 ns 178.62 ns]
from/doubling_from_std_vec/n=16384,elem-type=[u64;16]
time: [7.9150 µs 8.0019 µs 8.1066 µs]
from/linear_from_std_vec/n=16384,elem-type=[u64;16]
time: [3.7497 µs 3.8071 µs 3.8610 µs]
from/recursive_from_std_vec/n=16384,elem-type=[u64;16]
time: [3.4402 µs 3.4663 µs 3.4958 µs]
Benchmarking from/doubling_from_std_vec/n=262144,elem-type=[u64;16]: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
from/doubling_from_std_vec/n=262144,elem-type=[u64;16]
time: [1.5563 ms 1.5668 ms 1.5788 ms]
from/linear_from_std_vec/n=262144,elem-type=[u64;16]
time: [68.116 µs 69.045 µs 70.008 µs]
from/recursive_from_std_vec/n=262144,elem-type=[u64;16]
time: [80.242 µs 80.716 µs 81.289 µs]
from/doubling_from_std_vec/n=4194304,elem-type=[u64;16]
time: [32.365 ms 32.612 ms 32.889 ms]
from/linear_from_std_vec/n=4194304,elem-type=[u64;16]
time: [12.756 ms 12.851 ms 12.959 ms]
from/recursive_from_std_vec/n=4194304,elem-type=[u64;16]
time: [12.797 ms 12.885 ms 12.985 ms]

Current main

from/doubling_from_std_vec/n=1024,elem-type=[u64;16]
time: [458.66 ns 461.45 ns 465.08 ns]
change: [-6.4681% -5.3075% -4.0357%] (p = 0.00 < 0.05)
Performance has improved.
from/linear_from_std_vec/n=1024,elem-type=[u64;16]
time: [168.20 ns 169.53 ns 171.21 ns]
change: [-6.4732% -4.9952% -3.5439%] (p = 0.00 < 0.05)
Performance has improved.
from/recursive_from_std_vec/n=1024,elem-type=[u64;16]
time: [165.68 ns 166.87 ns 168.42 ns]
change: [-4.4703% -2.9763% -1.4927%] (p = 0.00 < 0.05)
Performance has improved.
from/doubling_from_std_vec/n=16384,elem-type=[u64;16]
time: [7.7257 µs 7.7814 µs 7.8509 µs]
change: [-2.8366% -1.2537% +0.2545%] (p = 0.10 > 0.05)
No change in performance detected.
from/linear_from_std_vec/n=16384,elem-type=[u64;16]
time: [3.5791 µs 3.6570 µs 3.7304 µs]
change: [-10.011% -7.6871% -5.3975%] (p = 0.00 < 0.05)
Performance has improved.
from/recursive_from_std_vec/n=16384,elem-type=[u64;16]
time: [3.8731 µs 3.9218 µs 3.9629 µs]
change: [+10.252% +11.687% +13.147%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking from/doubling_from_std_vec/n=262144,elem-type=[u64;16]: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
from/doubling_from_std_vec/n=262144,elem-type=[u64;16]
time: [1.7538 ms 1.7758 ms 1.7964 ms]
change: [+12.947% +14.264% +15.553%] (p = 0.00 < 0.05)
Performance has regressed.
from/linear_from_std_vec/n=262144,elem-type=[u64;16]
time: [79.097 µs 79.254 µs 79.387 µs]
change: [+12.140% +13.499% +14.839%] (p = 0.00 < 0.05)
Performance has regressed.
from/recursive_from_std_vec/n=262144,elem-type=[u64;16]
time: [72.634 µs 73.788 µs 75.019 µs]
change: [-9.6948% -8.2484% -6.9055%] (p = 0.00 < 0.05)
Performance has improved.
from/doubling_from_std_vec/n=4194304,elem-type=[u64;16]
time: [35.695 ms 35.985 ms 36.306 ms]
change: [+9.0442% +10.342% +11.679%] (p = 0.00 < 0.05)
Performance has regressed.
from/linear_from_std_vec/n=4194304,elem-type=[u64;16]
time: [14.419 ms 14.588 ms 14.750 ms]
change: [+11.849% +13.512% +15.028%] (p = 0.00 < 0.05)
Performance has regressed.
from/recursive_from_std_vec/n=4194304,elem-type=[u64;16]
time: [14.660 ms 14.901 ms 15.134 ms]
change: [+13.649% +15.642% +17.787%] (p = 0.00 < 0.05)
Performance has regressed.

@TechnoPorg force-pushed the faster-doubling-from-vec branch from 370fb44 to 94808d5 on October 14, 2025 11:26
@orxfun
Owner

orxfun commented Oct 15, 2025

Thanks @TechnoPorg. No worries about my first point, all looks great. Will merge once the checks are complete.

@orxfun orxfun merged commit 68523fd into orxfun:main Oct 16, 2025
1 check passed
@TechnoPorg TechnoPorg deleted the faster-doubling-from-vec branch October 16, 2025 14:01