Skip to content

Performance issue #6

@bobbymcr

Description

@bobbymcr

I was interested in percentile estimation and learned about t-digest while researching the topic. I was happy to find this project since I work primarily in .NET platforms. Thanks for building this!

According to the original Java t-digest project, "the particularly interesting characteristics of the t-digest are that it [...] is very fast (~ 140 ns per add)". I wanted to see if the .NET version lived up to this measurement, so I built a simple set of benchmarks and got the following results:

Method N Asc Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
BuildSorted 1000 False 2.076 ms 0.0346 ms 0.0307 ms 136.7188 66.4063 19.5313 847.66 KB
BuildSorted 1000 True 17.267 ms 0.0808 ms 0.0675 ms 125.0000 - - 847.71 KB
BuildSorted 10000 False 134.297 ms 0.9801 ms 0.8185 ms 3500.0000 1750.0000 500.0000 23075.98 KB
BuildSorted 10000 True 391.343 ms 2.2634 ms 1.8901 ms 3000.0000 1000.0000 - 23062.74 KB
BuildSorted 100000 False 1,504.426 ms 12.3338 ms 10.2992 ms 40000.0000 20000.0000 6000.0000 248582.73 KB
BuildSorted 100000 True 4,663.984 ms 36.5801 ms 34.2170 ms 40000.0000 20000.0000 5000.0000 250178.06 KB

These numbers show anywhere from 2000 to over 46000 ns per add which is not very competitive with the Java version. :-( Running a profiling session, I see the hottest function is ComputeCentroidQuantile which does a linear search over the centroid list to calculate the cumulative sum up to the given mean. This function is called up to two times for every add operation.

I haven't studied the Java code extensively, but it seems that version avoids this expense by precomputing aggregated counts within an AVL tree structure, implying more of an O(log(N)) amortized cost as opposed to O(N). Would it be worth doing a more direct port of the Java code to see if the performance improves?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions