DPL Analysis: use offset cache for sorted grouping #14571
Conversation
Force-pushed from 0d219a5 to 0315fa1.
I have tested this with Correlations on hyperloop (which I understand uses grouping) and it seems to work. I do not see any particular change in performance though. @ddobrigk Could you comment on whether your tests improved? I will then merge.
@ddobrigk you can use my tag on hyperloop (eulisse-local) to have a build with the PR included.
Hi @ktf, yes, good point. Let me test right away; I will write once I have results!
To clarify: the performance in the usual cases - grouping of tracks over collisions in normal AODs - should be unchanged. The performance impact will be quite noticeable in derived data that packs a lot of entries into the tables used for grouping.
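For context, "grouping" here is the framework slicing an associated table (e.g. tracks) per grouping entity (e.g. collision) before handing it to a process function. A minimal sketch of such a task, following the usual O2 analysis-task pattern (task and variable names are illustrative, not taken from this PR):

```cpp
#include "Framework/runDataProcessing.h"
#include "Framework/AnalysisTask.h"
#include "Framework/AnalysisDataModel.h"

using namespace o2;
using namespace o2::framework;

// Illustrative task: the framework groups the Tracks table by collision,
// so `tracks` below only contains the rows associated with `collision`.
// This per-collision slicing is the step the offset cache speeds up.
struct GroupingExample {
  void process(aod::Collision const& collision, aod::Tracks const& tracks)
  {
    LOGF(info, "Collision %d has %d grouped tracks", collision.globalIndex(), tracks.size());
  }
};

WorkflowSpec defineDataProcessing(ConfigContext const& cfgc)
{
  return WorkflowSpec{adaptAnalysisTask<GroupingExample>(cfgc)};
}
```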
Yes, sure. I will run the same synthetic tests in which I vary the number of elements to group from an average of 1e-2 to 100 elements per "collision", so that we can directly gauge the improvement. The main problem is that I screwed something up and am now stuck recompiling (sorry). More soon...
@aalkin @ktf I finished running a first set of benchmarks with a 'synthetic' AO2D scenario in which I vary the number of elements grouped to collisions, from an average of 1e-2 to 10^4 elements per collision. This PR helps a lot already: see the benchmark result below, with the improved version gaining up to an order of magnitude in the (very common!) range of around 1-10 elements (e.g. tracks, V0s, etc.) grouped per grouping entity (e.g. collision). For reference, the 'custom' grouping option corresponds to this code (very simple, very naive, and able to deal with the unsorted case). If you like, I can also explore other scenarios (fixed DF size at X megabytes, etc.) for testing: just let me know :-)

The original algorithm performed extremely poorly when grouping by very large tables (in terms of number of rows), typical for derived data, because it was O(N^2). Precalculating the offset cache at the start solves this, dramatically improving performance in the extreme cases.
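To illustrate the idea (a schematic sketch under simplifying assumptions, not the code in this PR): when the table is sorted by its grouping index, a single linear pass can record, for every group, the offset of its first row and its row count; each group is then located in O(1) instead of re-scanning the table per group.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Offset cache for a table sorted by its grouping index: for each group id
// store {first row, number of rows}. Built in one O(N) pass.
// Assumes group ids are contiguous in [0, numGroups).
std::vector<std::pair<int64_t, int64_t>>
buildOffsetCache(std::vector<int64_t> const& sortedCollisionIdPerRow, int64_t numGroups)
{
  std::vector<std::pair<int64_t, int64_t>> cache(numGroups, {-1, 0});
  for (int64_t row = 0; row < static_cast<int64_t>(sortedCollisionIdPerRow.size()); ++row) {
    auto id = sortedCollisionIdPerRow[row];
    if (cache[id].first < 0) {
      cache[id].first = row; // first row belonging to this group
    }
    ++cache[id].second;      // number of rows in this group
  }
  return cache;
}

// Slicing out group `id` is then a constant-time lookup:
//   auto [offset, count] = cache[id];
```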