@ktf
Member

@ktf ktf commented Jun 23, 2025

No description provided.

@ktf ktf requested a review from a team as a code owner June 23, 2025 12:13
@github-actions
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request that your PR be included in production software, please add the corresponding "async-" labels to your PR. Add the labels directly (if you have the permissions) or add a comment of the following form (labels are separated by a ","):

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@ktf
Member Author

ktf commented Jun 23, 2025

@davidrohr @ehellbar this is what we discussed last week in the meeting.

@ktf
Member Author

ktf commented Jun 24, 2025

@ehellbar this seems to pass the fullCI. How do we validate this further? Shall I just merge it and we try on staging?

@ehellbar
Collaborator

@ktf I think I can test it with the local reproducer. I will do it later this morning.

@ehellbar
Collaborator

ehellbar commented Jun 24, 2025

@ktf Locally, with only sporadic data, it works. I tested with the following command, setting a smaller data processing timeout for the last device:

o2-testworkflows-simple-source --delay 1000 --dataspec "src0:SRC/DATA/0" --data-processing-timeout 20 --exit-transition-timeout 30 | o2-testworkflows-simple-processor --data-processing-timeout 20 --exit-transition-timeout 30 --in-dataspec "src0:SRC/DATA/0" --out-dataspec "PROC0:CLB/DATA0/0?lifetime=sporadic;PROC1:CLB/DATA1/0?lifetime=sporadic" --processing-delay 1000 | o2-testworkflows-simple-processor --name populator --data-processing-timeout 10 --exit-transition-timeout 30 --in-dataspec "PROC0:CLB/DATA0/0?lifetime=sporadic;PROC1:CLB/DATA1/0?lifetime=sporadic" -b

With this commit, the last device still processes the sporadic data after its first timeout expires; without this commit, that data was dropped.
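The behaviour observed in this test can be written down as a toy filter. This is a minimal sketch and not DPL code: `Lifetime`, `Message` and `filter_after_grace` are hypothetical names, and the only assumption is that once the data processing timeout has expired, timeframe messages are dropped while sporadic ones are still delivered.

```python
from dataclasses import dataclass
from enum import Enum


class Lifetime(Enum):
    TIMEFRAME = "timeframe"
    SPORADIC = "sporadic"


@dataclass(frozen=True)
class Message:
    # Hypothetical stand-in for one input message of a device.
    description: str
    lifetime: Lifetime


def filter_after_grace(messages, grace_expired):
    """Return the messages a device would still process.

    Assumption: before the grace period expires everything is processed;
    afterwards only sporadic (calibration-like) messages are kept.
    """
    if not grace_expired:
        return list(messages)
    return [m for m in messages if m.lifetime is Lifetime.SPORADIC]
```

With two sporadic inputs, as in the command above, nothing is filtered out after the timeout, which matches the observation that the last device keeps processing.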

@ehellbar
Collaborator

ehellbar commented Jun 24, 2025

I tested another scenario, with mixed timeframe and sporadic data and a shorter data processing timeout for the last device. In this case, both the timeframe and the sporadic data are dropped by the second processing device (populator) after its data processing timeout.

o2-testworkflows-simple-source --delay 1000 --dataspec "src0:SRC/DATA/0" --data-processing-timeout 20 --exit-transition-timeout 30 | o2-testworkflows-simple-processor --data-processing-timeout 20 --exit-transition-timeout 30 --in-dataspec "src0:SRC/DATA/0" --out-dataspec "PROC0:CLB/DATA0/0;PROC1:CLB/DATA1/0?lifetime=sporadic" --processing-delay 1000 | o2-testworkflows-simple-processor --name populator --data-processing-timeout 10 --exit-transition-timeout 30 --in-dataspec "PROC0:CLB/DATA0/0;PROC1:CLB/DATA1/0?lifetime=sporadic" -b
[73653:test-processor]: [11:57:32][INFO] Received 1 messages. Converting.
[73653:test-processor]: [11:57:33][INFO] Creating PROC0.
[73653:test-processor]: [11:57:33][INFO] Creating PROC1.
[73654:populator]: [11:57:33][INFO] Received 2 messages. Converting.
[73653:test-processor]: [11:57:33][INFO] Received 1 messages. Converting.
[73654:populator]: [11:57:34][INFO] id0000600002036ee0:callback        *> Grace period for data processing expired. Only calibrations from this point onwards.
[73653:test-processor]: [11:57:34][INFO] Creating PROC0.
[73653:test-processor]: [11:57:34][INFO] Creating PROC1.
[73653:test-processor]: [11:57:34][INFO] Received 1 messages. Converting.
[73654:populator]: [11:57:34][INFO] id000000011b605850:calibration     *> Dropping incoming 1 messages because they are data processing.
[73654:populator]: [11:57:34][ERROR] Dropping incomplete <matcher query: (and origin:CLB (and description:DATA1 (and subSpec:0 (just startTime:$0 ))))> Lifetime::qos data in slot 0 with timestamp 14 < 15 as it can never be completed.
[73654:populator]: [11:57:34][ERROR] Missing <matcher query: (and origin:CLB (and description:DATA0 (and subSpec:0 (just startTime:$0 ))))> (lifetime:timeframe) while dropping incomplete data in slot 0 with timestamp 14 < 15.

For the ccdb-populator (only sporadic input) this should be fine, but as far as I understand, the sporadic data should always be kept, right?

@ehellbar
Collaborator

ehellbar commented Jun 24, 2025

And a third scenario: mixed timeframe and sporadic data, with the same timeouts for the processing devices:

o2-testworkflows-simple-source --delay 1000 --dataspec "src0:SRC/DATA/0" --data-processing-timeout 20 --exit-transition-timeout 30 | o2-testworkflows-simple-processor --data-processing-timeout 10 --exit-transition-timeout 30 --in-dataspec "src0:SRC/DATA/0" --out-dataspec "PROC0:CLB/DATA0/0;PROC1:CLB/DATA1/0?lifetime=sporadic" --processing-delay 1000 | o2-testworkflows-simple-processor --name populator --data-processing-timeout 10 --exit-transition-timeout 30 --in-dataspec "PROC0:CLB/DATA0/0;PROC1:CLB/DATA1/0?lifetime=sporadic" -b
[74532:test-processor]: [12:35:29][INFO] Received 1 messages. Converting.
[74532:test-processor]: [12:35:30][INFO] Creating PROC0.
[74532:test-processor]: [12:35:30][INFO] Creating PROC1.
[74532:test-processor]: [12:35:30][INFO] Received 1 messages. Converting.
[74533:populator]: [12:35:30][INFO] Received 2 messages. Converting.
[74533:populator]: [12:35:31][INFO] id0000600000cc3f20:callback        *> Grace period for data processing expired. Only calibrations from this point onwards.
[74532:test-processor]: [12:35:31][INFO] Creating PROC0.
[74532:test-processor]: [12:35:31][INFO] Creating PROC1.
[74532:test-processor]: [12:35:31][INFO] id0000600000af90e0:callback        *> Grace period for data processing expired. Only calibrations from this point onwards.
[74533:populator]: [12:35:31][INFO] id000000012380be80:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:31][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74533:populator]: [12:35:31][ERROR] Dropping incomplete <matcher query: (and origin:CLB (and description:DATA1 (and subSpec:0 (just startTime:$0 ))))> Lifetime::qos data in slot 0 with timestamp 12 < 13 as it can never be completed.
[74533:populator]: [12:35:31][ERROR] Missing <matcher query: (and origin:CLB (and description:DATA0 (and subSpec:0 (just startTime:$0 ))))> (lifetime:timeframe) while dropping incomplete data in slot 0 with timestamp 12 < 13.
[74532:test-processor]: [12:35:32][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:33][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:34][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:35][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:36][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:37][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:38][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:39][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.
[74532:test-processor]: [12:35:40][INFO] id000000011f6057d0:calibration     *> Dropping incoming 1 messages because they are data processing.

So it looks like once the first timer has expired on both devices, we only drop the timeframe data. Still, there is a transition period, caused by the short latency between the timers on the different devices, during which we also drop the sporadic data. Can we do anything about that?
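The transition window can be illustrated with a toy timeline. This is only a sketch under stated assumptions, not a model of the actual DPL scheduling: the function name and the tick-based clock are hypothetical, the upstream device is assumed to emit one timeframe plus one sporadic message per tick until its own timer expires, and the downstream device is assumed to lose the sporadic part whenever it has already stopped accepting the matching timeframe part.

```python
def lost_sporadic_ticks(upstream_expiry, downstream_expiry, n_ticks):
    """Return the ticks at which sporadic data would be lost downstream.

    Assumptions (toy model):
    - upstream emits a (timeframe, sporadic) pair at every tick
      strictly before `upstream_expiry`;
    - downstream drops timeframe data from `downstream_expiry` onwards,
      so the sporadic half of the pair can never be completed.
    """
    lost = []
    for tick in range(n_ticks):
        upstream_active = tick < upstream_expiry
        downstream_expired = tick >= downstream_expiry
        if upstream_active and downstream_expired:
            lost.append(tick)
    return lost


# Shorter timeout downstream (second scenario): a long window of losses.
print(lost_sporadic_ticks(20, 10, 30))
# Nominally equal timeouts with a small latency between the timers
# (third scenario): the window shrinks but does not vanish.
print(lost_sporadic_ticks(10.2, 10.0, 30))
```

In this model the losses disappear only when the downstream timer never fires before the upstream one, which is why the latency between the timers matters.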

@ktf
Member Author

ktf commented Jun 24, 2025

@ehellbar I guess the best solution would be to propagate some sort of "end of data processing" similar to the "end of stream". That's not so trivial, though. I would suggest we start with this.
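One way to picture such a propagated signal is sketched below. This is purely illustrative and is not what this PR implements: `END_OF_DATA_PROCESSING`, `Device` and its methods are hypothetical names, and the design (a device stops regular data processing only once every upstream channel has announced it, instead of relying on its own timer) is an assumption made for the sake of the example.

```python
# Hypothetical in-band marker, analogous in spirit to an end-of-stream signal.
END_OF_DATA_PROCESSING = object()


class Device:
    """Toy device that tracks which upstream channels are still sending data."""

    def __init__(self, name, upstream_channels):
        self.name = name
        # Channels that have not yet announced the end of data processing.
        self.pending = set(upstream_channels)
        self.processed = []

    @property
    def data_processing_done(self):
        # True only once every upstream channel has sent the marker.
        return not self.pending

    def on_message(self, channel, payload):
        if payload is END_OF_DATA_PROCESSING:
            self.pending.discard(channel)
            return
        # Regular payloads are processed regardless of any local timer.
        self.processed.append((channel, payload))


populator = Device("populator", ["PROC0", "PROC1"])
populator.on_message("PROC0", "timeframe payload")
populator.on_message("PROC0", END_OF_DATA_PROCESSING)
populator.on_message("PROC1", "late sporadic payload")
populator.on_message("PROC1", END_OF_DATA_PROCESSING)
```

Because the transition is driven by the upstream marker rather than by independent timers, there is no window in which the two devices disagree about whether data processing has ended.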

@ktf ktf merged commit 2ca4db7 into AliceO2Group:dev Jun 24, 2025
12 checks passed
mhemmer-cern pushed a commit to mhemmer-cern/AliceO2 that referenced this pull request Sep 9, 2025

2 participants