Conversation

@martenole
Contributor

Sends the CCDB object without relying on the CCDB manager.

Not sure if we want to merge the second commit; it would enable the calibration by default in the FST.

@martenole martenole requested review from a team, bazinski, jolopezl and tdietel as code owners April 27, 2022 18:14
@shahor02
Collaborator

> Not sure if we want to merge the second commit. This would enable the calibration per default in the FST.

Actually, since the workflows from calib-workflow.sh are attached to the (sync) processing workflow, running the aggregators there (e.g. vdrift and meanvertex) makes sense only in the FST. On the EPN they should be replaced by output proxies sending the data requested by the aggregators, plus a separate workflow started on the aggregator node.
Since at the moment we are not running online calibrations on the EPNs, I think the current code is fine, but we should eventually substitute these aggregators with the corresponding output proxies and move the aggregators to a separate script, as in O2DPG/DATA/production/calib. Then they can also be started in the FST as separate workflows on the same PC.
I don't know what plans @davidrohr has on this.

@davidrohr
Collaborator

Fine with me in general. I am also working on some automation to run the aggregators locally, though still via two proxies, in order to run the full calib workflow in the FST.
But before merging I want to test the performance and stability in the 8-GPU FST on the EPNs; I will do that next week.

@chiarazampolli
Collaborator

Ciao @davidrohr ,

I am also testing the aggregator with calibrations run locally. Just to avoid duplicating efforts: are you working only on the FST framework, or also on integrating the calibrations? I am working on the latter.

Chiara

@davidrohr
Collaborator

Just working on the framework, not on any individual aggregators.

@chiarazampolli
Collaborator

OK, I need to fix something first, but then I will commit the first version of the scripts (hopefully tomorrow), so that you can take a look and see whether it fits.
Thanks!!!

@davidrohr
Collaborator

@davidrohr left a comment

With this PR, the full system test on the EPN with 8 GPUs produces backpressure, and I see that the SHM runs full. (I retried the exact same software without this PR, and then it works.)

To reproduce, on the EPN:

GEN_TOPO_WORKDIR=/home/drohr/tmp4/ QC_REDIRECT_MERGER_TO_LOCALHOST=1 WORKFLOW_PARAMETERS=QC,CALIB,EVENT_DISPLAY WORKFLOW_DETECTORS_QC=FT0,FV0,FDD,ZDC,TOF,ITS,MFT,MID,MCH,EMC,PHS,CPV CONFIG_EXTRA_PROCESS_o2_gpu_reco_workflow="GPU_global.benchmarkMemoryRegistration=1;" TFDELAY=5 NTIMEFRAMES=100 $O2_ROOT/prodtests/full-system-test/start_tmux.sh dd

To check the SHM:

fairmq-shmmonitor -v -i

@martenole
Contributor Author

Thanks a lot for checking, @davidrohr. I am having a look.

@martenole
Contributor Author

The backpressure was traced back to a bug in the workflow: the output from the TrackBasedCalib was sent only for every 200th TF but was not declared sporadic. This is now fixed, and in its current state this PR no longer causes backpressure in the FST on the EPNs.
Currently I send the output for every TF. I also tried declaring it Lifetime::Sporadic, but that again led to backpressure.

@shahor02
Collaborator

shahor02 commented May 5, 2022

> So the backpressure was traced back to a bug in the workflow. The output from the TrackBasedCalib was sent only for every 200th TF, but the output was not declared sporadic. This is now fixed and with the current status this PR does not cause backpressure anymore on the FST on the EPNs. Currently I send the output for every TF. I tried also to declare it Lifetime::Sporadic but this then again lead to backpressure.

Yes, Sporadic is not working as expected; @ktf is checking.

@martenole
Contributor Author

Hi @davidrohr, is it OK for you to merge this as it is now? As mentioned above, there is no backpressure anymore now that the output is sent for every TF.

@davidrohr davidrohr merged commit b0dfbb8 into AliceO2Group:dev May 6, 2022
if [[ $BEAMTYPE != "cosmic" ]]; then
has_detector_calib TPC && has_detectors TPC ITS TRD TOF && add_W o2-tpc-scdcalib-interpolation-workflow "$DISABLE_ROOT_OUTPUT --disable-root-input --pipeline $(get_N tpc-track-interpolation TPC REST)" "$ITSMFT_FILES"
has_detector_calib ITS && has_detectors ITS && has_detectors_reco ITS && has_detector_matching PRIMVTX && [[ ! -z "$VERTEXING_SOURCES" ]] && add_W o2-calibration-mean-vertex-calibration-workflow
has_detector_calib TRD && has_detector ITS TPC TRD && add_W o2-calibration-trd-vdrift-exb
@chiarazampolli
Collaborator

@chiarazampolli May 6, 2022

Here it should be has_detectors, plural, no? Anyway, see my other comment #8668 (comment)
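For context, a minimal sketch of the plausible difference between the two helpers (hypothetical re-implementations and an example WORKFLOW_DETECTORS value, not the actual O2DPG code): the singular has_detector would check only its first argument, silently ignoring the rest, while the plural has_detectors requires every listed detector to be enabled.

```shell
# Hypothetical sketch, NOT the actual O2DPG helpers: one plausible reason why
# calling the singular form with several detectors is a bug.
WORKFLOW_DETECTORS="ITS,TPC,TOF"   # example value; TRD deliberately missing

has_detector() {
  # checks only $1; any further arguments are silently ignored
  [[ ",$WORKFLOW_DETECTORS," == *",$1,"* ]]
}

has_detectors() {
  # succeeds only if every listed detector is enabled
  local det
  for det in "$@"; do
    has_detector "$det" || return 1
  done
  return 0
}

has_detector ITS TPC TRD && echo "singular: passes, TRD was never checked"
has_detectors ITS TPC TRD || echo "plural: fails, TRD is not enabled"
```

Under these assumed semantics, the singular call would silently enable the TRD vdrift workflow even when TRD is not in the detector list, which matches why the plural form is the correct guard here.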

@martenole
Contributor Author
Yes, you are right, thanks! I see you have already corrected that in AliceO2Group/O2DPG#375.

@chiarazampolli
Collaborator

Yes, I did, but of course the other PR is a complete change with respect to yours, and your calibration won't run in the FST for now. Is that OK?

@martenole
Contributor Author

For me, in principle, yes. If I understand correctly, with your changes none of the aggregators would be tested in the FST by default. I wonder whether we should test them in the CI, or at least enable them when we install a new O2 version at P2 and let the FST run for a longer time to check stability. But I leave it to @davidrohr to comment here (probably better directly in #8736).

@chiarazampolli
Collaborator

Hello,

Please see:

AliceO2Group/O2DPG#375
#8736

Chiara

@chiarazampolli chiarazampolli mentioned this pull request May 6, 2022