TRD vDrift and ExB calibration updated #8668
Conversation
Actually, since the workflows from calib-workflow.sh are attached to the (sync) processing workflow, running the aggregators there (e.g. vdrift and meanvertex) makes sense only in the FST. On the EPN they should be replaced by output proxies that send the data requested by the aggregators, plus a separate workflow started on the aggregator node.
|
Fine with me in general. I am also working on some automation to run the aggregators locally, though still via 2 proxies, so that the full calib workflow can run in the FST.
|
Ciao @davidrohr, I am also testing the aggregator with calibrations run locally. Just so we do not duplicate efforts: are you working only on the FST framework, or also on integrating the calibrations? I am working on the latter. Chiara
|
Just working on the framework, not on any individual aggregators.
|
OK, I need to fix something, but then I will commit the first version of the scripts (hopefully tomorrow), so that you can check whether it fits.
davidrohr left a comment
With this PR the full system test on the EPN with 8 GPUs produces backpressure, and I see that the SHM runs full. (I retried the exact same software without this PR and then it works.)
To reproduce, on the EPN:
GEN_TOPO_WORKDIR=/home/drohr/tmp4/ QC_REDIRECT_MERGER_TO_LOCALHOST=1 WORKFLOW_PARAMETERS=QC,CALIB,EVENT_DISPLAY WORKFLOW_DETECTORS_QC=FT0,FV0,FDD,ZDC,TOF,ITS,MFT,MID,MCH,EMC,PHS,CPV CONFIG_EXTRA_PROCESS_o2_gpu_reco_workflow="GPU_global.benchmarkMemoryRegistration=1;" TFDELAY=5 NTIMEFRAMES=100 $O2_ROOT/prodtests/full-system-test/start_tmux.sh dd
To check the SHM:
fairmq-shmmonitor -v -i
|
Thanks a lot for checking, @davidrohr. I am having a look.
|
So the backpressure was traced back to a bug in the workflow. The output from the TrackBasedCalib was sent only for every 200th TF, but the output was not declared sporadic. This is now fixed, and with the current status this PR no longer causes backpressure on the FST on the EPNs.
Yes, Sporadic is not working as expected; @ktf is checking.
|
Hi @davidrohr, is it OK for you to merge this as it is now? As mentioned above, there is no backpressure anymore now that the output is sent for every TF.
if [[ $BEAMTYPE != "cosmic" ]]; then
  has_detector_calib TPC && has_detectors TPC ITS TRD TOF && add_W o2-tpc-scdcalib-interpolation-workflow "$DISABLE_ROOT_OUTPUT --disable-root-input --pipeline $(get_N tpc-track-interpolation TPC REST)" "$ITSMFT_FILES"
  has_detector_calib ITS && has_detectors ITS && has_detectors_reco ITS && has_detector_matching PRIMVTX && [[ ! -z "$VERTEXING_SOURCES" ]] && add_W o2-calibration-mean-vertex-calibration-workflow
  has_detector_calib TRD && has_detector ITS TPC TRD && add_W o2-calibration-trd-vdrift-exb
Here it should be has_detectors, plural, no? Anyway, see my other comment #8668 (comment)
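To illustrate why the singular form matters here, below is a minimal sketch. The two helper functions are hypothetical stand-ins written for this example, under the assumption that `has_detector` inspects only its first argument while `has_detectors` requires every argument to be present in a comma-separated `WORKFLOW_DETECTORS` list; the real helpers in the workflow scripts may be implemented differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the workflow helpers (assumed behaviour,
# not copied from the O2 scripts).
WORKFLOW_DETECTORS="ITS,TPC,TOF"   # example detector list without TRD

# Succeeds if the single detector named in $1 is in the list.
has_detector() {
  [[ ",$WORKFLOW_DETECTORS," == *",$1,"* ]]
}

# Succeeds only if every detector passed as an argument is in the list.
has_detectors() {
  local det
  for det in "$@"; do
    has_detector "$det" || return 1
  done
  return 0
}

# The singular helper looks at $1 only, so the missing TRD goes unnoticed.
has_detector ITS TPC TRD && singular=yes || singular=no
# The plural helper checks all three and fails because TRD is absent.
has_detectors ITS TPC TRD && plural=yes || plural=no

echo "has_detector ITS TPC TRD: $singular"
echo "has_detectors ITS TPC TRD: $plural"
```

Under these assumptions the singular call succeeds although TRD is not in the list, so the TRD calibration workflow would be added even when TRD is absent; the plural call fails as intended.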
Yes, you are right. Thanks! I see you have already corrected that in AliceO2Group/O2DPG#375
Yes, I did, but of course the other PR is a complete change with respect to yours, and your calibration won't run in the FST for now. Is that OK?
For me, in principle, yes. If I understand correctly, with your changes none of the aggregators would be tested in the FST by default. I am wondering whether we should test them in the CI, or at least enable them when we install a new O2 version at P2 and let the FST run for a longer time to check the stability. But here I let @davidrohr comment (probably better directly in #8736).
|
Hello, Please see: Chiara |
To send the CCDB object without relying on the CCDB manager.
Not sure if we want to merge the second commit. It would enable the calibration by default in the FST.