-
Notifications
You must be signed in to change notification settings - Fork 483
TPC QC: Add GPUErrorQA task #14397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TPC QC: Add GPUErrorQA task #14397
Conversation
|
REQUEST FOR PRODUCTION RELEASES: This will add The following labels are available |
wiechula
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ariedel-cern . Please check for compiler warnings and fix them.
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| auto f = std::unique_ptr<TFile>(TFile::Open(filename.data(), "recreate")); | ||
| TObjArray arr; | ||
| arr.SetName("GPUErrorQA_Hists"); | ||
| for (const auto& [name, hist] : mMapHist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid warning.
| for (const auto& [name, hist] : mMapHist) { | |
| for ([[maybe_unused]] const auto& [name, hist] : mMapHist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Looking at other PRs, the full CI seems to be broken at the moment. Testing locally, I see no compiler warnings.
|
Error while checking build/O2/fullCI_slc9 for 41445c2 at 2025-06-12 14:55: Full log here. |
|
Error while checking build/O2/fullCI_slc9 for 46787b5 at 2025-06-15 21:24: Full log here. |
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| // get gpu error names | ||
| // copied from GPUErrors.h | ||
| static std::unordered_map<uint32_t, const char*> errorNames = { | ||
| #define GPUCA_ERROR_CODE(num, name, ...) {num, GPUCA_M_STR(name)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just include GPUErrors.h instead of copy & pasting the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi David, the errorNames were defined in GPUError.cxx, not in the header. I was not sure if this was done on purpose, so that's why I copied the code.
I moved the definition now to the header, so I can include it in the QA task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was confused due to your comment // copied from GPUErrors.h. Indeed, the definition was in the CXX, and I'd prefer it to stay in the CXX. No need to full the header file with the strings.
Could you put a
#ifndef GPUCA_GPUCODE_DEVICE
static std::unordered_map<uint32_t, const char*> errorNames
#endif
in GPUErrors.h, but leave the actual definition in the cxx.
And please also protect the <unordered_map> include in the header with the same ifdef.
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| // for convienence, label each bin with the error name | ||
| for (size_t bin = 1; bin < mMapHist["ErrorCounter"]->GetNbinsX(); bin++) { | ||
| auto const& it = o2::gpu::errorNames.find(bin); | ||
| mMapHist["ErrorCounter"]->GetXaxis()->SetBinLabel(bin, it->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case we will have a gap in the error numbers in the future, I would propose to check if it is a valid error Name.
|
Error while checking build/O2/fullCI_slc9 for 7b5264a at 2025-06-17 10:44: Full log here. |
davidrohr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new approach looks good in general, but see the 2 comments below:
GPU/GPUTracking/Global/GPUErrors.cxx
Outdated
| memset(mErrors, 0, GPUCA_MAX_ERRORS * sizeof(*mErrors)); | ||
| } | ||
|
|
||
| static std::unordered_map<uint32_t, const char*> errorNames = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this as static variable inside getErrorNames, so that it is not a global symbol?
Alternatively, please put it in o2::gpu::internal namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| #include "GPUCommonDef.h" | ||
| #include "GPUDefMacros.h" | ||
| #include <unordered_map> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, you need to protect unordered_map with #ifndeg GPUCA_GPUCODE_DEVICE.
We do not include any std headers in GPU kernel code.
And I think you don't need GPUDefMacros.h, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also fixed up the includes.
|
Error while checking build/O2/fullCI_slc9 for 8f676bc at 2025-06-17 18:10: Full log here. |
|
Hi @davidrohr , thanks a lot for all the feedback. Is this PR now OK to be merged? I think the 2 failed checks are due to some issue with O2Physics. |
|
Hi @ariedel-cern : The FullCI error is genuine, you also have to #ifdef-protect |
Following up on #13964