NN clusterizer: Fixing memory access faults #14657
Error while checking build/O2/fullCI_slc9 for 4998576 at 2025-09-05 21:33.
Error while checking build/O2/fullCI_slc9 for 5368d52 at 2025-09-05 23:52.
…ins out-of-bounds accesses and memory faults
Please consider the following formatting changes to AliceO2Group#14657
Error while checking build/O2/fullCI_slc9 for 0ed462e at 2025-09-07 01:03.
Error while checking build/O2/fullCI_slc9 for 902ddc6 at 2025-09-07 11:14.
Error while checking build/O2/fullCI_slc9 for 79cc38e at 2025-09-07 12:07.
Error while checking build/O2/fullCI_slc9 for b50afbb at 2025-09-07 23:08.
@davidrohr: ready once the CI is green.
      }

    - mRec->runParallelOuterLoop(doGPU, numLanes, [&](uint32_t lane) {
    + for (int32_t lane = 0; lane < numLanes; lane++) {
Just curious about this change here. When running with GPU, it should not make any difference. But when running on CPU backend, the new version would serialize while the old version would run in parallel.
Is this change intended?
Since this initialization step is cheap, I don't expect it to make a difference. However, I want to make sure that all network initializations run sequentially, in case they perform memory allocations that might (or might not) conflict. I saw some crashes when running online, and this removes a potential failure point.
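The trade-off discussed in this thread can be sketched in isolation. The following is a minimal illustration and not the O2 code: `initLanes` and `initLane` are hypothetical stand-ins for the per-lane network initialization, and plain `std::thread` workers are only an assumption about what a parallel outer loop does on a CPU backend.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-lane initialization (not the O2 API).
// Returns the order in which the lanes were initialized.
std::vector<int32_t> initLanes(int32_t numLanes, bool parallel)
{
  assert(numLanes >= 0);
  std::vector<int32_t> order;
  std::mutex orderMutex;

  // Placeholder for the real work: it only records which lane ran.
  // The mutex protects the shared vector in the parallel case.
  auto initLane = [&](int32_t lane) {
    std::lock_guard<std::mutex> lock(orderMutex);
    order.push_back(lane);
  };

  if (parallel) {
    // One worker per lane: the lanes may run in any order, so any
    // unsynchronized allocation inside the real init could race.
    std::vector<std::thread> workers;
    for (int32_t lane = 0; lane < numLanes; lane++) {
      workers.emplace_back(initLane, lane);
    }
    for (auto& worker : workers) {
      worker.join();
    }
  } else {
    // Plain loop: lanes are initialized strictly one after another.
    for (int32_t lane = 0; lane < numLanes; lane++) {
      initLane(lane);
    }
  }
  return order;
}
```

With `parallel == false` the lanes are always visited in order 0, 1, 2, ...; with `parallel == true` every lane still runs exactly once, but the order is not guaranteed. That ordering guarantee is what the sequential loop buys, at the cost of losing CPU-side parallelism for a step described here as cheap.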
Error while checking build/O2/fullCI_slc9 for a545f08 at 2025-09-08 23:03.
Error while checking build/O2/fullCI_slc9 for 7c47304 at 2025-09-09 14:01.
Build errors come from: [4893/5777] Building HIP object Detectors/ITSMFT/ITS/tracking/GPU/hip/CMakeFiles/O2lib-ITStrackingHIP.dir/TimeFrameGPU.hip.o
Similar for macOS (although there it's the CCDBFetcher that fails).
@ktf @singiamtel: The FullCI fails here with:
So it seems the CodeChecker is broken?