-
Notifications
You must be signed in to change notification settings - Fork 483
GPU Framework: fixed mismatch between CUDA and HIP launch bounds definition #14632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
REQUEST FOR PRODUCTION RELEASES: This will add The following labels are available |
davidrohr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I have just one minor comment.
However, as discussed, I would not merge this now, since it can affect the performance. We should merge it once have update performance defaults for MI50 and MI100.
Meanwhile, I'd switch it to a draft PR, so that we do not waste CI resources.
| if (par.par_LB_maxThreads[i] > 0) { \ | ||
| o << "#define GPUCA_LB_" GPUCA_M_STR(name) " " << par.par_LB_maxThreads[i]; \ | ||
| if (par.par_LB_minBlocks[i] > 0) { \ | ||
| o << ", " << GPUCA_CEIL_INT_DIV(par.par_LB_maxThreads[i] * par.par_LB_minBlocks[i], (minBlockFactor ? minBlockFactor : par.par_LB_maxThreads[i])); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this as a function to GPUCommonMath.h? I'd prefer to have proper functions instead of macros in C++ code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the function to GPUCommonMath.h makes ROOT complain. Basically it cannot expand GPUd() macros contained in GPUCommonMath.h when using
bash -c "echo -e '#define GPUCA_GPUTYPE_${GPU_ARCH}\\n#define PARAMETER_FILE \"GPUDefParametersDefaults.h\"\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/../Common\");\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/Definitions\");\\ngInterpreter->AddIncludePath(\"${ON_THE_FLY_DIR}\");\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C(\"${PARAMFILE}\")\\n.q\\n'" | root -l -b > /dev/null
For generating the .par files. Example:
/home/gcimador/alice/O2/GPU/GPUTracking/../Common/GPUCommonMath.h:51:3: error: a type specifier is required for all declarations
GPUd() static float2 MakeFloat2(float x, float y); // TODO: Find better appraoch that is constexpr
^
Do you know if there is a quick fix? I am afraid ROOT does not expand correctly the macros here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then just ignore my comment...
|
This PR did not have any update in the last 30 days. Is it still needed? Unless further action in will be closed in 5 days. |
Handled mismatch between CUDA and HIP __launch_bounds__ definitions. Performances remain the same. Added GPUCA_CEIL_INT_DIV macro for integer division with ceiling; this ensures that when computing the number of warps (e.g., 1.4), the result is rounded up to the next integer (e.g., 2) so that enough active warps are allocated per execution unit.