
Conversation

@cima22 cima22 commented Aug 27, 2025

Handled a mismatch between the CUDA and HIP __launch_bounds__ definitions. Performance remains the same. Added a GPUCA_CEIL_INT_DIV macro for integer division with ceiling; this ensures that when the computed number of warps is fractional (e.g., 1.4), it is rounded up to the next integer (e.g., 2) so that enough active warps are allocated per execution unit.

@cima22 cima22 requested a review from davidrohr as a code owner August 27, 2025 14:06
@github-actions

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5


@davidrohr davidrohr left a comment


Looks good, I have just one minor comment.
However, as discussed, I would not merge this now, since it can affect performance. We should merge it once we have updated the performance defaults for MI50 and MI100.
Meanwhile, I'd switch it to a draft PR, so that we do not waste CI resources.

if (par.par_LB_maxThreads[i] > 0) {                                             \
  o << "#define GPUCA_LB_" GPUCA_M_STR(name) " " << par.par_LB_maxThreads[i];   \
  if (par.par_LB_minBlocks[i] > 0) {                                            \
    o << ", " << GPUCA_CEIL_INT_DIV(par.par_LB_maxThreads[i] * par.par_LB_minBlocks[i], (minBlockFactor ? minBlockFactor : par.par_LB_maxThreads[i])); \
@davidrohr (Collaborator):

Could you add this as a function to GPUCommonMath.h? I'd prefer to have proper functions instead of macros in C++ code.

@cima22 (Contributor, Author):

Adding the function to GPUCommonMath.h makes ROOT complain: it cannot expand the GPUd() macros contained in GPUCommonMath.h when running

bash -c "echo -e '#define GPUCA_GPUTYPE_${GPU_ARCH}\\n#define PARAMETER_FILE \"GPUDefParametersDefaults.h\"\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/../Common\");\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/Definitions\");\\ngInterpreter->AddIncludePath(\"${ON_THE_FLY_DIR}\");\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C(\"${PARAMFILE}\")\\n.q\\n'" | root -l -b > /dev/null

to generate the .par files. Example:

/home/gcimador/alice/O2/GPU/GPUTracking/../Common/GPUCommonMath.h:51:3: error: a type specifier is required for all declarations
  GPUd() static float2 MakeFloat2(float x, float y); // TODO: Find better appraoch that is constexpr
  ^

Do you know of a quick fix? I am afraid ROOT does not expand the macros correctly here.
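One common workaround for this class of problem (offered here only as a hedged suggestion, not something the PR adopted) is to make the device-qualifier macro expand to nothing when the header is parsed by the ROOT/Cling interpreter:

```cpp
// Sketch of an interpreter guard for a device-qualifier macro like GPUd().
// Only __CLING__/__ROOTCLING__/__CUDACC__/__HIPCC__ are real predefined
// macros; the GPUd definitions are illustrative, not the actual O2 code.
#if defined(__CLING__) || defined(__ROOTCLING__)
  #define GPUd() // the interpreter sees plain host declarations
#elif defined(__CUDACC__) || defined(__HIPCC__)
  #define GPUd() __device__
#else
  #define GPUd()
#endif
```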

@davidrohr (Collaborator):

ok, then just ignore my comment...

@cima22 cima22 marked this pull request as draft August 29, 2025 12:32
@github-actions

This PR did not have any update in the last 30 days. Is it still needed? Unless there is further action, it will be closed in 5 days.

@github-actions github-actions bot added the stale label Sep 29, 2025
@github-actions github-actions bot closed this Oct 5, 2025