@yongwonshin

Motivation

  • The address calculation and offset logic are hard to follow.
  • Some transformed thread indices (w_idx, gidx, offset) exist only to help with address mapping, and they complicate the code unnecessarily.

Key changes

  • I added an addr_gen_s utility function that accepts any offset and column value.
  • The address (in-bank offset) calculation is now flattened, following the CUDA convention.
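
To illustrate the idea behind these two changes, here is a minimal host-side sketch in plain C. It is not the code from this PR: the names flat_thread_index, normalize_pos, bank_pos_t, and the NUM_COLS value are all hypothetical and chosen only for illustration. It assumes the "CUDA convention" means the usual flat index (block index times block size plus thread index), and that accepting "any offset and column value" means a column that overflows the row width carries into the row.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constant (columns per row); not taken from the PR. */
enum { NUM_COLS = 32 };

/* Hypothetical (row, col) pair inside one bank. */
typedef struct {
    uint32_t row;
    uint32_t col;
} bank_pos_t;

/* Flat index in the CUDA style: blockIdx.x * blockDim.x + threadIdx.x.
 * In OpenCL terms: get_group_id(0) * get_local_size(0) + get_local_id(0). */
uint32_t flat_thread_index(uint32_t group_id, uint32_t local_size, uint32_t local_id)
{
    assert(local_id < local_size); /* a local id must lie inside its work-group */
    return group_id * local_size + local_id;
}

/* Accept any col and offset: fold the overflow of (col + offset) into row
 * so that the returned col is always in [0, NUM_COLS). */
bank_pos_t normalize_pos(uint32_t row, uint32_t col, uint32_t offset)
{
    uint32_t linear = col + offset;
    bank_pos_t pos;
    pos.row = row + linear / NUM_COLS;
    pos.col = linear % NUM_COLS;
    return pos;
}
```

With a single flat index, each work-item can derive its in-bank position directly, instead of going through intermediate indices such as w_idx or gidx. Whether the real addr_gen_s carries overflow exactly this way is an assumption of the sketch.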

Test

  • All OpenCLPimIntegrationTests pass.

To Reviewers

  • The tests passed using OpenCL on an NVIDIA GPU, but I'm not sure they would pass on real PIM hardware.
  • addr_gen_s calls addr_gen_, which adds slight overhead.

- addr_gen_s accepts any size of offset and col
- flatten and inline address calculation