Commit 08cfede
Moved local update kernels to separate functions which take fewer template parameters
Removed unnecessary template parameters from kernel names submitted by these
functions. As a consequence, the size of the `_tensor_accumulation_impl` shared
object was reduced from 49'360'152 bytes to 36'422'888 bytes, that is, by almost 13 MB.
1 parent 80f288c
1 file changed, 216 additions and 251 deletions
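The size reduction described in the commit message can be illustrated with a minimal, plain C++ sketch. It is not the code from this commit and does not use SYCL: every name in it (`StridedIndexer`, `ContigIndexer`, `PlusOp`, `WideLocalUpdateName`, `NarrowLocalUpdateName`, `accumulate_wide`, `accumulate_narrow`, `submit_local_update`) is hypothetical, and a tag type recorded in a registry merely stands in for one compiled kernel. The assumption is that the local update step does not depend on the indexer type, so a helper templated only on the parameters it needs is instantiated fewer times.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-ins for template arguments; not taken from the commit.
struct StridedIndexer {};
struct ContigIndexer {};
struct PlusOp {};

// Hypothetical kernel-name tags. Each distinct specialization stands for
// one distinct kernel that would be compiled into the shared object.
template <typename T, typename IndexerT, typename OpT>
struct WideLocalUpdateName {};
template <typename T, typename OpT>
struct NarrowLocalUpdateName {};

// Registry of the distinct kernel names that have been "submitted".
inline std::set<std::type_index> &kernel_registry() {
    static std::set<std::type_index> registry;
    return registry;
}

template <typename KernelName> void record_kernel() {
    kernel_registry().insert(std::type_index(typeid(KernelName)));
}

// Before: the kernel name carries IndexerT even though (by assumption)
// the local update step never uses it.
template <typename T, typename IndexerT, typename OpT>
void accumulate_wide() {
    record_kernel<WideLocalUpdateName<T, IndexerT, OpT>>();
}

// After: the local update lives in a separate function that takes only
// the template parameters it actually depends on.
template <typename T, typename OpT> void submit_local_update() {
    record_kernel<NarrowLocalUpdateName<T, OpT>>();
}

template <typename T, typename IndexerT, typename OpT>
void accumulate_narrow() {
    submit_local_update<T, OpT>(); // IndexerT is not forwarded
}

// Instantiate both variants for 2 value types x 2 indexer types and
// return the number of distinct kernel names produced.
inline std::size_t count_wide_kernels() {
    kernel_registry().clear();
    accumulate_wide<float, StridedIndexer, PlusOp>();
    accumulate_wide<float, ContigIndexer, PlusOp>();
    accumulate_wide<double, StridedIndexer, PlusOp>();
    accumulate_wide<double, ContigIndexer, PlusOp>();
    return kernel_registry().size();
}

inline std::size_t count_narrow_kernels() {
    kernel_registry().clear();
    accumulate_narrow<float, StridedIndexer, PlusOp>();
    accumulate_narrow<float, ContigIndexer, PlusOp>();
    accumulate_narrow<double, StridedIndexer, PlusOp>();
    accumulate_narrow<double, ContigIndexer, PlusOp>();
    return kernel_registry().size();
}
```

With these four call sites, `count_wide_kernels()` returns 4 and `count_narrow_kernels()` returns 2: the number of distinct kernels scales with the product of the parameters that appear in the kernel name, so dropping an unused parameter removes that factor from the instantiation count.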