19 Aug
2024
12:39 a.m.
(the weirdness is that, starting in cuda 12.2, the nvidia library uses the NV_PROVIDES_SM_60 macro to polyfill atomicAdd for half2 on old systems, but there is no analogous polyfill for single halves, which would have roughly the same implementation, and it is the _only use of that macro in the entire history_. this is one of the things that was making backports fail to build. i'm working with autogptq.)
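
for reference, a rough sketch of what a single-half polyfill could look like. this is not nvidia's code and not necessarily what autogptq does; it's the usual compare-and-swap trick, assuming the `__half` sits inside a 4-byte-aligned word you're allowed to read and write. the names `atomic_add_half_cas` and `accumulate_demo` are made up for illustration, and the add is done in float so it doesn't depend on native half arithmetic:

```cuda
#include <cuda_fp16.h>
#include <cstdint>

// Hypothetical helper (not part of the CUDA headers): atomic add on a single
// __half, emulated with a 32-bit atomicCAS loop.
__device__ inline void atomic_add_half_cas(__half* address, __half val)
{
    // atomicCAS operates on 32-bit words, so find the aligned word that
    // contains the 16-bit value and note which half of that word it occupies.
    uintptr_t addr = reinterpret_cast<uintptr_t>(address);
    unsigned int* word =
        reinterpret_cast<unsigned int*>(addr & ~static_cast<uintptr_t>(2));
    const bool upper = (addr & 2) != 0;

    unsigned int old = *word;
    unsigned int assumed;
    do {
        assumed = old;

        // Extract the current 16-bit pattern of the target half.
        unsigned short bits = upper
            ? static_cast<unsigned short>(assumed >> 16)
            : static_cast<unsigned short>(assumed & 0xffffu);

        // Do the addition in float, then convert back to half.
        float sum = __half2float(__ushort_as_half(bits)) + __half2float(val);
        unsigned int new_bits = __half_as_ushort(__float2half(sum));

        // Splice the new 16 bits back in, leaving the neighbouring half untouched.
        unsigned int desired = upper
            ? ((assumed & 0x0000ffffu) | (new_bits << 16))
            : ((assumed & 0xffff0000u) | new_bits);

        // Retry if another thread changed the word in the meantime.
        old = atomicCAS(word, assumed, desired);
    } while (old != assumed);
}

// Hypothetical demo kernel: every thread adds its input into one accumulator.
__global__ void accumulate_demo(__half* acc, const __half* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomic_add_half_cas(acc, in[i]);
    }
}
```

a backport would presumably only compile something like this in where the toolkit/arch combination doesn't already provide `atomicAdd(__half*, __half)`, but the exact guard depends on the build setup, so it's left out here.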