22 Jan
2022
6:40 p.m.
- their example gpu code is built around an attention() function on line 42 that takes the query, key, and value as parameters, plus a chunk size
- this engages the concept of 'heads'. i _think_ a 'head' is basically a chunk of the input data already; not sure
- their attention() function breaks the query into chunks of the passed size, associates each chunk with all keys and all values, and passes each one to _query_chunk_attention() ...
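a rough sketch of how i read the chunking so far. the names attention() and _query_chunk_attention() come from their code, but the bodies below are my own guess in plain NumPy, not their GPU/JAX implementation -- just the "split queries into chunks, each chunk attends over all keys and values" idea:

```python
import numpy as np

def _query_chunk_attention(q_chunk, key, value):
    # plain scaled dot-product attention for one chunk of queries
    # against ALL keys and values (my guess at what this helper does)
    d = q_chunk.shape[-1]
    scores = q_chunk @ key.T / np.sqrt(d)          # (chunk, seq)
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ value                         # (chunk, d_v)

def attention(query, key, value, chunk_size):
    # break the query into chunks of chunk_size; each chunk is paired
    # with all keys and all values, then the results are concatenated
    chunks = [
        _query_chunk_attention(query[i:i + chunk_size], key, value)
        for i in range(0, query.shape[0], chunk_size)
    ]
    return np.concatenate(chunks, axis=0)
```

since each query row's softmax only depends on that row's scores, chunking over queries should give exactly the same output as unchunked attention, just with a smaller peak score matrix in memory.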