[spam] [personal] perceiver model notes

Undiscussed Horrific Abuse, One Victim & Survivor of gmkarl at gmail.com
Thu Jan 27 09:12:39 PST 2022


Edited issue text; I think the flag is called 'output_attentions' or
something similar, not 'return_weights':

feat: output_attentions #1

I'm looking into hacking some of the models in the transformers
library to use this library for attention, and I don't see a way to
support `output_attentions` yet. This is a flag passed in
transformers; when it is set, the post-softmax attention weights are
preserved and returned to the user.
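For reference, this is roughly how the flag behaves on the
transformers side (the model name and shapes are just an
illustration):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, heads, seq_len, seq_len)
print(len(outputs.attentions), outputs.attentions[0].shape)
```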

I looked a little at implementing this in the torch backend, and I
note that the scan() function provides for only a single tensor
return value. It seems to me the scan() function would be most
clearly replaced by a for loop, but it could also be modified to
handle tuples (see the sketch below), or the weights could be
collected through nonlocal data in some way instead of being returned
through the chunk scanner. I'm also not sure how the output would
best be passed to the user.
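As a concrete illustration of the tuple option, here is a minimal
sketch of a jax.lax.scan-style helper extended to carry tuple
outputs; the scan(f, init, xs) signature is an assumption modeled on
jax.lax.scan, not this library's actual code:

```python
import torch

def scan(f, init, xs):
    # f(carry, x) -> (carry, y); iterate over the leading axis of xs
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    if isinstance(ys[0], tuple):
        # y was a tuple, e.g. (chunk_output, chunk_weights):
        # stack each slot of the tuple into its own tensor
        return carry, tuple(torch.stack(col) for col in zip(*ys))
    return carry, torch.stack(ys)

# toy use: carry a running sum while also emitting squares as a
# second per-step output
carry, (sums, squares) = scan(
    lambda c, x: (c + x, (c + x, x * x)),
    torch.zeros(3),
    torch.randn(5, 3),
)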

I'm thinking it might make the most sense to provide an optional
output parameter, although I don't really know.
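As a sketch of what that could look like, here is plain (unchunked)
attention with a hypothetical weights_out buffer; the parameter name
and the fill-a-caller-provided-tensor approach are assumptions, not
existing API:

```python
import torch

def attention(q, k, v, weights_out=None):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    probs = torch.softmax(scores, dim=-1)
    if weights_out is not None:
        # the caller-provided buffer is filled in place, so the
        # return signature (and any scan plumbing) stays unchanged
        weights_out.copy_(probs)
    return probs @ v

q = k = v = torch.randn(2, 4, 8, 16)   # (batch, heads, seq, dim)
w = torch.empty(2, 4, 8, 8)            # (batch, heads, seq, seq)
out = attention(q, k, v, weights_out=w)
```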
