https://github.com/AminRezaei0x443/memory-efficient-attention/issues/1

feat: return_weights #1

xloem opened this issue 36 minutes ago

I'm looking into hacking some of the models in the transformers library to use this library for attention, and I don't see a way to support return_weights yet. This is a flag passed in transformers: when it is set, the pre-softmax attention weights are preserved and returned to the user.

I looked a little at implementing this in the torch backend, and I note that the scan() function provides for only a single tensor return value. It seems to me that the scan() function would be most clearly replaced by a for loop, but it could also be modified to handle tuples, or return_weights could be handled by accessing nonlocal data in some way instead of returning the weights through the chunk scanner.

commit 63fb607e6e0ed41934fbc1e1e150993b261903eb (HEAD -> return_weights, origin/return_weights)
Author: xloem <0xloem@gmail.com>
Date:   Thu Jan 27 15:37:09 2022 +0000

    reimplemented using an output parameter. not passing test yet
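
To illustrate the "output parameter" approach the commit message describes, here is a minimal sketch of how per-chunk attention weights could be collected outside a single-return scan(). This is not the library's actual code: the scan() shown is a generic jax.lax.scan-style loop, and the names chunk_scanner, attend_chunks, and weights_out are hypothetical.

```python
# Sketch only: a caller-provided weights_out list acts as the "output
# parameter", so the chunk scanner can stash weights without scan()
# having to return tuples. Names and signatures are assumptions, not
# the memory-efficient-attention API.
import torch

def scan(f, init, xs):
    """Minimal torch analogue of jax.lax.scan: f(carry, x) -> (carry, y)."""
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, torch.stack(ys)

def attend_chunks(query_chunks, key, value, weights_out=None):
    """Attend over query chunks; if weights_out is a list, append each
    chunk's pre-softmax scores to it as a side effect."""
    def chunk_scanner(carry, q_chunk):
        scores = q_chunk @ key.transpose(-1, -2)   # pre-softmax weights
        if weights_out is not None:
            weights_out.append(scores.detach())    # the output parameter
        out = torch.softmax(scores, dim=-1) @ value
        return carry + 1, out
    _, out_chunks = scan(chunk_scanner, 0, query_chunks)
    return out_chunks
```

A caller would pass weights_out=[] and afterwards torch.cat(weights_out, dim=-2) to recover the full weight matrix. This keeps scan()'s single-tensor contract intact, at the cost of the scanner no longer being a pure function, which is presumably why the branch does not pass tests against a jax-style backend yet.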