https://github.com/AminRezaei0x443/memory-efficient-attention/pull/2

Provide a flag for the user to receive attention weights

This is my draft code for #1. I saw this feature in the transformers library and wanted to implement it here, and I'm curious what you think of the feature and the implementation. The code is only lightly instrumented so that the final attention weights can be returned to the user, and the tests are augmented to cover this use. In utils, the `scan` function is expanded to handle tuples. A change to `dynamic_slice` also crept in from dev, using slices rather than `index_slice`; I've retained it because it looks to me like it would execute faster, but it can be removed.

Rebased and squashed from 84724e1de4721ea0333d6bdbb91e8bce74fbeac.
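
To make the user-facing side concrete, here is a rough, self-contained sketch of the idea; the flag name `return_weights` and the plain, non-chunked attention below are placeholders for illustration, not the library's actual signature:

```python
import torch

def attention_with_optional_weights(query, key, value, return_weights=False):
    # query/key/value: (batch, heads, seq, dim). A plain, non-chunked reference,
    # only to show the optional second return value.
    scale = query.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", query, key) * scale
    weights = torch.softmax(scores, dim=-1)
    out = torch.einsum("bhqk,bhkd->bhqd", weights, value)
    if return_weights:
        # Caller receives the final attention weights alongside the output.
        return out, weights
    return out

# Hypothetical usage:
q = k = v = torch.rand(2, 4, 16, 32)
out, attn = attention_with_optional_weights(q, k, v, return_weights=True)
print(out.shape, attn.shape)  # (2, 4, 16, 32) and (2, 4, 16, 16)
```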
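
The tuple support in `scan` amounts to stacking each element of a per-step tuple output separately, so a step can emit, say, an output chunk together with its weights chunk. A minimal sketch of that behaviour (not the exact diff), assuming `scan` mirrors `jax.lax.scan`:

```python
import torch

def scan(f, init, xs, length=None):
    # Minimal torch analogue of jax.lax.scan, extended so that the per-step
    # output `y` may be a tuple (e.g. (chunk_output, chunk_weights)); each
    # tuple element is stacked separately along a new leading axis.
    if xs is None:
        xs = [None] * length
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    if isinstance(ys[0], tuple):
        # [(a0, b0), (a1, b1), ...] -> (stack([a0, a1, ...]), stack([b0, b1, ...]))
        stacked = tuple(torch.stack(items) for items in zip(*ys))
    else:
        stacked = torch.stack(ys)
    return carry, stacked
```

With that in place, the chunked attention loop can return a tuple per step and get both the stacked outputs and the stacked weights back from a single `scan` call.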
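
The `dynamic_slice` change simply builds one Python `slice` per dimension rather than doing index-based selection; roughly:

```python
def dynamic_slice(x, starts, sizes):
    # One slice object per dimension; works for torch tensors and numpy arrays.
    slices = tuple(slice(start, start + size) for start, size in zip(starts, sizes))
    return x[slices]
```

Plain slicing returns a view rather than materialising index tensors, which is why I'd expect it to be a bit faster.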