https://github.com/AminRezaei0x443/memory-efficient-attention/pull/2

Provide a flag for the user to receive attention weights

This is my draft code for #1. I saw this feature in the transformers library and wanted to implement it here, and I'm curious what you think of the feature and the implementation. The code is only lightly instrumented so that the final attention weights can be returned to the user, and the tests are augmented to cover this use. In utils, the `scan` function is expanded to handle tuples. A change to `dynamic_slice` also crept in from dev, using slices rather than `index_slice`; I've retained it because it looks to me like it would execute faster, but it can be removed.

Rebased and squashed from 84724e1de4721ea0333d6bdbb91e8bce74fbeac.
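
To make the user-facing side concrete, here is a rough, self-contained sketch of the idea; the flag name `return_weights` and the plain, non-chunked attention below are placeholders for illustration, not the library's actual signature:

```python
import torch

def attention_with_optional_weights(query, key, value, return_weights=False):
    # query/key/value: (batch, heads, seq, dim). A plain, non-chunked reference,
    # only to show the optional second return value.
    scale = query.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", query, key) * scale
    weights = torch.softmax(scores, dim=-1)
    out = torch.einsum("bhqk,bhkd->bhqd", weights, value)
    if return_weights:
        # Caller receives the final attention weights alongside the output.
        return out, weights
    return out

# Hypothetical usage:
q = k = v = torch.rand(2, 4, 16, 32)
out, attn = attention_with_optional_weights(q, k, v, return_weights=True)
print(out.shape, attn.shape)  # (2, 4, 16, 32) and (2, 4, 16, 16)
```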
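
The tuple support in `scan` amounts to stacking each element of a per-step tuple output separately, so a step can emit, say, an output chunk together with its weights chunk. A minimal sketch of that behaviour (not the exact diff), assuming `scan` mirrors `jax.lax.scan`:

```python
import torch

def scan(f, init, xs, length=None):
    # Minimal torch analogue of jax.lax.scan, extended so that the per-step
    # output `y` may be a tuple (e.g. (chunk_output, chunk_weights)); each
    # tuple element is stacked separately along a new leading axis.
    if xs is None:
        xs = [None] * length
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    if isinstance(ys[0], tuple):
        # [(a0, b0), (a1, b1), ...] -> (stack([a0, a1, ...]), stack([b0, b1, ...]))
        stacked = tuple(torch.stack(items) for items in zip(*ys))
    else:
        stacked = torch.stack(ys)
    return carry, stacked
```

With that in place, the chunked attention loop can return a tuple per step and get both the stacked outputs and the stacked weights back from a single `scan` call.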
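
The `dynamic_slice` change simply builds one Python `slice` per dimension rather than doing index-based selection; roughly:

```python
def dynamic_slice(x, starts, sizes):
    # One slice object per dimension; works for torch tensors and numpy arrays.
    slices = tuple(slice(start, start + size) for start, size in zip(starts, sizes))
    return x[slices]
```

Plain slicing returns a view rather than materialising index tensors, which is why I'd expect it to be a bit faster.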