[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, Victim & Survivor of gmkarl at gmail.com
Sun Jan 30 03:42:14 PST 2022


https://github.com/AminRezaei0x443/memory-efficient-attention/pull/2

Provide a flag for the user to receive attention weights

This is my draft code for #1. I saw this feature in the transformers
library and wanted to implement it here.

I'm curious what you think about this feature and implementation.

The code is lightly instrumented so that the final attention weights
can be returned to the user. The tests are extended to cover this use.
In utils, the `scan` function is expanded to handle tuples. A rough
sketch of the idea follows.
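For context, here is a minimal sketch of what the flag looks like in a
query-chunked attention routine. The function name, the
`return_weights` parameter, and the shapes are my own illustration,
not the library's exact API; it shows how a tuple output from `scan`
carries the per-chunk weights alongside the per-chunk results.

    import jax
    import jax.numpy as jnp

    def chunked_attention(q, k, v, chunk_size=64, return_weights=False):
        # Hypothetical sketch, not the library's exact signature.
        # q: [n, d], k: [m, d], v: [m, dv]; assumes n % chunk_size == 0.
        n, d = q.shape
        q_chunks = q.reshape(n // chunk_size, chunk_size, d)

        def step(carry, q_chunk):
            scores = q_chunk @ k.T / jnp.sqrt(d)       # [chunk, m]
            weights = jax.nn.softmax(scores, axis=-1)  # [chunk, m]
            out = weights @ v                          # [chunk, dv]
            # Tuple output: scan stacks each tuple element along a new
            # leading axis, which is why utils' scan must handle tuples.
            return carry, (out, weights if return_weights else None)

        _, (out, weights) = jax.lax.scan(step, None, q_chunks)
        out = out.reshape(n, -1)
        if return_weights:
            return out, weights.reshape(n, -1)
        return out

Note the trade-off: materializing the full [n, m] weight matrix costs
exactly the memory the chunked algorithm exists to avoid, so it only
makes sense behind an opt-in flag, much like `output_attentions` in
the transformers library.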

A change to `dynamic_slice` crept in from dev: it uses slices rather
than index_slice. I've retained the change because slicing looks like
it would execute faster, but it can be removed.
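To illustrate that change, the helper below is my own paraphrase of a
slice-based `dynamic_slice`, not the repo's exact code. With PyTorch
tensors, basic slicing returns a view, while index-based selection
gathers into a fresh copy per dimension, which is presumably why the
slice form looks faster.

    def dynamic_slice(x, starts, sizes):
        # Build one basic slice per dimension instead of gathering by
        # index; with torch tensors this yields a view rather than a
        # copy, and it works unchanged on jnp/numpy arrays.
        return x[tuple(slice(start, start + size)
                       for start, size in zip(starts, sizes))]

For example, dynamic_slice(x, (1, 2), (2, 3)) takes rows 1:3 and
columns 2:5 of a 2-D array.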

Rebased and squashed from 84724e1de4721ea0333d6bdbb91e8bce74fbeac.
