[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, Victim & Survivor of gmkarl at gmail.com
Sun Jan 30 01:51:58 PST 2022


apparently i missent this log the first time

the two main outputs are the same now, but it looks like i implemented
the new 'output_attentions' feature wrong. there's a likelihood (hard
for me to tell so far) that it should return the _post_-softmax
weights, not the _pre_-softmax weights as i said in the public issue i
opened to move things forward responsibly.  thinking about the public
error (and engaging with issues interacting with my system, like the
loss of the first send of this email) can stimulate my spasms, which
prolongs the public presentation of the possibly-false information
:rolls_eyes:.
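for reference, here is a minimal sketch of the distinction in question
(not the actual memory-efficient-attention patch): in standard scaled
dot-product attention, the "attention weights" that a feature like
hugging face's output_attentions exposes are the post-softmax
probabilities, not the raw pre-softmax scores.

import torch
import torch.nn.functional as F

def attention(q, k, v, output_attentions=False):
    # raw (pre-softmax) scores, scaled by sqrt of the head dimension
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # post-softmax probabilities -- these are the "attention weights"
    weights = F.softmax(scores, dim=-1)
    out = weights @ v
    if output_attentions:
        return out, weights  # return the post-softmax weights, not `scores`
    return out, None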

commit 788efe5c9a99cc4b432cc215d0dbb1175632d73a (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date:   Sun Jan 30 09:42:20 2022 +0000

    typo fix resolves exception; also missing line in temporary
    debugging code. looks like return_attentions is returning the wrong
    thing.
