[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, Victim & Survivor of gmkarl at gmail.com
Sun Jan 30 02:22:31 PST 2022


Here's the commit hash.

In my test, all the correct attention weights are 1.0. So maybe I'll
run the huge pretrained model through my mempickle project, which lets
it run on low-end systems, using this new code I'm writing, and verify
that all the weights are output correctly before opening a pull
request.

commit 84724e1de4721ea0333d6bdbb91e8bce74fbeac2 (HEAD -> return_weights, origin/return_weights)
Author: xloem <0xloem at gmail.com>
Date:   Sun Jan 30 10:14:37 2022 +0000

    return the post-softmax weights rather than the pre-softmax weights
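For illustration, here is a minimal sketch of the distinction the commit describes, in toy numpy rather than the actual patched library: pre-softmax values are raw scaled dot-product scores, while post-softmax weights are normalized so each row sums to 1 (and a single-key case gives exactly 1.0, as in the test above). The function name and `return_weights` flag here are assumptions for the sketch, not the real API.

```python
import numpy as np

def attention(q, k, v, return_weights=True):
    # Scaled dot-product scores: these are the PRE-softmax values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax: these are the POST-softmax weights,
    # each row summing to 1.0 (a probability distribution over keys).
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = weights @ v
    # Return the post-softmax weights rather than the pre-softmax scores.
    return (out, weights) if return_weights else out
```

With a single key, every query attends to it with weight exactly 1.0, which is the kind of sanity check described in the post.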


More information about the cypherpunks mailing list