[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Tue Feb 1 13:17:21 PST 2022


i'm working on the extant issue below atm

also huggingface replied to the PR i made when i was losin' it, and
mentioned two other efficient attention implementations; those looked
approximation-based. they also said their repo is deliberately
anti-DRY, which is not something anybody expects to hear. there's at
least one fork of it, though.

commit 172ae5d668bec9180516e2238f195b56d11a9799 (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date:   Tue Feb 1 20:47:43 2022 +0000

    removed cruft and added memory-efficient-attention dependency. a
    remaining issue exists where masks and biases still allocate O(n^2)
    memory.
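
for context, the trick behind that dependency (the "self-attention
does not need O(n^2) memory" approach of Rabe & Staats) is to stream
over key chunks with a running softmax, so the full score matrix is
never held at once. here's a minimal pytorch sketch of the technique
itself -- the function name and chunk size are illustrative, not the
package's actual API:

    import torch

    def chunked_attention(q, k, v, key_chunk=256):
        # q, k, v: (seq, dim). keys are processed in chunks so the
        # full (seq, seq) score matrix is never materialized at once.
        scale = q.shape[-1] ** -0.5
        # running accumulators for a numerically stable streaming softmax
        num = torch.zeros_like(q)                        # weighted value sum
        den = q.new_zeros(q.shape[0], 1)                 # softmax denominator
        mx = q.new_full((q.shape[0], 1), -float("inf"))  # running row max
        for start in range(0, k.shape[0], key_chunk):
            k_c = k[start:start + key_chunk]
            v_c = v[start:start + key_chunk]
            s = (q @ k_c.T) * scale  # (seq, chunk) scores, O(n * chunk)
            # note: a dense mask or bias would be indexed here as
            # mask[:, start:start + key_chunk] -- but if it's allocated
            # up front as a full (seq, seq) tensor, memory is O(n^2)
            # again, which is the remaining issue in the commit above.
            m_new = torch.maximum(mx, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)
            correction = torch.exp(mx - m_new)
            num = num * correction + p @ v_c
            den = den * correction + p.sum(dim=-1, keepdim=True)
            mx = m_new
        return num / den

so a dense mask or bias of shape (n, n) handed in alongside q/k/v
still costs O(n^2) before chunking even starts; fixing that means
generating or slicing the mask per chunk too.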

