[ot][spam][crazy][data] transformer model 'attention' improvement
Undiscussed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Tue Feb 1 13:17:21 PST 2022
i'm working on the extant issue below atm
also huggingface replied to the PR i made when i was losin' it, and
mentioned two other efficient attention implementations; they looked
approximation-based. they also said their repo is intentionally
anti-DRY, which is not something anybody expects to hear. there's at
least one fork of it though.
commit 172ae5d668bec9180516e2238f195b56d11a9799 (HEAD ->
memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date: Tue Feb 1 20:47:43 2022 +0000
removed cruft and added memory-efficient-attention dependency. a
remaining issue exists where masks and biases still allocate O(n^2)
memory.
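for context, a toy numpy sketch of the query-chunking idea behind
memory-efficient attention (this is not the library's actual code,
just an illustration; names and the chunk size are made up). the
comments note where a dense mask or bias would reintroduce the O(n^2)
allocation mentioned in the commit message:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard attention: materializes the full (n, n) score matrix
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def chunked_attention(q, k, v, chunk=64):
    # query-chunked attention: peak score allocation is (chunk, n)
    # instead of (n, n). note that a precomputed dense (n, n) mask or
    # bias tensor would still cost O(n^2) memory unless it, too, is
    # generated or sliced per chunk -- the remaining issue above.
    out = np.empty_like(q)
    for i in range(0, q.shape[0], chunk):
        qc = q[i:i + chunk]
        scores = qc @ k.T / np.sqrt(q.shape[-1])
        out[i:i + chunk] = softmax(scores) @ v
    return out
```

the output matches standard attention exactly (it's the same math,
just computed a chunk of queries at a time); the full scheme also
chunks keys/values with a running softmax, which this sketch omits.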
More information about the cypherpunks mailing list