1 Feb
2022
9:17 p.m.
i'm working on the issue below at the moment. also, huggingface replied to the PR i made when i was losing it, and mentioned two other efficient-attention implementations; they looked approximation-based. they also said their repo is specifically anti-DRY, which is not something anybody expects to hear. there's at least one fork of it, though.

commit 172ae5d668bec9180516e2238f195b56d11a9799 (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem@gmail.com>
Date:   Tue Feb 1 20:47:43 2022 +0000

    removed cruft and added memory-efficient-attention dependency. a remaining
    issue exists where masks and biases still allocate O(n^2) memory.
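the remaining O(n^2) issue the commit message mentions can be sketched like this. this is not the library's actual code, just a minimal numpy illustration under my own assumptions: queries are processed in chunks so the score matrix is only ever (chunk, n), but a dense attention mask is still allocated up front at (n, n), which is the O(n^2) leftover.

```python
import numpy as np

def chunked_attention(q, k, v, mask=None, chunk=128):
    # Process queries in chunks: each score block is (chunk, n),
    # so peak score memory is O(chunk * n) instead of O(n^2).
    n, d = q.shape
    out = np.empty_like(q)
    for i in range(0, n, chunk):
        s = q[i:i + chunk] @ k.T / np.sqrt(d)  # (chunk, n) scores
        if mask is not None:
            # The mask itself was materialized as a full (n, n) array
            # by the caller -- this is where O(n^2) memory remains
            # even though the scores are chunked.
            s = np.where(mask[i:i + chunk], s, -1e9)
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[i:i + chunk] = w @ v
    return out

# hypothetical usage: a causal mask, allocated densely at (n, n)
n, d = 64, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
causal = np.tril(np.ones((n, n), dtype=bool))  # the O(n^2) allocation
out = chunked_attention(q, k, v, mask=causal, chunk=16)
```

fixing it would presumably mean generating or slicing the mask/bias per chunk instead of handing in the dense array.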