
26 Feb 2025
8:32 p.m.
so: there are language models that produce many tokens at once (multi-token prediction). maybe these could run more effectively on embedded systems! likely so. if you can produce n tokens per forward pass, that amortizes the cost of going through all your layers, which would make it cheaper to offload them.
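a rough back-of-the-envelope sketch of that amortization argument, nothing more. the function name and every number in it (model size, link bandwidth, compute time) are made-up placeholders, not measurements: the only point is that the weight-streaming cost gets divided by however many tokens one pass emits.

```python
# Back-of-the-envelope: amortized latency per generated token when weights are
# offloaded (streamed in from flash / host RAM each forward pass) and the model
# emits n tokens per pass instead of 1. All constants below are hypothetical.

def per_token_latency(
    weight_bytes: float,    # parameter bytes streamed in per forward pass
    io_bandwidth: float,    # bytes/sec of the offload link
    compute_time: float,    # seconds of actual compute per forward pass
    tokens_per_pass: int,   # 1 for vanilla decoding, n for multi-token prediction
) -> float:
    """Seconds per token, assuming the full weight transfer is paid once per
    pass and split evenly across the tokens that pass produces."""
    io_time = weight_bytes / io_bandwidth
    return (io_time + compute_time) / tokens_per_pass


if __name__ == "__main__":
    # Hypothetical: ~0.5 GB of (quantized) weights, a 1 GB/s offload link,
    # and 50 ms of compute per forward pass.
    weights = 0.5e9
    bandwidth = 1e9
    compute = 0.05

    for n in (1, 2, 4, 8):
        t = per_token_latency(weights, bandwidth, compute, n)
        print(f"{n} token(s) per pass -> {t * 1000:.0f} ms/token")
```

with those placeholder numbers the per-token cost drops from ~550 ms at one token per pass to ~69 ms at eight, which is the whole appeal: when I/O dominates compute (the usual situation once you're offloading), amortization buys you nearly a factor of n.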