- for llama 405b it's helpful if the film can reach high speeds. Each token means processing the entire reel of weights, so if the reel is 12m long, that means moving the film at 12m/s for 1 token/s. However, many ops are independent and can be processed in parallel, and multiple rigs can be used to divide the speed, so a goal of 1-5m/s seems workable.
I got 12m by calculating 405b weights / 16k logit dimension * 0.05um microfilm resolution. That completely ignores parallel ops (however many attention heads llama 405b has could divide this, for example), but at least 24m might be more defensive given sampling error and sync data. chatgpt originally estimated 63m.
mostly we figured 2m/s was sufficient for now; chatgpt said 5m/s would need industrial-grade parts.
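To make the reel-length and speed arithmetic above easy to rerun, here is a rough sketch. The function names and the `parallel_lanes` / `rigs` parameters are made up for illustration; the model is simply "one row of weights per step along the film, one full pass of the reel per token", which is an assumption about the layout. With the figures exactly as written (0.05um), this gives about 1.3m rather than 12m; 12m corresponds to roughly 0.5um per row, so one of the two numbers may be off by a factor of ten.

```python
def film_length_m(n_weights: float, weights_per_row: float, pitch_um: float) -> float:
    """Reel length if weights are laid out in rows across the film.

    Assumption: each row holds `weights_per_row` weights and takes up
    `pitch_um` micrometres along the film's length.
    """
    rows = n_weights / weights_per_row
    return rows * pitch_um * 1e-6  # micrometres -> metres


def transport_speed_m_per_s(length_m: float, tokens_per_s: float,
                            parallel_lanes: int = 1, rigs: int = 1) -> float:
    """Film speed needed if every token requires one full pass of the reel.

    `parallel_lanes` and `rigs` are hypothetical knobs: independent ops read
    side by side, or separate rigs each carrying a share of the reel.
    """
    return length_m * tokens_per_s / (parallel_lanes * rigs)


if __name__ == "__main__":
    # Figures from the notes: 405b weights, 16k per row.
    print(film_length_m(405e9, 16_000, 0.05))  # pitch as written in the notes
    print(film_length_m(405e9, 16_000, 0.5))   # pitch that yields roughly 12m
    # 12m reel at 1 token/s, with no parallelism and with a 6-way split.
    print(transport_speed_m_per_s(12.0, 1.0))
    print(transport_speed_m_per_s(12.0, 1.0, parallel_lanes=3, rigs=2))
```

On this model a 12m reel at 1 token/s needs a 6-way split (any mix of lanes and rigs) to get down to 2m/s; a 24m reel would need 12-way.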
of course having a prototype is more useful than further refinement, but for inspiration: space/time tradeoffs. Rather than advancing the film, weights could be interleaved so that they are presented overlapping, and the emitter could change midway to output the next logits before the old weights have passed all the way by, using a different portion of the sensor. This exchanges batching density (and simplicity) for not needing to engineer as fast a film transport.