- for Llama 405B it helps if the film can reach high speeds. Each token means processing the entire reel of weights, so if the reel is 12 m long, that means moving the film at 12 m/s for 1 token/s. (I got 12 m by calculating 405B weights / 16k logit dimension × 0.05 µm microfilm resolution, completely ignoring parallel ops; for example, Llama 405B's 128 attention heads could divide this. At least 24 m might be more conservative given sampling error and sync data. ChatGPT originally estimated 63 m.) However, many ops are independent and can be processed in parallel, and multiple rigs can be used to divide the speed, so a goal of 1-5 m/s seems workable. Mostly we figured 2 m/s was sufficient for now; ChatGPT said 5 m/s would need industrial-grade parts.
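A minimal Python sketch of the reel-length and film-speed arithmetic follows. It assumes one full pass over the reel per token, weights laid out in rows of 16,384 values with a fixed row pitch along the film, and an even split of the reel across parallel tracks or rigs. The function names and this row-layout model are hypothetical illustrations, not a worked-out design. Taken literally, 405e9 / 16,384 × 0.05 µm comes to about 1.24 m; a 0.5 µm row pitch is what gives roughly 12 m, so the sketch computes both.

```python
import math

# Hypothetical layout model: n_weights split into rows of row_width values,
# each row occupying row_pitch_m of film length. One token = one full pass.


def reel_length_m(n_weights: float, row_width: int, row_pitch_m: float) -> float:
    """Film length needed to hold all weights under the row layout above."""
    return n_weights / row_width * row_pitch_m


def film_speed_m_per_s(reel_m: float, tokens_per_s: float, n_parallel: int = 1) -> float:
    """Linear film speed when the reel is split evenly across n_parallel tracks/rigs."""
    return reel_m * tokens_per_s / n_parallel


def tracks_needed(reel_m: float, tokens_per_s: float, max_speed_m_per_s: float) -> int:
    """Smallest number of parallel tracks that keeps each at or below max_speed_m_per_s."""
    return math.ceil(reel_m * tokens_per_s / max_speed_m_per_s)


N_WEIGHTS = 405e9   # Llama 405B parameter count, taken as exactly 405 billion
ROW_WIDTH = 16_384  # the "16k" dimension used as the row width

if __name__ == "__main__":
    for pitch_um in (0.05, 0.5):
        reel = reel_length_m(N_WEIGHTS, ROW_WIDTH, pitch_um * 1e-6)
        speed = film_speed_m_per_s(reel, tokens_per_s=1.0)
        print(f"pitch {pitch_um} um -> reel {reel:.2f} m, {speed:.2f} m/s at 1 token/s")

    # Using the 12 m and 24 m reel figures, how many tracks keep each at <= 2 m/s?
    for reel in (12.0, 24.0):
        n = tracks_needed(reel, tokens_per_s=1.0, max_speed_m_per_s=2.0)
        print(f"{reel} m reel: {n} tracks -> {film_speed_m_per_s(reel, 1.0, n):.2f} m/s each")
```

Under these assumptions, a 12 m reel split across six tracks (or rigs) needs each to move at 2 m/s for 1 token/s, and a 24 m reel needs twelve, both inside the 1-5 m/s goal.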