
karl3@writeme.com wrote:
there's some interest in 'downloading only top k items' [this involves looking at the layer algebra and coming up with ways to identify low-contributing values]
[we have solved this before, possibly/optionally including preprocessing to categorize things]
top k is more fun! it seems niftier to make ummmmmmmmm
so we've got some input logits. these are probably getting multiplied by a _huge_ matrix. we could technically do a naive-ish approach of discarding the parts that are multiplied by values near zero. (we could actually consider that each dot product has large values and small values, and skip all values that are smaller than some percentage of the largest values.)
- this works much better if we find a way to clump the mask based on locality :/ since http servers like to send regions of bytes, not sparse masks
- this is really cool if we make something like a bayesian or error-labeled datatype, so instead of 3.4 it's more like 3.4 +- 0.31. this would give much more useful information at the end
but yeah, it seems interesting to just try the mask! involves some simple torch kernel algebra
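
a rough sketch of the mask idea, in plain python so it runs without torch (the same algebra should map onto tensor ops). everything here is an assumption made for illustration, not something settled above: the names input_mask, clump_ranges, byte_ranges and sparse_matvec, the parameters rel_threshold and max_gap, and the layout where the matrix is stored row-major so that the weights multiplied by input i form one contiguous run of row_bytes bytes.

```python
def input_mask(x, rel_threshold=0.05):
    """Keep inputs whose magnitude is at least rel_threshold * max |x|."""
    peak = max((abs(v) for v in x), default=0.0)
    cutoff = rel_threshold * peak
    # exact zeros contribute nothing, so never keep them
    return [abs(v) >= cutoff and v != 0.0 for v in x]


def clump_ranges(mask, max_gap=1):
    """Turn a boolean mask into inclusive (start, end) index runs,
    merging runs separated by at most max_gap skipped entries."""
    ranges = []
    for i, keep in enumerate(mask):
        if not keep:
            continue
        if ranges and i - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = i  # small gap: extend the previous run over it
        else:
            ranges.append([i, i])
    return [(s, e) for s, e in ranges]


def byte_ranges(ranges, row_bytes):
    """Map row-index runs to inclusive byte offsets (HTTP Range style)."""
    return [(s * row_bytes, (e + 1) * row_bytes - 1) for s, e in ranges]


def sparse_matvec(x, weights, ranges):
    """Compute x @ weights using only the rows inside the given ranges.
    Rows pulled in by gap-merging are used too, since they were fetched anyway."""
    out = [0.0] * len(weights[0])
    for s, e in ranges:
        for i in range(s, e + 1):
            for j, w in enumerate(weights[i]):
                out[j] += x[i] * w
    return out


# made-up toy data: 6 inputs, 2 outputs
x = [0.01, 3.0, 2.5, 0.0, 0.02, 1.0]
weights = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [4.0, 4.0], [1.0, 1.0], [-1.0, 3.0]]

mask = input_mask(x, rel_threshold=0.05)
ranges = clump_ranges(mask, max_gap=1)
approx = sparse_matvec(x, weights, ranges)
full = sparse_matvec(x, weights, [(0, len(x) - 1)])
```

raising max_gap trades extra downloaded rows for fewer, larger range requests. note the skipped values here are small in absolute terms but can still be a big fraction of an output that is itself near zero, which is where the error-labeled datatype would earn its keep.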