
https://github.com/karl3wm/httptransformer (or maybe C++ or something). DeepSeek is designed with a ~5% evaluation size (it's a mixture-of-experts model, so only a small fraction of the weights are touched per token) and pretrained speculative decoding.
So the next step I left off at was subsharding large weights. I have a potential bump today, so I wanted to mention that subsharding looks pretty easy. One approach is to use torch's __torch_function__ functionality: torch can treat any object as a tensor if it has a __torch_function__ function (the examples show a class method, but member functions may work too), and it calls this function (if present) for operations instead of the torch implementations. This is very good for the embedding layer; a LazyTensor could store the url and offset and calculate and fill only the sparse columns needed for the tokens passed, saving network and memory significantly.
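A minimal sketch of that idea, assuming the embedding weight is laid out as (vocab, hidden) like torch.nn.Embedding (so the "sparse columns" become rows indexed by token id). The fetch_rows() method, the url/offset/shape arguments, and the example sizes are all hypothetical stand-ins, not httptransformer's actual API; the point is only that torch dispatches F.embedding to the object's __torch_function__ instead of its own implementation:

```python
import torch
import torch.nn.functional as F

class LazyTensor:
    """Stand-in for a large remote embedding weight; fetches only needed rows."""

    def __init__(self, url, offset, shape, dtype=torch.float32):
        self.url, self.offset = url, offset
        self.shape, self.dtype = torch.Size(shape), dtype

    def fetch_rows(self, row_ids):
        # Placeholder: a real version would issue HTTP range requests computed
        # from url + offset + row stride, covering only these rows.
        return torch.zeros(len(row_ids), self.shape[1], dtype=self.dtype)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Intercept the embedding lookup and fill only the rows for the tokens passed.
        if func is F.embedding:
            input_ids, weight = args[0], args[1]
            unique, inverse = torch.unique(input_ids, return_inverse=True)
            rows = weight.fetch_rows(unique.tolist())
            return rows[inverse]  # shape: (*input_ids.shape, hidden)
        raise NotImplementedError(f"{func} is not handled lazily")

# torch sees that the weight argument defines __torch_function__ and dispatches to it.
weight = LazyTensor("https://example.com/model.safetensors", offset=0,
                    shape=(129_280, 7_168))  # vocab x hidden, sizes illustrative
tokens = torch.tensor([[1, 5, 5, 42]])
out = F.embedding(tokens, weight)
print(out.shape)  # torch.Size([1, 4, 7168])
```

Only the unique token ids get fetched, so network traffic and memory scale with the batch's vocabulary rather than the full embedding matrix.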