KARL - Could you not bring up your alleged health issues in our chat?

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Sun Apr 9 05:51:02 PDT 2023


thanks man

of course it is weird to have a message slightly mutated and repeated
back; this isn't exactly clear communication

but yeah it's so sad I didn't see those DMs and disrupted the
community and got banned :/ great experience though! things like this
where you actually figure out what was really going on are really
great for learning to be a better part of things in the future.

sadly the health-issue messages appear to also be gone when i rejoin
with another account, which is a good reason for me to make screen
recording more accessible

honestly, the value of sparsification was mentioned in the chat, and i
engaged with that hypnotically and was spamming the chat in a weird
state of mind because it was so nice to work on anything at all :) but
this did not really respect the other members :(

i got super far with sparsification learning! here are some basics:

- sparsification or pruning is the act of removing weights from
models. many machine learning models can be reduced to as little as 3%
of their original size.

- sparsification or pruning can be structured or unstructured

- structured sparsification is where actual rows, columns, blocks, and
components are removed from the reshaped matrices and the network. this
produces memory and time gains that can be realized immediately with
nearly the same inference code. it is a harder task with smaller gains
because whole coarse structures are removed (there's a small
structured-pruning sketch after this list)

- unstructured sparsification is the zeroing of arbitrary values
inside the model. this can remove many more values but requires sparse
matrix inference or optimized compilation of the network to realize
the gains (see the magnitude-pruning sketch after this list).

- sparsification or pruning can be trained or one-shot, where one-shot
is faster but doesn't realize as many gains

- there's a curated list of resources at
https://github.com/he-y/Awesome-Pruning and a review paper at
https://arxiv.org/abs/2302.02596

- sparsification is often done simultaneously with training or
finetuning, so there are potential options for distilling knowledge
from e.g. llama 65B onto 7B while simultaneously sparsifying it down to
2B or less, making a very powerful model (a rough
distillation-plus-pruning sketch follows this list). as an alternative
to knowledge distillation, there are a handful of new datasets out
there based on GPT-4 that could also be used for tuning instead.
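
here's a tiny sketch of the one-shot unstructured magnitude pruning
mentioned above, using pytorch's torch.nn.utils.prune (the little
two-layer network is just a stand-in, not any real model):

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# a toy stand-in model
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# zero the 90% smallest-magnitude weights in every Linear layer, one-shot
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# the zeros only turn into speed/memory wins once a sparse kernel or a
# sparsity-aware compiler consumes the model
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"global sparsity: {zeros / total:.1%}")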
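
and a sketch of structured pruning in the "remove whole rows/columns"
sense: drop half of one layer's output neurons by l1 norm and trim the
matching input columns of the next layer (the sizes here are made up):

import torch
import torch.nn as nn

fc1 = nn.Linear(512, 512)
fc2 = nn.Linear(512, 10)

# rank fc1's output neurons (rows of the weight matrix) by l1 norm
importance = fc1.weight.abs().sum(dim=1)
keep = torch.argsort(importance, descending=True)[:256].sort().values

# physically smaller layers: copy only the kept rows / matching columns
pruned_fc1 = nn.Linear(512, 256)
pruned_fc1.weight.data = fc1.weight.data[keep].clone()
pruned_fc1.bias.data = fc1.bias.data[keep].clone()

pruned_fc2 = nn.Linear(256, 10)
pruned_fc2.weight.data = fc2.weight.data[:, keep].clone()
pruned_fc2.bias.data = fc2.bias.data.clone()

# the smaller dense layers run with ordinary inference code, so the memory
# and time gains are realized immediately, at the cost of removing coarse
# structures instead of individual weights
x = torch.randn(1, 512)
print(pruned_fc2(torch.relu(pruned_fc1(x))).shape)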
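
and a rough sketch of combining knowledge distillation with pruning:
the student trains on a kl loss against a frozen teacher's logits while
a magnitude mask is re-imposed every few steps. the linear "models",
sizes, and random batches are placeholders, not a recipe for the llama
models mentioned above:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

teacher = nn.Linear(128, 1000).eval()   # stand-in for a big frozen teacher
student = nn.Linear(128, 1000)          # stand-in for the smaller student
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0

for step in range(100):
    x = torch.randn(32, 128)            # placeholder batch
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)

    # soft-label distillation loss (kl divergence with temperature)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

    # crude stand-in for a gradual pruning schedule: re-zero the smallest
    # half of the student's weights every 10 steps
    if step % 10 == 0:
        prune.l1_unstructured(student, name="weight", amount=0.5)
        prune.remove(student, "weight")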

Projects I've looked into a little smidge include:

- sparsegpt, where my brief fork is at
https://github.com/xloem/sparsellama . I think this fork will produce
an unstructured sparse llama model in huggingface transformers format
that needs compilation to realize gains. The approach is one-shot so
the results are not amazing, but they can be generated relatively
quickly and are something to work with (there's a small sparsity-check
sketch after this list).

- only_train_once, where my pull request is at
https://github.com/tianyic/only_train_once/pull/14 . More code is
needed for this to sparsify llama, and it takes something like 100 GB
of RAM without more optimizations added.

- the optimal bert surgeon at
https://github.com/neuralmagic/sparseml/tree/main/research/optimal_BERT_surgeon_oBERT
. this is for bert, but I think it could be a great learning
experience to sparsify a bert model and then compile it, to do the
whole pipeline end to end if one hasn't before (a rough
prune-and-export sketch follows this list)

- the glow ahead-of-time network compiler at
https://github.com/pytorch/glow/blob/master/docs/AOT.md . this says it
can make models into standalone executable bundles. note that research
papers imply there are security vulnerabilities in the resulting
output. i was considering this as the other half of the test bert
pipeline.
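
for the sparsegpt fork, here's a small sketch of how i'd check that a
checkpoint saved in huggingface transformers format really is
unstructured-sparse (the path is a placeholder for wherever the fork
writes its output):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/sparse-llama",             # placeholder checkpoint path
    torch_dtype=torch.float16,
)

total, zeros = 0, 0
for param_name, param in model.named_parameters():
    if param.dim() < 2:
        continue  # skip biases and norm weights; pruning targets the matrices
    total += param.numel()
    zeros += (param == 0).sum().item()
print(f"weight-matrix sparsity: {zeros / total:.1%}")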
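
and a hedged sketch of the test bert pipeline idea: a plain one-shot
magnitude prune of a huggingface bert (standing in for the much better
second-order oBERT method), then an onnx export, which is one of the
formats an ahead-of-time compiler like glow can take as input:

import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
model.config.return_dict = False  # export plain tuples instead of ModelOutput

# one-shot magnitude prune of every Linear layer to 80% sparsity
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")

# export to onnx; a sparsity-aware runtime or compiler would consume this
inputs = tok("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "sparse_bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=14,
)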

there's other stuff i don't immediately remember well. of the four
projects above, the first two are ones i spent some time exploring over
the past few days, and the last two are what i was most recently
looking at.

