thanks man. of course it is weird to have a message slightly mutated and repeated back; that isn't exactly clear communication. but yeah, it's sad I didn't see those DMs, disrupted the community, and got banned :/ great experience though! things like this, where you actually figure out what was really going on, are really great for learning to be a better part of things in the future. sadly the health issue messages appear to also be gone when I rejoin with another account, which is a good reason for me to make screen recording more accessible, honestly.

value around sparsification was mentioned in the chat, and I had engaged that hypnotically and was spamming the chat in a weird state of mind because it was so nice to work on anything at all :) but this did not really respect the other members :(

I got super far with sparsification learning! here are some basics:

- sparsification or pruning is the act of removing weights from models. many machine learning models can be reduced to as little as 3% of their original size.
- sparsification or pruning can be structured or unstructured.
- structured sparsification removes actual rows, columns, blocks, and components from the reshaped matrices and network. this produces memory and time gains that can be realized immediately with nearly the same inference code. it is a harder task with smaller gains because of the coarse structures that have to be removed. (a tiny sketch of this is at the very end of this comment.)
- unstructured sparsification zeroes arbitrary values inside the model. this can remove many more values, but it requires sparse matrix inference or optimized compilation of the network to realize the gains. (a magnitude-pruning sketch follows the project list below.)
- sparsification or pruning can be trained or one-shot, where one-shot is faster but doesn't realize as many gains.
- there's a curated list of resources at https://github.com/he-y/Awesome-Pruning and a review paper at https://arxiv.org/abs/2302.02596
- sparsification is often done simultaneously with training or finetuning, so there are potential options for distilling knowledge from e.g. llama 65B onto 7B while simultaneously sparsifying it down to 2B or less, to make a very powerful model. as an alternative to knowledge distillation, there are a handful of new GPT-4-based datasets out there that could be used for tuning instead.

projects I've looked into a little smidge include:

- sparsegpt, where my brief fork is at https://github.com/xloem/sparsellama . I think this fork will produce an unstructured sparse llama model, in huggingface transformers format, that needs compilation to realize gains. the approach is one-shot so the results are not amazing, but they can be generated relatively quickly and are something to work with.
- only_train_once, where my pull request is at https://github.com/tianyic/only_train_once/pull/14 . more code is needed for this to sparsify llama, and it takes something like 100GB of ram unless more optimizations are added.
- the optimal bert surgeon at https://github.com/neuralmagic/sparseml/tree/main/research/optimal_BERT_surg... . this is for bert, but I think it could be a great learning experience to sparsify a bert model and then compile it, to kind of do the whole pipeline if one hasn't before.
- the glow ahead-of-time network compiler at https://github.com/pytorch/glow/blob/master/docs/AOT.md . this says it can make models into standalone executable bundles. note that research papers imply there are security vulnerabilities in the resulting output. I was considering this as the other half of the test bert pipeline.
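to make the unstructured, one-shot idea concrete, here's a minimal pytorch sketch that just zeroes the smallest-magnitude weights in every linear layer. this is not sparsegpt's actual algorithm (sparsegpt is much smarter about which weights to drop); the 50% ratio and the toy model are placeholders I made up:

    # minimal one-shot unstructured magnitude pruning sketch (pytorch).
    # not sparsegpt -- it just zeroes the smallest-magnitude weights, which also
    # shows why sparse kernels or compilation are needed before you see speedups.
    import torch
    import torch.nn as nn

    def magnitude_prune_(model: nn.Module, sparsity: float = 0.5) -> None:
        """Zero the smallest `sparsity` fraction of weights in every Linear layer, in place."""
        for module in model.modules():
            if isinstance(module, nn.Linear):
                weight = module.weight.data
                k = int(weight.numel() * sparsity)
                if k == 0:
                    continue
                threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
                mask = weight.abs() > threshold
                weight.mul_(mask)  # zeroed entries still occupy dense memory

    # toy usage: a tiny MLP pruned to roughly 50% weight sparsity
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    magnitude_prune_(model, 0.5)
    weights = [p for p in model.parameters() if p.dim() > 1]
    total = sum(p.numel() for p in weights)
    zeros = sum((p == 0).sum().item() for p in weights)
    print(f"weight sparsity: {zeros / total:.1%}")

the zeros still sit in dense storage here, which is exactly the "needs sparse inference or compilation to realize the gains" caveat above.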
there's other stuff I don't immediately remember well. of the four projects above, the first two are the ones I spent some time exploring over the past few days, and the last two are what I was most recently looking at.
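for contrast, here's what the structured case from the basics above can look like in its simplest form: drop whole output neurons from one linear layer and the matching input columns of the next, so you end up with a smaller dense model that ordinary inference code runs as-is. the keep-the-largest-row-norms rule and the toy sizes are placeholders of mine, not how any of the projects above actually score structures:

    # minimal structured pruning sketch (pytorch): remove whole rows/columns so the
    # smaller dense model needs no sparse kernels at all.
    import torch
    import torch.nn as nn

    def prune_hidden_neurons(fc1: nn.Linear, fc2: nn.Linear, keep: int):
        """Keep the `keep` output neurons of fc1 with the largest weight norms,
        and drop the corresponding input columns of fc2. Returns new layers."""
        scores = fc1.weight.norm(dim=1)               # one score per output neuron (row)
        idx = scores.topk(keep).indices.sort().values
        new_fc1 = nn.Linear(fc1.in_features, keep, bias=fc1.bias is not None)
        new_fc2 = nn.Linear(keep, fc2.out_features, bias=fc2.bias is not None)
        with torch.no_grad():
            new_fc1.weight.copy_(fc1.weight[idx])     # keep selected rows
            if fc1.bias is not None:
                new_fc1.bias.copy_(fc1.bias[idx])
            new_fc2.weight.copy_(fc2.weight[:, idx])  # keep matching columns
            if fc2.bias is not None:
                new_fc2.bias.copy_(fc2.bias)
        return new_fc1, new_fc2

    # toy usage: shrink a 16 -> 32 -> 4 MLP down to 16 -> 16 -> 4
    fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 4)
    fc1, fc2 = prune_hidden_neurons(fc1, fc2, keep=16)
    print(fc1.weight.shape, fc2.weight.shape)  # torch.Size([16, 16]) torch.Size([4, 16])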