On Sun, Aug 4, 2024 at 11:51 PM Undescribed Horrific Abuse, One Victim & Survivor of Many <gmkarl@gmail.com> wrote:
so recently zuckerbergcorp released another llama, llama 3.1. they're really trying to make it big, although i think they talked it up bigger than it is so far, but they did release a private-research-scale model, over 400G parameters, and said it took thousands of gpus to train. blergh. anyway, i still have a 2GB gpu, so i can run about a 500M parameter model at standard floating point precision (or around 4G at 4-bit).
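the back-of-envelope math there is just bytes on the card divided by bytes per parameter; a quick check, weights only, ignoring activations and kv cache and overhead:

gpu_bytes = 2 * 1024**3                       # a 2GB card
for name, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    params = gpu_bytes / (bits / 8)           # bytes per parameter = bits / 8
    print(f"{name}: ~{params / 1e9:.2f}G parameters")
# prints roughly 0.54G at fp32, 1.07G at fp16, 4.29G at 4-bit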
it's always fun to try to downscale the models, which means navigating puzzle inhibitions and such
we were thinking: how to run this llama? there's also an 8G parameter model, so i could fit part of it, but not all of it, in the gpu.
a fun idea seemed to be top-k weight selection. there was some criticism around it, but it might work!
the idea would be to keep all parameters on disk, but to filter them somehow such that only the ones relevant to the data actually get loaded in.
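as a minimal sketch of what i mean by filtering, assuming the weight matrix is already sitting on disk as a raw float32 memmap, and just keeping the k input dimensions with the biggest activations (the names and layout here are made up, not llama's actual format):

import numpy as np

def topk_matvec(weight_path, shape, x, k):
    # approximate x @ W, reading only the rows of W for the k largest |x| entries
    W = np.memmap(weight_path, dtype=np.float32, mode="r", shape=shape)
    idx = np.argsort(np.abs(x))[-k:]    # the k input dims that matter most
    return x[idx] @ W[idx, :]           # fancy indexing only pulls those rows off disk

it's only an approximation of the full matvec, since small activations still add up; presumably that accuracy question is where the criticism of top-k comes from.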
thinking further, and taking the algorithm idea gently, some possible parts that might be worth considering:
- the first layer's input is fully known, because it comes straight from the embeddings, which are easy to partially load [basically a dict of vectors; there's a tiny sketch of that partial load at the bottom of this mail]
- for top-k, you care about which weights will impact the result: you can see this inside the attention kernel (really each module will need its own treatment of some sort; there are like 4 or 5 different kinds of weights)
- the weights are multiplied by the inputs in a -- [aughhhhhhhhl
m challeng-- [AARRRGG
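here's the promised tiny sketch of the "embeddings are basically a dict of vectors" point, assuming the embedding table has been dumped to a raw float32 file; the path, sizes, and token ids are placeholders, just for illustration:

import numpy as np

vocab_size, hidden = 128256, 4096     # llama-3-ish sizes, not the real file layout
emb = np.memmap("embed_tokens.bin", dtype=np.float32, mode="r",
                shape=(vocab_size, hidden))

token_ids = [101, 202, 303]           # whatever the prompt tokenizes to
first_layer_input = emb[token_ids]    # only these rows get pulled off disk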