y'know, for some time now, after i started being a little able to write code again, it's been _so_ _so_ much easier to start code than to improve code, or integrate code, etc etc. it partly relates to exposure to things i've associated with issues, etc,
but one of the similarities is that that's also how gpt made code: openai would have it generate code from start to finish, and it talks this way too.

maybe it would be nice to reorganize something like that, so that it actually produces things in a sensible way, rather than starting from the start and calling it good. sadly this resembles a lot of false starts, but when i consider it as valuable for how we do things in general, it seems a little more accessible.

1639

1701

well, llama.cpp does finetuning etc now. they have an example script that writes checkpoints and lora adapters, trained with adam.
also, tinyllama has released versions; i downloaded a 2-bit quantized one (400-500MB).
i finally got it to make finetuning progress i could see once i set the context length to a single token. it said it would complete in 19 minutes. sadly that's a little more time than i have right now.
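just so i remember what that example is roughly doing: a lora is a frozen base weight plus a trainable low-rank pair, and adam only steps the pair. quick pytorch sketch of that general idea, not llama.cpp's actual code; the sizes and the toy data here are made up.

```python
# minimal lora-style finetune loop with adam; sizes and data are toys,
# this is the general technique, not what llama.cpp's example does internally
import torch

d_model, rank = 64, 4
W = torch.randn(d_model, d_model)                              # frozen base weight
A = (torch.randn(d_model, rank) * 0.01).requires_grad_(True)   # low-rank pair...
B = torch.zeros(rank, d_model, requires_grad=True)             # ...starts as a no-op

opt = torch.optim.Adam([A, B], lr=1e-3)

x = torch.randn(8, d_model)          # fake batch
target = torch.randn(8, d_model)     # fake regression target

for step in range(200):
    y = x @ (W + A @ B)              # adapted forward pass, W never changes
    loss = torch.nn.functional.mse_loss(y, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# a lora checkpoint is basically just A and B (plus adam's moment buffers)
torch.save({"A": A, "B": B}, "toy-lora.pt")
```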
i did notice these models still use hugely wide matrices for the input and output embeddings. these are one of the slowest things to train because they don't share information across different tokens. i saw on the [waning feeds?] that there's a new approach where the embeddings aren't trained with the model, maybe a research paper by meta/facebook, not sure.
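rough numbers on how big those matrices are, assuming tinyllama's usual config (vocab 32000, hidden 2048, ~1.1B params total); i haven't checked the exact gguf i grabbed, so treat these as assumptions:

```python
# back-of-envelope: how much of the model is just embedding tables,
# assuming vocab_size=32000, hidden=2048, ~1.1B total (tinyllama's usual
# config; i haven't verified the gguf i downloaded, so these are guesses)
vocab_size, hidden, total_params = 32_000, 2_048, 1.1e9

tok_embed = vocab_size * hidden   # input embedding table
lm_head = vocab_size * hidden     # output projection, if it isn't tied

print(f"input embeddings: {tok_embed / 1e6:.1f}M params")               # ~65.5M
print(f"output head:      {lm_head / 1e6:.1f}M params")                 # ~65.5M
print(f"share of model:   {(tok_embed + lm_head) / total_params:.0%}")  # ~12%
# and each row only gets a gradient when its token shows up in a batch,
# which is part of why these train so unevenly and slowly
```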
personally i wouldn't use such a big vocab, it's a waste of training energy. i'd maybe do something random like keep a database of near-words and turn the embeddings into two tiny layers.
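very rough sketch of what i mean; the "near-words" lookup here is faked with hashed character trigrams and all the sizes are invented, it's just the shape of the idea, not a worked-out design.

```python
# rough sketch: no big vocab-by-hidden table, instead cheap near-word
# features fed through two tiny layers. the feature hashing, sizes, and
# char trigrams as the "near-word" signal are all placeholders i made up.
import hashlib
import torch

n_buckets, feat_dim, hidden = 4_096, 64, 2_048

def word_features(word: str) -> torch.Tensor:
    """hash character trigrams into a small multi-hot feature vector."""
    feats = torch.zeros(n_buckets)
    padded = f"#{word}#"
    for i in range(len(padded) - 2):
        h = int(hashlib.md5(padded[i:i + 3].encode()).hexdigest(), 16)
        feats[h % n_buckets] = 1.0
    return feats

# the "embedding" is now two small layers instead of a 32000 x 2048 table
to_feat = torch.nn.Linear(n_buckets, feat_dim)
to_hidden = torch.nn.Linear(feat_dim, hidden)

def embed(word: str) -> torch.Tensor:
    return to_hidden(torch.relu(to_feat(word_features(word))))

e1, e2 = embed("finetune"), embed("finetuning")   # near-words share trigrams
print(torch.cosine_similarity(e1, e2, dim=0))     # so their embeddings correlate
```

with those sizes the two layers come to well under half a million parameters instead of ~65M, though whether it actually trains well is a separate question.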