
karl3@writeme.com wrote:
actually it didn't do that. i think i prompted it with that token; it made totally wrong tokens.
it totally worked! it completed "Once upon a" -> " time" with only a top_k of 6. the next text was "interconnected.goBack_SECURE", more what i expected :/ well, i changed models to athena, which i've logged, so i could compare whether the math was correct, and it turns out i was making the tensors contiguous incorrectly and my numbers were all wrong. i added an extra loop to separate out all the strides, but it then becomes so slow to simply iterate the indices that i haven't seen a single loop complete (although i was doing all 8192 indices in athena to test).
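for reference, a minimal sketch of what "making a strided tensor contiguous" has to compute. this is not the code from the project; `strided_offsets` is a hypothetical helper, and it assumes strides are counted in elements (as PyTorch does) rather than bytes. looping over every scalar like this is also exactly the slow part described above.

```python
from itertools import product

def strided_offsets(shape, strides, storage_offset=0):
    """Yield the flat storage index of every element, in logical row-major order.

    Assumption: strides are in elements, not bytes.
    """
    for index in product(*(range(n) for n in shape)):
        yield storage_offset + sum(i * s for i, s in zip(index, strides))

# A logical 2x3 matrix whose storage is laid out column by column:
# element (r, c) lives at storage index r*1 + c*2.
storage = [0, 10, 1, 11, 2, 12]
shape, strides = (2, 3), (1, 2)

# Gathering in logical order gives the contiguous (row-major) copy.
contiguous = [storage[o] for o in strided_offsets(shape, strides)]
print(contiguous)  # [0, 1, 2, 10, 11, 12]
```

reading `storage` straight through as if it were already row-major would give [0, 10, 1, 11, 2, 12] instead, which is the kind of silently wrong data a stride mix-up produces.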
i was thinking that i would step away, because a better algorithm informed by pagesize would likely need less index iteration, as would simply using the tensor strides as opposed to calling a wrapped indexing function for every scalar in the matrix. additionally, merging the fetch regions would reduce the extensive incomplete loop to a single fetch in this case.
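a rough sketch of the region-merging idea, under stated assumptions: `merge_regions` is a hypothetical name, the 4096-byte page size and 2-byte item size are placeholders, and element offsets are assumed to have been computed from the strides already. each element's byte range is widened to page boundaries, then touching or overlapping ranges are merged so that neighbouring elements cost one fetch instead of many.

```python
def merge_regions(offsets, item_size, page_size=4096):
    """Return sorted, merged, page-aligned (start, end) byte ranges covering all offsets.

    Assumptions: offsets are element indices into one flat buffer;
    end is exclusive; page_size is a placeholder value.
    """
    ranges = []
    for o in offsets:
        first_byte = o * item_size
        last_byte = first_byte + item_size            # exclusive end of the element
        start = (first_byte // page_size) * page_size  # round down to a page boundary
        end = -(-last_byte // page_size) * page_size   # round up to a page boundary
        ranges.append((start, end))

    ranges.sort()
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Six scattered 2-byte elements collapse into two fetches.
print(merge_regions([0, 1, 2, 3000, 3001, 100000], item_size=2))
# [(0, 8192), (196608, 200704)]
```

when all the requested indices sit in one dense block, the result is a single range, which matches the "single fetch" case mentioned above.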
another idea is to try a yet smaller model; i've got llama 1B logged for tiny tests.
but i spent all day and got so close. it was really cool to forward llama 405b streaming over the internet in just a few seconds. but it wasn't selecting the correct data due to misinterpretation of sparse strides.
an upshot is that it's quite possible those faulty results would look much better with my indexing error fixed. this might be a situation where preprocessing the data to change its storage layout (which could also include extrema of the rows and columns of each matrix, which would make top_k more effective) would make some sense.
man it's so close
i'm slow coding because [my experience left a part of me that rewires me to fight me], and i can have harsh eye jerking issues and amnesia associated with things like changing lines in an editor that i have to continuously navigate :s coding was my biggest skill though (long ago). i wish i had somebody to talk to about it who could kind of confirm that they understood and help learn and navigate triggers.