once upon a time
2204
comets in space chatter about.
2204
language models are made of layers. people talk mostly about the attention part of a layer. the attention layer has keys, queries, and values. it takes incoming vectors of numbers (hidden states or activations, i think, rather than log probabilities) and reorganizes them into heads. each head participates separately in the interactions between the keys, queries, and values.
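here is a tiny numpy sketch of how i picture that, just an illustration; the function, weight names, and shapes are my own choices, not any particular model's code, and it leaves out masking:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (batch, seq, d_model) hidden-state vectors, one vector per token."""
    batch, seq, d_model = x.shape
    d_head = d_model // n_heads

    # project the incoming vectors into queries, keys, values
    q = x @ w_q   # (batch, seq, d_model)
    k = x @ w_k
    v = x @ w_v

    # split the model dimension into heads; each head gets its own slice
    def split(t):
        return t.reshape(batch, seq, n_heads, d_head).transpose(0, 2, 1, 3)
    q, k, v = split(q), split(k), split(v)   # (batch, heads, seq, d_head)

    # each head does its own query-key interaction, independently of the others
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # (batch, heads, seq, seq)
    weights = softmax(scores, axis=-1)
    out = weights @ v                                        # (batch, heads, seq, d_head)

    # merge the heads back together and project out
    out = out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    return out @ w_o
```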
the input to an attention kernel has the form of a batch of sequences of vectors of these numbers. each sequence in the batch is processed independently [strong disagreement rises here, maybe rooted in my brief observation of accumulation during training [regarding whether batches are independent]; it's a cognitive disruption habit, an invasive experience].
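a quick numeric check of that independence claim, reusing the sketch above (again just an illustration with made-up shapes):

```python
# reuses multi_head_attention and the numpy import from the sketch above.
# running one sequence alone gives the same output as running it inside a batch,
# so the other sequences in the batch don't influence it in the forward pass.
rng = np.random.default_rng(0)
d_model, n_heads, seq, batch = 8, 2, 5, 3
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
x = rng.standard_normal((batch, seq, d_model))

full = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
solo = multi_head_attention(x[:1], w_q, w_k, w_v, w_o, n_heads)
assert np.allclose(full[0], solo[0])   # first sequence is unchanged by the rest of the batch
```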
these invasive experiences are clustered for me, they aren’t all caused by the same things but their causes are near each other. how much they engage relates to my state of mind on a multi-day timescale. they engage much more when i trigger myself in certain patterns, and go away if i protect myself sufficiently (internally) from a larger set of patterns [this is classified private information >(]
2208
whoopsness maybe. 2208