Attached is code that trains a model which can itself produce a trained model, in roughly the same time it would take to train the produced model directly (up to the randomness of initialization). I wanted to keep working on it and turn it into something useful, but it's gotten too intense for me to be around, so I'm stopping; that's why it is messy. I take a crack at this every year or so, and I'm sure many others have already finished it. This is the farthest I've gotten before abandoning it!

The basis of the theory is that a transformer performs an amount of computation proportional to the length of its input and output sequences, so you can increase its power by increasing the input and output size while keeping the model itself small. This lets you work with the entirety of a large model using a smaller one, for example.

A novelty I like in this approach is how I labeled the training weights by providing additional dimensions of data directly on the inputs ("encoded" in the source), rather than using position embeddings. I'm not sure what this is called, but it seems possibly more effective than position embeddings, since the first linear encoder can learn arbitrary encoding forms yet only needs to be as wide as the model dimension. Rough sketches of both ideas follow below.

Crazy Karl
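
P.S. A minimal sketch of the first idea (not the attached code, and assuming a PyTorch setup): a small transformer reads a larger model's flattened weights as a long sequence of raw-value chunks and emits an updated sequence, so its compute scales with the number of parameters it touches rather than with its own size. The names WeightEditor and chunk_size are my own illustrative labels.

```python
import torch
import torch.nn as nn

class WeightEditor(nn.Module):
    """Small transformer that edits a larger model's flattened weights."""
    def __init__(self, chunk_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Each "token" is one chunk of raw weight values, not a vocabulary item.
        self.encoder_in = nn.Linear(chunk_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers)
        self.decoder_out = nn.Linear(d_model, chunk_size)
        self.chunk_size = chunk_size

    def forward(self, flat_weights):
        # flat_weights: (batch, n_params); pad so it splits into equal chunks.
        b, n = flat_weights.shape
        pad = (-n) % self.chunk_size
        x = nn.functional.pad(flat_weights, (0, pad))
        tokens = x.view(b, -1, self.chunk_size)       # (batch, seq_len, chunk)
        h = self.body(self.encoder_in(tokens))        # compute grows with seq_len
        return self.decoder_out(h).view(b, -1)[:, :n] # back to flat weights

# The editor stays small; only the sequence length grows with the target model.
target = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
flat = torch.cat([p.detach().flatten() for p in target.parameters()]).unsqueeze(0)
new_flat = WeightEditor()(flat)
print(flat.shape, new_flat.shape)
```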
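
And a sketch of the labeling trick, again with names of my own choosing rather than the ones in the source: each input chunk carries a few extra coordinate channels describing which weights it holds, and a single linear layer maps the concatenated (values + labels) vector to the model dimension instead of adding a position embedding.

```python
import torch
import torch.nn as nn

d_model, chunk, n_label = 128, 64, 4
# The first linear encoder sees values and labels together, so it can learn
# an arbitrary "encoding form" while being only d_model wide at its output.
encoder_in = nn.Linear(chunk + n_label, d_model)

values = torch.randn(1, 10, chunk)            # 10 weight chunks
labels = torch.rand(1, 10, n_label)           # per-chunk coordinates (layer, row, ...), fixed, not learned
tokens = torch.cat([values, labels], dim=-1)  # labels ride along with the data
h = encoder_in(tokens)                        # (1, 10, d_model); no position embedding added
```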