2022-10-06T08:29+00:00 Last night I started looking into the “alibi”
algorithm within https://github.com/mosaicml/composer , which is a
plugin-based way to mutate the component of transformers that the
milets paper discusses. I’ve forked it to
https://github.com/xloem/composer , presently into a placeholder
branch named “wip”, and have started laying out a little space to put
holographic representations in. I’m referring to the source of
for simple and normative ways to implement the core function. It's
quite easy to see the effort as one of simple boilerplate, with these
resources open next to each other.
This morning part of my mind is badly misbehaving, and that’s scary
for me.  I think some of me is terrified about use of machine learning
for oppression. But I think part of me still sees this as very useful.
I have two appointments today that are physically difficult for me to
prepare for.
09:28 I’ve put hrr utility functions into the gpt2 attention
replacement file and changed its attn matmul to be a call to hrr
binding (about 1/3rd of the mutations needed) and pushed to git. I’m
unsure whether to use 2d or 1d calls, but 2d seems a little more
logical, although the paper sums along the sequence dimension so
likely only 1d is needed. I’m also unsure yet how I will test my
implementation, which is part of the reason for factoring the hrr
calls out.
Next step is to learn how gpt tracks the attention mask so as to make
it work for hrr.
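
For reference, a minimal sketch of what hrr utility functions of this
kind might look like. This is not the code in the fork: it uses NumPy
rather than PyTorch so it runs standalone, and the names bind,
approx_inverse, unbind and cosine are hypothetical. It assumes the
usual hrr definitions, with binding as circular convolution along the
feature axis and unbinding as binding with the involution of the key.

```python
import numpy as np


def bind(a, b):
    """Bind two vectors by circular convolution along the last axis (via FFT)."""
    n = a.shape[-1]
    fa = np.fft.rfft(a, axis=-1)
    fb = np.fft.rfft(b, axis=-1)
    return np.fft.irfft(fa * fb, n=n, axis=-1)


def approx_inverse(a):
    """Involution a'[i] = a[-i mod n], the usual approximate inverse for hrr."""
    return np.roll(a[..., ::-1], 1, axis=-1)


def unbind(trace, key):
    """Recover a noisy copy of whatever was bound to `key` inside `trace`."""
    return bind(trace, approx_inverse(key))


def cosine(x, y):
    """Cosine similarity between two 1d vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


# Assumption: elements drawn from N(0, 1/dim), the common hrr initialisation.
rng = np.random.default_rng(0)
dim = 1024
k, v, other = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(3, dim))

trace = bind(k, v)        # stands in for the attention matmul
v_hat = unbind(trace, k)  # noisy reconstruction of v

print("similarity to bound value:   ", cosine(v_hat, v))
print("similarity to unrelated value:", cosine(v_hat, other))
```

Keeping the calls factored out like this is also one way to test them:
the reconstruction should be clearly more similar to the bound value
than to an unrelated vector, and bind can be checked against a
hand-computed circular convolution on a tiny input.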
0942 I’ve realised I should use 1d binding rather than 2d so that
token masks apply correctly. The distraction of this brief
incorrectness is confusing me enough somehow that I’m taking a break
to stabilise some.
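
A small sketch of why 1d binding keeps token masks meaningful, under
some assumptions: NumPy instead of PyTorch, a plain 0/1 padding mask,
and the mask applied by zeroing bound pairs before summing along the
sequence dimension. The names bind_1d, bind_2d and masked_trace are
hypothetical. With 1d binding each token is bound on its own, so
masked positions cannot leak into the sum. A 2d transform over
(sequence, feature) mixes tokens before the mask is applied.

```python
import numpy as np


def bind_1d(a, b):
    """Circular convolution along the feature axis only; tokens stay separate."""
    n = a.shape[-1]
    fa = np.fft.rfft(a, axis=-1)
    fb = np.fft.rfft(b, axis=-1)
    return np.fft.irfft(fa * fb, n=n, axis=-1)


def bind_2d(a, b):
    """Circular convolution over (sequence, feature); this mixes tokens together."""
    shape = a.shape[-2:]
    fa = np.fft.rfft2(a, axes=(-2, -1))
    fb = np.fft.rfft2(b, axes=(-2, -1))
    return np.fft.irfft2(fa * fb, s=shape, axes=(-2, -1))


def masked_trace(bind_fn, keys, values, mask):
    """Bind keys to values, zero masked tokens, then sum along the sequence."""
    bound = bind_fn(keys, values)              # (seq_len, dim)
    return (bound * mask[:, None]).sum(axis=0)  # (dim,)


rng = np.random.default_rng(0)
seq_len, dim = 6, 64
keys = rng.normal(size=(seq_len, dim))
values = rng.normal(size=(seq_len, dim))
mask = np.array([1, 1, 1, 1, 0, 0], dtype=float)  # last two tokens are padding

# Change only the padded tokens; a correct mask should hide the difference.
keys_alt, values_alt = keys.copy(), values.copy()
keys_alt[4:] = rng.normal(size=(2, dim))
values_alt[4:] = rng.normal(size=(2, dim))

same_1d = np.allclose(masked_trace(bind_1d, keys, values, mask),
                      masked_trace(bind_1d, keys_alt, values_alt, mask))
same_2d = np.allclose(masked_trace(bind_2d, keys, values, mask),
                      masked_trace(bind_2d, keys_alt, values_alt, mask))

print("1d binding ignores padded tokens:", same_1d)
print("2d binding ignores padded tokens:", same_2d)
```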
1204 I’ve started on the second half, where the attention output is
produced, and pushed to git. I’m handling a lot of spasmodic thinking,
and it is very hard to stabilise small concepts around the code in my
working memory.  I’m thinking I’ll step away to prepare for my day
some.

