the written text adventure toolkit

welcome to wtat

where are you?
>

> huh? what do you mean?

what is the starting room?
>

> oh um let's call it room 1

you are in room 1.
what is here? are there any exits
2225

one could say one of the dawnings of this era would be that aidungeon was closed source and had absolutely no room graph

i wonder what more there is now
2228
i finally came up with a way to do transformers peer to peer recently but i don’t remember it. maybe i can come up with another.

uhhhh ummmmm
ideas:
- squishing to one layer would increase parallelism
- [oops brain issue

wasn’t expecting brain issue here! surprised. um.
usually i try to continue and it watches me, placing triggers to make it hard to both continue and repeat. as i keep trying more avenues this gets more thorough.

right now i’m on an old ipad. running a pretrained model would be slow, large, and power hungry

one idea could be squishing a model to one layer, having many peers perform operations in parallel, and combining them. i think it is well-known that that doesn’t work here.

transformer layers have a few combination points where all data is summed or such, i’ve noticed from looking at them. i suspect some of these are needed less than others for inference, don’t really know.

maybe i can look at one and wonder about it more [some complaint maybe relates edited turn of phrase]
2234

i’m wondering if i could make enough progress on guessing how transformers work to consider symbolically swapping depth for width.
2234

i’m looking at hf llama source (2237). the device is functioning poorly and it is difficult to do (2238)

looks like a llama layer is:
x += attention(rmsnorm1(x))
x += mlp(rmsnorm2(x))

so maybe one could think of x as a sum of 3 values: its initial value, its self attention calculation, and its mlp calculation
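
a minimal sketch of just that residual structure, with the sublayers passed in as plain functions rather than the actual hf modules:

def layer(x, attention_fn, mlp_fn, rmsnorm1, rmsnorm2):
    attn_out = attention_fn(rmsnorm1(x))   # first residual branch
    x = x + attn_out
    mlp_out = mlp_fn(rmsnorm2(x))          # second residual branch
    x = x + mlp_out
    # x now equals the original x + attn_out + mlp_out, the 3-value sum
    return x
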
2243

rmsnorm: weight * (x / sqrt(mean(x^2) + eps))
# i think it scales the data to have an rms of 1, and then applies a learned per-channel scale (weight, a vector rather than a full matrix) specific to the instance of the call
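
a tiny numpy sketch of that reading (eps is the small stability constant the real code adds; weight is assumed here to be a vector of per-channel scales):

import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # divide each row by its root-mean-square, then scale each channel
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return weight * (x / rms)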

mlp: down_mat(act_fn(gate_mat(x)) * up_mat(x))
# gate_mat and up_mat perform dimension stretching via linear transforms, whereas down_mat undoes the stretching via another. that is, they are rectangular matrices where down_mat has the dimensions swapped
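
the same thing as a numpy sketch, using the gate_mat / up_mat / down_mat names from above (the hf attribute names differ) and leaving act_fn as a parameter:

import numpy as np

def mlp(x, gate_mat, up_mat, down_mat, act_fn):
    # x: (seq, dim); gate_mat, up_mat: (dim, hidden); down_mat: (hidden, dim)
    gate = act_fn(x @ gate_mat)      # stretched, with the nonlinearity applied
    up = x @ up_mat                  # stretched, no nonlinearity
    return (gate * up) @ down_mat    # elementwise product, then un-stretch back to dim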

so, act_fn here applies a nonlinearity, ie something threshold based, to a set of properties or metrics that are all linear combinations of x after attention, and then the result gates up_mat(x) elementwise before down_mat linearly recombines everything into a value to sum into x.

i’m wondering if one might consider this a vector of conditionals of simple arithmetic functions of x, which then add another simple arithmetic function into x for the instances that evaluate true. i’m thinking of the act_fn relu, which i think looks like y > 0 ? y : 0, not sure. it might use a different act_fn.
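
a sketch of that conditional reading with relu written out as the y > 0 test (hf llama configures silu by default i believe, which is a softened version of the same gate, so this is an approximation):

import numpy as np

def mlp_as_conditionals(x, gate_mat, up_mat, down_mat):
    scores = x @ gate_mat                        # each hidden channel is a linear metric of x
    fired = scores > 0                           # the per-channel conditional
    gated = np.where(fired, scores, 0.0)         # relu: y > 0 ? y : 0
    update = (gated * (x @ up_mat)) @ down_mat   # only firing channels contribute
    return update                                # this is the value that gets summed into x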

225
:s some of us are holding things next to knowledge that the current pursuit could be solved in public research already with reasonable likelihood. there’s interest in heading off for the night, here we go
2259

—-
0643
that was so much fun the transformer poking!
0643
we’re scared of “multinational cartels”