[ot][spam][crazy] tiny decision transformer lib

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Thu Feb 16 13:24:17 PST 2023

thinking a simple environment interface could be an encapsulation into
a class or member function that doesn't need to be subclassed.

a single function could take e.g. a vector of observations and a
vector? of possible actions and a vector of rewards and return a new
action to perform
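
a minimal sketch of that idea, assuming the rewards vector means the
rewards received so far. every name here (Policy, Environment,
first_action_policy, move) is made up for illustration and is not
from any existing library. the environment wraps plain callables, so
nothing has to be subclassed:

```python
from typing import Callable, List, Sequence

# hypothetical signature: (observation, possible actions, previous rewards) -> action
Policy = Callable[[Sequence[float], Sequence[int], Sequence[float]], int]


def first_action_policy(observation, actions, rewards):
    """placeholder policy: ignores its inputs and picks the first available action."""
    return actions[0]


class Environment:
    """wraps plain callables, so a user passes functions instead of subclassing."""

    def __init__(self, observe, available_actions, apply_action):
        self.observe = observe                      # () -> observation vector
        self.available_actions = available_actions  # () -> vector of possible actions
        self.apply_action = apply_action            # (action) -> reward
        self.rewards: List[float] = []              # rewards received so far

    def step(self, policy: Policy) -> int:
        """ask the policy for an action, apply it, and remember the reward."""
        action = policy(self.observe(), self.available_actions(), self.rewards)
        self.rewards.append(self.apply_action(action))
        return action


# toy environment (an assumption for the example): a counter that the
# actions +1 / -1 move, where the reward is the new counter value
state = {"x": 0}


def move(action):
    state["x"] += action
    return float(state["x"])


env = Environment(lambda: [float(state["x"])], lambda: [1, -1], move)
env.step(first_action_policy)
```

a trained model would slot in wherever first_action_policy is passed,
as long as it accepts the same three arguments.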

thinking a little of unifying agent behavior and history collection

history <- big mess of sequential observations, taken actions, and
rewards, analogous to dataset or dataloader
behavior <- one or more parallel observations, available actions, and
previous rewards, selects action to take

if i wanted to unify them i could consider letting behavior process a
sequence it had not participated in, or i could be off on a wonky
silly concept
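
one way the unification could look, as a rough sketch: history is a
list of recorded steps, and a behavior is replayed over steps it never
took part in. Step, History and replay are hypothetical names, and the
agreement score is only an assumption about what replaying might be
useful for:

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Step:
    observation: Sequence[float]
    actions: Sequence[int]  # actions that were available at this step
    action: int             # action that was actually taken
    reward: float           # reward received after taking it


@dataclass
class History:
    steps: List[Step] = field(default_factory=list)


def replay(behavior, history: History) -> float:
    """run a behavior over recorded steps and return the fraction of
    steps where it picks the same action that was recorded."""
    if not history.steps:
        return 0.0
    rewards: List[float] = []  # previous rewards, as the behavior would have seen them
    agree = 0
    for step in history.steps:
        chosen = behavior(step.observation, step.actions, rewards)
        if chosen == step.action:
            agree += 1
        rewards.append(step.reward)
    return agree / len(history.steps)
```

the behavior keeps the same (observation, actions, rewards) signature
whether it is acting live or reading a recording, which is the part
that makes the two feel like one thing.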

something interesting here is i think the dt (decision transformer)
can take a number of past events as input.
we could forward that interface to the environment. it seems limiting
to ask a user to always re-forward history; it makes more sense for
something to collect that automatically, but maybe a way to pass in a
batch would make sense for things like training
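
a small sketch of both halves, assuming the model only needs the last
few events as context. ContextCollector and make_batches are
hypothetical names, and context_length is an assumed parameter, not
something taken from a real decision transformer implementation:

```python
from collections import deque


class ContextCollector:
    """keeps the most recent events so the user never re-forwards history."""

    def __init__(self, context_length: int):
        # deque with maxlen drops the oldest event automatically
        self.window = deque(maxlen=context_length)

    def record(self, observation, action, reward):
        self.window.append((observation, action, reward))

    def context(self):
        """the events the model would see on the next step, oldest first."""
        return list(self.window)


def make_batches(events, context_length: int):
    """slice a full recorded sequence into overlapping windows for training.
    returns an empty list if the sequence is shorter than one window."""
    return [
        events[i:i + context_length]
        for i in range(len(events) - context_length + 1)
    ]
```

live use would call record after every step and context before the
next one, while training would hand a whole recorded sequence to
make_batches.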


ideally it stays small enough to be easy to implement
