[ot][spam][crazy] tiny decision transformer lib
Undescribed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Thu Feb 16 13:24:17 PST 2023
thinking a simple environment interface could be encapsulated in a
class or member function that doesn't need to be subclassed.
a single function could take e.g. a vector of observations, a
vector? of possible actions, and a vector of rewards, and return a new
action to perform
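
a rough sketch of that single-function shape, under assumptions: the names
(Policy, greedy_toward_zero, run) and the toy 1-d world are made up for
illustration and are not part of any existing lib. the only point is that the
caller hands in observation, possible actions, and rewards so far, and gets one
action back, with nothing to subclass.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (observation vector, possible actions, rewards so far) -> action
Policy = Callable[[Sequence[float], Sequence[int], Sequence[float]], int]


def greedy_toward_zero(observation, actions, rewards):
    """Toy policy: pick the action that moves a 1-D position closest to zero."""
    position = observation[0]
    return min(actions, key=lambda a: abs(position + a))


def run(policy: Policy, position: int, steps: int) -> Tuple[int, List[int]]:
    """Toy environment loop: the whole 'interface' is one call to policy()."""
    rewards: List[int] = []
    for _ in range(steps):
        action = policy([position], [-1, 0, 1], rewards)
        position += action
        rewards.append(-abs(position))  # assumed reward: closer to zero is better
    return position, rewards


final_position, rewards = run(greedy_toward_zero, position=3, steps=5)
```

any plain function or bound method with the same three arguments could be
dropped in for greedy_toward_zero.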
thinking a little of unifying agent behavior and history collection
history <- big mess of sequential observations, taken actions, and
rewards, analogous to dataset or dataloader
behavior <- one or more parallel observations, available actions, and
previous rewards, selects action to take
if i wanted to unify them i could consider letting behavior process a
sequence it had not participated in, or i could be off on a wonky
silly concept
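
one way the history/behavior split might look, as a sketch only. History,
Behavior, record, absorb, and act are hypothetical names, and the action rule
inside act is a stand-in placeholder, not a decision transformer. absorb is the
"process a sequence it had not participated in" idea.

```python
from dataclasses import dataclass, field


@dataclass
class History:
    """Sequential record of observations, taken actions, and rewards."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

    def record(self, observation, action, reward):
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)

    def __len__(self):
        return len(self.actions)


class Behavior:
    """Selects actions, and can also take in events it did not generate."""

    def __init__(self):
        self.history = History()

    def absorb(self, other: History):
        # Process a sequence this behavior did not participate in.
        for o, a, r in zip(other.observations, other.actions, other.rewards):
            self.history.record(o, a, r)

    def act(self, observation, actions):
        # Placeholder rule (an assumption): reuse the best-rewarded known
        # action that is currently available, else fall back to the first.
        seen = [(r, a) for a, r in zip(self.history.actions, self.history.rewards)
                if a in actions]
        return max(seen, key=lambda ra: ra[0])[1] if seen else actions[0]


demo = History()
demo.record([0.0], "left", 0.1)
demo.record([1.0], "right", 0.9)

behavior = Behavior()
behavior.absorb(demo)
choice = behavior.act([0.0], ["left", "right"])
```

here the same object both holds a history and picks actions, which is one
reading of "unifying" them; keeping them as two separate objects would work too.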
something interesting here is i think the dt (decision transformer) can
take a number of past events as input
we could forward that interface to the environment. it seems limiting
to ask a user to always re-forward the history; it makes more sense for
something to collect that automatically, but maybe a way to pass in a
batch would make sense for things like training
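
a sketch of both paths, with assumed names throughout (WindowedAgent, context,
act, act_batch, count_policy). a bounded deque collects the recent events
automatically so the user never re-forwards history, and a separate batch call
takes explicit windows for things like training. the policy here is a toy, not
a real model.

```python
from collections import deque


class WindowedAgent:
    """Keeps the last `context` events itself; also accepts explicit batches."""

    def __init__(self, policy, context=3):
        self.policy = policy
        # Oldest events fall off automatically once the window is full.
        self.window = deque(maxlen=context)

    def act(self, observation, actions, previous_reward=0.0):
        # History is supplied from the internal window, not by the caller.
        action = self.policy(list(self.window), observation, actions)
        self.window.append((previous_reward, observation, action))
        return action

    def act_batch(self, windows, observations, actions):
        # Caller passes history explicitly; nothing is recorded here.
        return [self.policy(w, o, actions) for w, o in zip(windows, observations)]


def count_policy(window, observation, actions):
    """Toy policy: the choice depends only on how many past events are visible."""
    return actions[len(window) % len(actions)]


agent = WindowedAgent(count_policy, context=2)
online = [agent.act(0, ["a", "b", "c"]) for _ in range(4)]
batched = agent.act_batch([[], [1], [1, 2]], [0, 0, 0], ["a", "b", "c"])
```

the fourth online action matches the third because the window is capped at two
events, which is the effect a fixed context length would have.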
bleurgh
ideally it is small to implement