thinking a simple environment interface could be encapsulated in a class or member function that doesn't need to be subclassed. a single function could take e.g. a vector of observations, a vector (?) of possible actions, and a vector of rewards, and return the next action to perform

thinking a little about unifying agent behavior and history collection

history <- big mess of sequential observations, taken actions, and rewards; analogous to a dataset or dataloader
behavior <- takes one or more parallel observations, available actions, and previous rewards; selects the action to take

if i wanted to unify them i could consider letting behavior process a sequence it had not participated in. or i could be off on a wonky silly concept

something interesting here: i think the dt can take a number of past events as input, so we could forward that interface to the environment. it seems limiting to ask a user to always re-forward history; makes more sense for something to collect it automatically. but a way to pass in a batch might make sense for things like training

bleurgh

ideally it is small to implement
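
rough sketch of the idea above in python, just to see how small it could be. everything here is made up for illustration (`Step`, `Agent`, `act`, `replay`, `context_len`, `repeat_if_rewarded` are hypothetical names, not from any library), and the toy policy stands in for whatever model would really go there. behavior is a plain function over a window of past events, so nothing needs subclassing; the wrapper collects history automatically, and `replay` lets the same behavior run over a sequence it did not participate in (e.g. a batch for training)

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Observation = Sequence[float]
Action = int


@dataclass
class Step:
    """One event: what was seen, what was allowed, the previous reward, what was chosen."""
    observation: Observation
    available_actions: Sequence[Action]
    reward: float = 0.0              # reward received for the previous action
    action: Optional[Action] = None  # filled in once the behavior has chosen


# Assumption: a behavior is any function from a window of steps to an action.
# The last step in the window is the current one and has no action yet.
Policy = Callable[[Sequence[Step]], Action]


class Agent:
    """Wraps a plain policy function and records history so callers never re-forward it."""

    def __init__(self, policy: Policy, context_len: int = 4):
        self.policy = policy
        self.context_len = context_len
        self.history: List[Step] = []

    def act(self, observation, available_actions, reward=0.0) -> Action:
        step = Step(observation, list(available_actions), reward)
        self.history.append(step)
        # Only the most recent context_len events are shown to the policy.
        step.action = self.policy(self.history[-self.context_len:])
        return step.action

    def replay(self, steps: Sequence[Step]) -> List[Action]:
        """Run the policy over a recorded sequence without touching self.history."""
        chosen = []
        for i, recorded in enumerate(steps):
            start = max(0, i + 1 - self.context_len)
            past = list(steps[start:i])
            # Hide the recorded action of the current step from the policy.
            current = Step(recorded.observation, recorded.available_actions, recorded.reward)
            chosen.append(self.policy(past + [current]))
        return chosen


def repeat_if_rewarded(window: Sequence[Step]) -> Action:
    """Toy policy: repeat the previous action if it was rewarded and is still available."""
    current = window[-1]
    if len(window) >= 2:
        previous = window[-2]
        if current.reward > 0 and previous.action in current.available_actions:
            return previous.action
    return current.available_actions[0]
```

usage would look like `agent = Agent(repeat_if_rewarded, context_len=2)` followed by `agent.act(obs, actions, reward)` once per environment step. `agent.history` then doubles as the dataset-like thing, and `Agent(...).replay(agent.history)` is the "process a sequence it had not participated in" path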