[ot][spam][crazy] tiny decision transformer lib
https://github.com/xloem/datadecisions/blob/main/testcase.py

new inhibited puzzle. this lib project arose as an incentive for me to share idea steps publicly in the future.

# write here a way to reproduce the tutorial more usefully and simply
# ensure it is easy to make new environments and provide custom data,
# without needing to refer to external documentation or many different files
the environment is defined by a Gym class, so i guess we'll want a Gym adapter for some normative environment. it can start simple
re Gym similarity, the huggingface dt takes a vector of floats for both observation and action, so that's simplifiable, and i updated it to use a vector of rewards
re custom data, i guess an iterable might do. maybe an object that makes the field names clear. basically a function to convert to a dataset if needed. seems the gym env challenge is more complex.
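a minimal sketch of the "iterable with clear field names" idea, stdlib only. the Step class, its field names, and steps_to_columns are all hypothetical, made up for illustration; none of them come from datadecisions or the huggingface tutorial.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Iterable, List


@dataclass
class Step:
    """one logged event. field names are an assumption, not a library format."""
    observation: List[float]
    action: List[float]
    reward: List[float]  # a vector of rewards rather than a single scalar
    done: bool = False


def steps_to_columns(steps: Iterable[Step]) -> Dict[str, list]:
    """turn any iterable of steps into a dict of parallel lists (one per field)."""
    columns: Dict[str, list] = {"observation": [], "action": [], "reward": [], "done": []}
    for step in steps:
        for name, value in asdict(step).items():
            columns[name].append(value)
    return columns


log = [
    Step([0.0, 1.0], [1.0], [0.5]),
    Step([1.0, 1.0], [0.0], [1.0], done=True),
]
columns = steps_to_columns(log)
```

a dict of columns like this is, as far as i know, the shape that Dataset.from_dict in the huggingface datasets library accepts, so the conversion step could stay this small.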
maybe we could think of use cases to facilitate. i'd like to make or find something that helps me navigate dense psychological triggers using logs of my behavior. i imagine other use cases might be virtual and physical bots. what would be an easy interface to make such things?
say i could label logs with a reward vector. then it would be nice, i guess, to feed the data and reward into the system. i'd need to pick some parts.
increased inhibition on reinforcement learning so let's focus on making a system that self-improves. that's both already very inhibited and getting much nearer as normal in our culture
basically the same thing: there's a log, and a metric for success propagates back in time to events the system can influence
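one way to read "propagates back in time" is the returns-to-go that decision transformers are, as i understand them, conditioned on: each event gets the sum of the rewards that follow it. a rough sketch for a log labelled with reward vectors. the function name and the gamma parameter are my own assumptions, not something from the tutorial.

```python
from typing import List


def returns_to_go(rewards: List[List[float]], gamma: float = 1.0) -> List[List[float]]:
    """for each timestep, sum this and all later rewards, per reward component.

    gamma < 1.0 discounts later rewards; gamma = 1.0 is a plain sum.
    """
    if not rewards:
        return []
    running = [0.0] * len(rewards[0])
    out = []
    # walk the log backwards so each step can reuse the total of the steps after it
    for reward in reversed(rewards):
        running = [r + gamma * acc for r, acc in zip(reward, running)]
        out.append(running)
    out.reverse()
    return out


# three logged events, two reward components each
rtg = returns_to_go([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
```

here the first event ends up credited for the success that only shows up at the third event, which is the back-in-time propagation.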
thinking a simple environment interface could be an encapsulation into a class or member function that doesn't need to be subclassed. a single function could take e.g. a vector of observation, a vector(?) of possible actions, and a vector of rewards, and return a new action to perform.

thinking a little of unifying agent behavior and history collection:

history  <- big mess of sequential observations, taken actions, and rewards; analogous to a dataset or dataloader
behavior <- takes one or more parallel observations, available actions, and previous rewards; selects an action to take

if i wanted to unify them i could consider letting behavior process a sequence it had not participated in, or i could be off on a wonky silly concept.

something interesting here is i think the dt can take a number of past events as input. we could forward that interface to the environment. it seems limiting to ask a user to always reforward history; it makes more sense for something to collect that automatically. but maybe a way to pass in a batch would make sense for things like training.

bleurgh. ideally it is made small to implement.
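a small sketch of the function-based interface with automatic history collection. everything here (Agent, Behavior, first_action, the context length) is hypothetical naming for illustration, and the placeholder behavior stands in for a real model.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple

Vector = List[float]
Event = Tuple[Vector, Vector, Vector]  # (observation, action, reward)
# a behavior sees recent history, the current observation, and the available actions
Behavior = Callable[[Sequence[Event], Vector, Sequence[Vector]], Vector]


class Agent:
    """wraps a plain behavior function and collects history so the user needn't."""

    def __init__(self, behavior: Behavior, context: int = 20):
        self.behavior = behavior
        # only the last `context` events are kept; 20 is an arbitrary default
        self.history: Deque[Event] = deque(maxlen=context)
        self._pending: Optional[Tuple[Vector, Vector]] = None

    def act(self, observation: Vector, actions: Sequence[Vector],
            reward: Optional[Vector] = None) -> Vector:
        # the reward passed in belongs to the previous action, so record it first
        if self._pending is not None:
            prev_observation, prev_action = self._pending
            self.history.append((prev_observation, prev_action, reward or [0.0]))
        action = self.behavior(list(self.history), observation, actions)
        self._pending = (observation, action)
        return action


def first_action(history, observation, actions):
    """placeholder behavior: always pick the first available action."""
    return actions[0]


agent = Agent(first_action, context=2)
agent.act([0.0], [[1.0], [2.0]])
agent.act([0.5], [[1.0], [2.0]], reward=[1.0])
chosen = agent.act([0.9], [[3.0], [2.0]], reward=[0.0])
```

no subclassing is needed: the behavior is an ordinary function, and because it receives history as an argument it could also be run over a sequence it did not participate in.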
maybe an objectish relating to what just happened, or what the state is: observation, action, reward. bleurgh. maybe keep it very simple so it can be changed.
it's funny that the project is to let me implement some of these things, but i can now barely think of them. the experience hopefully can help with that a little.
my behavior limits and skills are different than what i expect. the ways that work don't line up with each other the way i am used to.
basically i could make a lot of classes here, but instead i would need something simple and useful. simple and useful.
a simple thing with this tutorial is connecting the dataset to the model via the trainer. that’s basically all it is. this step doesn’t even need an environment.
parts: (each part _could_ be a general interface; atm this idea seems like it could increase difficulty)
- get data
- transform data to model format
- train model, probably with Trainer
- save model
- load environment
- link environment to model
- test model succeeds at environment

the points most important to abstract are providing data to the model, and defining the environment. so: environment, and records of action in that environment.
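the parts above could be wired together with plain callables instead of a general interface per part. this is only a toy sketch: Pipeline and its field names are hypothetical, saving and loading the model are left out, and the "model" is a lookup table rather than a decision transformer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Pipeline:
    """each part is just a function the user supplies; nothing to subclass."""
    get_data: Callable[[], Any]
    transform: Callable[[Any], Any]   # raw data -> model format
    train: Callable[[Any], Any]       # model-format data -> model
    evaluate: Callable[[Any], float]  # loads the environment, runs the model, scores it

    def run(self) -> Tuple[Any, float]:
        data = self.get_data()
        model = self.train(self.transform(data))
        return model, self.evaluate(model)


toy = Pipeline(
    get_data=lambda: [(0, 1), (1, 0)],    # (observation, action) pairs
    transform=lambda pairs: dict(pairs),  # "model format" is a lookup table here
    train=lambda table: table.get,        # the "model" maps observation -> action
    evaluate=lambda model: float(model(0) == 1 and model(1) == 0),
)
model, score = toy.run()
```

the two callables that matter most for a user are get_data and evaluate, matching the two abstraction points: records of action, and the environment.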
i'm starting to recall i was thinking of transforming the data next. this lines up with an abstraction point.
and to add code to testcase.py. i think my worry was that if i just put the tutorial into testcase.py, it wouldn't be repurposeable after dissociating with possible satisfaction of success.
i was wondering if i can pass a flat array as a dataset. might be able to. if not, i can implement one that takes one in its constructor.
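a sketch of a dataset that takes a flat array in its constructor. the layout (observation, then action, then reward, repeated per step) and the class name are assumptions of mine. it only implements __len__ and __getitem__, which is the map-style protocol torch datasets use; i have not checked whether Trainer would accept it as-is.

```python
from typing import Dict, List, Sequence


class FlatArrayDataset:
    """map-style view over a flat array laid out as [obs..., action..., reward...] per step."""

    def __init__(self, flat: Sequence[float], obs_dim: int, act_dim: int, rew_dim: int):
        self.flat = flat
        self.obs_dim, self.act_dim, self.rew_dim = obs_dim, act_dim, rew_dim
        self.stride = obs_dim + act_dim + rew_dim  # number of floats per step
        if len(flat) % self.stride:
            raise ValueError("flat array length is not a whole number of steps")

    def __len__(self) -> int:
        return len(self.flat) // self.stride

    def __getitem__(self, index: int) -> Dict[str, List[float]]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        row = list(self.flat[index * self.stride:(index + 1) * self.stride])
        a = self.obs_dim
        b = a + self.act_dim
        return {"observation": row[:a], "action": row[a:b], "reward": row[b:]}


# two steps: 2 observation floats, 1 action float, 1 reward float each
data = FlatArrayDataset([0.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.0, 1.0],
                        obs_dim=2, act_dim=1, rew_dim=1)
```

indexing gives back named fields, so the flat array stays the storage format while the field names stay clear.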
participants (1)
-
Undescribed Horrific Abuse, One Victim & Survivor of Many