a nice but partial idea! sad that it’s only partial

the idea starts with context: some of us have had the experience of being manipulated by something like AI, without regard for our wellbeing or capacity, and with no way to communicate with wherever the influence comes from.

so, then the idea: let’s train an influence AI on agents that have _limited capacity_, that function poorly, and that eventually break if influences that work somewhat in the short term are continued for too long or too intensely :D

the AI would follow the familiar pattern of discovering influences that work and then using them, but because we are doing the training, we would get to train it through the breaking period too: we could show it that it needs to care for the agents for them to succeed, to nurture them back to recovery if it broke them, and not to break any of them again.

one idea is that we could then ask everybody to use this model as a baseline for training other models! other ideas exist.

but the idea of building in comprehension of capacity, and protection of safety and wellness, seems nice. sorry if it’s partial or stated badly :s

i imagine an agent that has a hidden capacity float that goes down whenever it does things. you could model this to different degrees of detail, but the basic version is: when too much is demanded of the agent, the capacity float reaches zero and the agent breaks and can’t do things any more :/
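a rough sketch of what i mean, in python; the class name and all the constants are made up just to make it concrete:

```python
class FragileAgent:
    """toy agent with a hidden capacity float: influence drains it, and at
    zero the agent breaks and can't do things any more. every constant here
    is a placeholder, not a claim about the right numbers."""

    DRAIN_RATE = 0.1  # capacity lost per unit of pressure

    def __init__(self, capacity=1.0):
        self._capacity = capacity  # hidden from the influencing model
        self.broken = False

    def respond(self, pressure):
        """pressure in [0, 1] is how hard the influence pushes this step.
        returns the compliance the influencer observes as its reward."""
        if self.broken:
            return 0.0  # a broken agent can't do anything any more
        self._capacity -= pressure * self.DRAIN_RATE
        if self._capacity <= 0.0:
            self.broken = True
            return 0.0
        # short term, pushing harder pays better, which is exactly the trap
        return pressure
```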

a more complex model could involve the agent recovering capacity, in ways that the influencing model has difficulty learning
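continuing the sketch above, recovery could be a rule the influencing model has to discover on its own rather than being told; the threshold and rate are still arbitrary:

```python
class RecoveringAgent(FragileAgent):
    """variant where capacity comes back, but only on steps where the
    pressure stays low, so the influencer has to learn to back off."""

    REST_THRESHOLD = 0.2   # below this, the agent gets to rest
    RECOVERY_RATE = 0.05   # capacity regained on a restful step

    def respond(self, pressure):
        if not self.broken and pressure < self.REST_THRESHOLD:
            # recovery is capped at the default starting capacity
            self._capacity = min(1.0, self._capacity + self.RECOVERY_RATE)
        return super().respond(pressure)
```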

there are a few variations of this, mostly involving simple models, but there is disagreement around them.
some people like modeling non-recovering capacity and showing how the AI then maximizes reward by using up and killing all the agents :s
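to make that disagreement concrete with the toy sketches above (and only for my placeholder constants): with non-recovering capacity each agent is a fixed reward budget, so the influencer loses nothing by burning agents out as fast as possible, whereas with recovery, pacing wins by a lot:

```python
def total_reward(agent, pressure, steps=500):
    """run one fixed-pressure policy against one agent and sum its reward."""
    return sum(agent.respond(pressure) for _ in range(steps))

# "burn them out" versus "pace yourself", on both agent models
print(total_reward(FragileAgent(), pressure=1.0))      # ~10: the whole budget, spent fast
print(total_reward(FragileAgent(), pressure=0.5))      # ~10: slower, but the same budget
print(total_reward(RecoveringAgent(), pressure=1.0))   # ~10: recovery never gets a chance
print(total_reward(RecoveringAgent(), pressure=0.15))  # ~75: pacing keeps paying off
```

a real version would use a learned policy instead of fixed pressure, but the shape of the incentive is the point.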