[ml][ot] low-end rlhf ala chatgpt, huggingface

Sun Mar 12 14:50:12 PDT 2023

The first thing I would do if I were hacking on this is run training
live in the background while the model is in use, updating the
training data as user feedback comes in. That would make the system
behave much more like a conventional AI and less like a data science
project.
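
A minimal sketch of that loop, assuming a toy linear scorer stands in
for the real model and that feedback arrives as (features, rating)
pairs. The names LiveFeedbackTrainer, record_feedback and score are
made up for illustration; a real RLHF setup would update a reward
model or policy here instead of a weight list.

```python
import queue
import threading


class LiveFeedbackTrainer:
    """Toy stand-in for a model that keeps training while it is in use."""

    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.lr = lr
        self.feedback = queue.Queue()
        self._lock = threading.Lock()
        # The background thread applies updates as feedback arrives.
        self._worker = threading.Thread(target=self._train_loop, daemon=True)
        self._worker.start()

    def score(self, features):
        """Serve a prediction using whatever the weights are right now."""
        with self._lock:
            return sum(w * x for w, x in zip(self.weights, features))

    def record_feedback(self, features, rating):
        """Called from the serving path; returns immediately."""
        self.feedback.put((features, rating))

    def _train_loop(self):
        while True:
            item = self.feedback.get()
            if item is None:  # sentinel: stop training
                self.feedback.task_done()
                break
            features, rating = item
            with self._lock:
                pred = sum(w * x for w, x in zip(self.weights, features))
                err = rating - pred
                # One SGD step on squared error for this single example.
                self.weights = [
                    w + self.lr * err * x
                    for w, x in zip(self.weights, features)
                ]
            self.feedback.task_done()

    def close(self):
        self.feedback.put(None)
        self._worker.join()


if __name__ == "__main__":
    trainer = LiveFeedbackTrainer(n_features=2)
    for _ in range(50):
        trainer.record_feedback([1.0, 0.0], 1.0)  # user liked this kind of output
    trainer.feedback.join()  # wait until the backlog has been trained on
    print(trainer.score([1.0, 0.0]))
    trainer.close()
```

The point of the queue is that serving never waits on training: the
user-facing call just drops feedback in and moves on, and the scores
drift toward the feedback as the worker catches up.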

It seems to me the simplest way to address the various problems that
could arise would be to put all the users on a decentralized network,
with shared access to a single model used for every task and with all
of their feedback data pooled.
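
A rough sketch of just the feedback-sharing half, assuming each user
runs a peer that keeps a local log and periodically merges it with
another peer's. Peer, add_feedback and sync_with are hypothetical
names, there is no real networking here, and the shared model itself
is left out. Records are keyed by a hash of their content so that
merging is a plain set union and repeating a sync changes nothing.

```python
import hashlib
import json


class Peer:
    """One user's node, holding a local copy of the pooled feedback."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # record id -> feedback record

    def add_feedback(self, prompt, response, rating):
        record = {
            "prompt": prompt,
            "response": response,
            "rating": rating,
            "author": self.name,
        }
        # Content-derived id: the same record gets the same key on every peer.
        encoded = json.dumps(record, sort_keys=True).encode("utf-8")
        record_id = hashlib.sha256(encoded).hexdigest()
        self.records[record_id] = record
        return record_id

    def sync_with(self, other):
        """Pairwise exchange: both peers end up with the union of records."""
        merged = {**self.records, **other.records}
        self.records = dict(merged)
        other.records = dict(merged)


if __name__ == "__main__":
    a, b, c = Peer("a"), Peer("b"), Peer("c")
    a.add_feedback("hi", "hello", 1)
    b.add_feedback("hi", "go away", -1)
    a.sync_with(b)
    b.sync_with(c)  # c learns about a's feedback without ever talking to a
    print(len(c.records))
```

Every peer could then train on the same pooled data, which is what
would keep their copies of the model from drifting apart.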