[ml][ot] low-end rlhf ala chatgpt, huggingface
i guess huggingface put themselves in charge of machine learning technology trickling down to the masses in a controlled way. they released a lengthy tutorial, explaining the parts, on using a new patch to their libraries to fine-tune custom large models on low-end hardware. it combines the chatgpt rlhf approach with the peft adapter approach, and appears to be roughly the present normative cutting edge. https://www.reddit.com/r/MachineLearning/comments/11p3a0j/d_finetuning_20b_l... https://huggingface.co/blog/trl-peft
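the core trick that makes the peft adapter approach fit on low-end hardware can be sketched in plain numpy, without the real peft library: the big pretrained weight stays frozen and only a low-rank update B @ A is trained, so the trainable parameter count collapses. this is just an illustration of the idea, not huggingface's actual API.

```python
# minimal sketch of a LoRA-style adapter (the idea behind peft),
# written in plain numpy -- not the real peft library API
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                          # hidden size, adapter rank (r << d)

W = rng.standard_normal((d, d))        # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01 # small trainable down-projection
B = np.zeros((d, r))                   # trainable up-projection, zero-init
                                       # so the adapter starts as a no-op

def forward(x):
    # base output plus low-rank correction; only A and B would get gradients
    return x @ W.T + x @ (B @ A).T

x = rng.standard_normal((4, d))
assert np.allclose(forward(x), x @ W.T)  # zero-init adapter changes nothing

full = W.size            # 262144 params if we tuned W directly
lora = A.size + B.size   # 8192 params for the adapter, a ~32x reduction
print(full, lora)
```

with the base weights frozen (and typically quantized to 8-bit, per the tutorial), only the tiny A and B matrices need optimizer state, which is what lets a 20b model train on a single consumer gpu.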
The first thing I would do if I were to hack on this would be to add live training in the background while the model is in use, updating it to reflect user feedback as it arrives. This would make the system behave much more like a conventional AI and less like a data science project. The simplest way I can see to address the various problems that could arise would be to put all the users on a decentralized network with shared access to a single model used for every task, pooling all their feedback data.
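a toy sketch of that live-training loop, assuming nothing about huggingface's actual APIs (the Model class and train_step here are hypothetical stand-ins): feedback goes into a queue as users rate responses, and a background thread drains the queue and folds updates into the model while the foreground keeps serving.

```python
# toy sketch of background live training: feedback is queued during use and
# a worker thread applies it while serving continues.
# Model and train_step are hypothetical stand-ins, not a real library API.
import queue
import threading

class Model:
    def __init__(self):
        self.bias = 0.0            # stand-in for real trainable parameters
    def respond(self, prompt):
        return f"{prompt}! (bias={self.bias:.2f})"

def train_step(model, feedback):
    # stand-in update rule: nudge a parameter toward the feedback score
    model.bias += 0.1 * feedback

model = Model()
fb_queue = queue.Queue()

def trainer():
    while True:
        feedback = fb_queue.get()
        if feedback is None:       # shutdown sentinel
            break
        train_step(model, feedback)

t = threading.Thread(target=trainer, daemon=True)
t.start()

print(model.respond("hello"))      # serve while training runs in background
for score in (1.0, -0.5, 1.0):     # users rate responses as they come in
    fb_queue.put(score)
fb_queue.put(None)
t.join()
print(model.respond("hello"))      # responses now reflect the feedback
```

the decentralized version would amount to many clients feeding the same shared queue and model, which is where the coordination problems (and the appeal of pooling everyone's feedback) come in.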
participants (1)
-
Undescribed Horrific Abuse, One Victim & Survivor of Many