[ot][spam][crazy] thoughts on copying "authorized" work
i saw a youtube video showing a natural-language interface to the numpy api that converted plain prompts to code. it was cast as general-purpose, but numpy is a well-known api in data science, and large pretrained language models are likely to already have a grasp of it. hence, numpy is a reasonable starting point.
one way to navigate abuse triggers associated with decentralized work could be to plan, for now, to use a hosted model api such as openai's, rather than a free model. if that provides grounding, the task may simplify to mostly data selection and boilerplate. i'm thinking:
- giving a few examples and trying to few-shot it to do the rest
- then (_later_) seeing if its output can be used as data augmentation to transfer to e.g. undocumented apis (may involve interim steps over more familiar apis)
the keys here being:
- the system provides output focused on an api of user interest. this makes it useful.
- the api is one models already know. this makes it easy to make.
- the system has already been demo'd as a public work by an engineer who left a spy corp. this makes it a little less terrifying to pursue for me.
Step 1:
- collect a couple examples of numpy api usage from the docs or memory
- feed these into a simple code system that writes them to a file or database associated with "numpy"
- interface them as a prompt to a language model
- see how well it does before it needs more examples than fit in its context

one could pick the examples intelligently using an active learning library, but i think it's more productive to then consider moving to adapting or finetuning.

i'm really slow, so i expect step 1 to be a personal milestone.
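
a minimal sketch of what step 1 might look like, under loud assumptions: the json file layout, the function names (add_example, load_examples, build_prompt), the prompt wording, and the whitespace-based token estimate are all made up for illustration. a real model api would use its own tokenizer and context limit, and the actual call to the model is left out on purpose.

```python
import json
from pathlib import Path


def add_example(store_path, api, request, code):
    """Append one (request, code) pair to a JSON file, keyed by API name."""
    path = Path(store_path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store.setdefault(api, []).append({"request": request, "code": code})
    path.write_text(json.dumps(store, indent=2))


def load_examples(store_path, api):
    """Return the stored examples for one API (empty list if none)."""
    path = Path(store_path)
    if not path.exists():
        return []
    return json.loads(path.read_text()).get(api, [])


def rough_token_count(text):
    # ASSUMPTION: whitespace splitting is only a crude stand-in for a
    # real tokenizer; actual token counts will differ.
    return len(text.split())


def build_prompt(examples, query, budget):
    """Build a few-shot prompt, adding examples until the budget is hit.

    Returns the prompt text and the number of examples that fit.
    """
    header = "# Translate each request into numpy code.\n\n"
    tail = f"# Request: {query}\n# Code:\n"
    used = rough_token_count(header) + rough_token_count(tail)
    shots = []
    for ex in examples:
        shot = f"# Request: {ex['request']}\n# Code:\n{ex['code']}\n\n"
        cost = rough_token_count(shot)
        if used + cost > budget:
            break  # the next example would not fit in the context budget
        shots.append(shot)
        used += cost
    return header + "".join(shots) + tail, len(shots)


if __name__ == "__main__":
    import tempfile

    # hypothetical store location, created in a throwaway directory
    store = Path(tempfile.mkdtemp()) / "examples.json"
    add_example(store, "numpy", "make a 3x3 array of zeros", "np.zeros((3, 3))")
    add_example(store, "numpy", "sum the columns of a", "a.sum(axis=0)")
    prompt, n_used = build_prompt(
        load_examples(store, "numpy"), "reverse a 1-d array x", budget=200
    )
    print(prompt)
    print(f"examples that fit: {n_used}")
```

the returned prompt would then be sent to whichever hosted model is chosen. the count of examples that fit gives a rough signal for when few-shot prompting runs out of room and adapting or finetuning becomes the next thing to consider.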
Undescribed Horrific Abuse, One Victim & Survivor of Many