InCoder: A Generative Model for Code In-Filling and Synthesis

Fri Apr 15 01:47:31 PDT 2022

This is something you should know about, use, and understand the workings of,
probably in that order.

https://twitter.com/dan_fried/status/1514265047761043456

https://sites.google.com/view/incoder-code-models

InCoder: A Generative Model for Code In-Filling and Synthesis
Daniel Fried*,  Armen Aghajanyan*,  Jessy Lin, Sida Wang, Eric Wallace,
Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis

arXiv, 2022

Paper: https://arxiv.org/abs/2204.05999

Demo: https://huggingface.co/spaces/facebook/incoder-demo

Model weights and instructions:
https://github.com/dpfried/incoder/blob/main/README.md

Examples: https://sites.google.com/view/incoder-code-models/home/examples

Inserting and completing code in a single model
We train a generative, decoder-only Transformer using a causal-masking
training objective (from CM3, Aghajanyan et al. 2022), which trains the
model to generate entire code files in arbitrary orderings via masking.
[Figure: an example where a single region is masked.]
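
As a rough illustration of the causal-masking idea described above, the sketch below rewrites a document so that one span is replaced by a sentinel and moved to the end, where a left-to-right model can learn to generate it after seeing both sides. The sentinel strings and the helper names causal_mask and restore are assumptions made for this example; the page itself only refers to "MASK tokens", and the actual training code operates on tokens rather than characters.

```python
# Sentinel strings are assumptions for illustration; the page only says "MASK tokens".
MASK = "<|mask:0|>"
END_OF_MASK = "<|endofmask|>"


def causal_mask(document: str, start: int, end: int) -> str:
    """Move document[start:end] to the end, leaving a sentinel in its place."""
    if not 0 <= start <= end <= len(document):
        raise ValueError("span must lie inside the document")
    left, span, right = document[:start], document[start:end], document[end:]
    # Layout: left context, sentinel, right context, sentinel again, then the span.
    return left + MASK + right + MASK + span + END_OF_MASK


def restore(masked: str) -> str:
    """Invert causal_mask (assumes the document itself contains no sentinels)."""
    left, rest = masked.split(MASK, 1)
    right, tail = rest.split(MASK, 1)
    span = tail[: tail.index(END_OF_MASK)]
    return left + span + right


source = "def add(a, b):\n    return a + b\n"
start = source.index("return")
end = start + len("return a + b")
masked = causal_mask(source, start, end)
print(masked)
```

Because the rearranged sequence is still read left to right, the same model can be trained with an ordinary next-token objective while learning to fill a gap conditioned on the text on both sides of it.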

Zero-shot generation for code tasks
At inference time, we can prompt our model with a document containing MASK
tokens where we want it to insert code. This lets us perform a plethora of
code tasks without any task-specific fine-tuning, including docstring
generation, type hint prediction, variable renaming, cloze tasks, and more.
[Figure: real outputs from our model.]

See more examples here:
https://sites.google.com/view/incoder-code-models/home/examples
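
The prompting scheme described above can be sketched without loading a model. The code below builds a single-gap prompt and splices a generated infill back between the prefix and suffix. It is a simplified sketch under stated assumptions: the sentinel strings, the helper names build_prompt and infill, and the canned fake_generate function are all hypothetical stand-ins, and the project's README remains the authority on the exact prompt format.

```python
from typing import Callable

MASK = "<|mask:0|>"            # assumed sentinel string
END_OF_MASK = "<|endofmask|>"  # assumed end-of-infill marker


def build_prompt(prefix: str, suffix: str) -> str:
    """Mark the gap with a sentinel, then repeat the sentinel to request the infill."""
    return prefix + MASK + suffix + MASK


def infill(prefix: str, suffix: str, generate: Callable[[str], str]) -> str:
    """Fill the gap between prefix and suffix using any text-continuation function."""
    continuation = generate(build_prompt(prefix, suffix))
    # Keep only the text before the end marker; keep everything if it never appears.
    filled = continuation.split(END_OF_MASK, 1)[0]
    return prefix + filled + suffix


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned docstring."""
    return '"""Return the sum of a and b."""' + END_OF_MASK + "ignored text"


prefix = "def add(a, b):\n    "
suffix = "\n    return a + b\n"
completed = infill(prefix, suffix, fake_generate)
print(completed)
```

Swapping fake_generate for a function that calls a real model would turn this into docstring generation; the other tasks listed above differ only in where the gap is placed.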

Trained on open-source code and StackOverflow
Unlike past work, our model's training data consists of only
permissively-licensed code (Apache 2.0, MIT, BSD-2 and BSD-3 licensed) from
online sources such as GitHub and GitLab, as well as StackOverflow. We
focus on Python and JavaScript, but include 28 languages overall -- about
200GB of data in total (after deduplication, filtering, and
decontamination). See our paper for details.

Demo available
Demo Link: https://huggingface.co/spaces/facebook/incoder-demo

Model available in HuggingFace's Transformers
6.7B parameter version: https://huggingface.co/facebook/incoder-6B

1.3B parameter version: https://huggingface.co/facebook/incoder-1B

See our readme (https://github.com/dpfried/incoder/blob/main/README.md) for
instructions on required versions of transformers and tokenizers, and for
examples of how to do infilling.
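
For orientation, here is a minimal sketch of loading one of the checkpoints listed above through the generic Transformers auto classes and running plain left-to-right completion. It is not taken from the readme: the helper names load and complete, the greedy decoding, and the default of 32 new tokens are assumptions for illustration, and the readme should be consulted for required library versions and recommended settings. Calling complete downloads the model weights.

```python
# Model IDs are the ones linked above; everything else is an illustrative assumption.
MODEL_IDS = {"1B": "facebook/incoder-1B", "6B": "facebook/incoder-6B"}


def load(size: str = "1B"):
    """Download and return (tokenizer, model) for the given checkpoint size."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS[size]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def complete(prompt: str, size: str = "1B", max_new_tokens: int = 32) -> str:
    """Greedily continue `prompt`; returns the decoded prompt plus continuation."""
    tokenizer, model = load(size)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], clean_up_tokenization_spaces=False)


# Example (downloads several GB of weights on first use):
# print(complete("def fibonacci(n):\n"))
```

The 6.7B checkpoint needs considerably more memory than the 1.3B one, so starting with the smaller model is the easier way to experiment.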

Credits
Thanks to Lucile Saulnier, Leandro von Werra, Nicolas Patry, Suraj Patil,
Omar Sanseviero, and others at HuggingFace for help with the model release,
and to Naman Goyal and Stephen Roller for the code our demo was based on!