InCoder: A Generative Model for Code In-Filling and Synthesis
You should know about this, use it, and understand how it works, probably in that order.

https://twitter.com/dan_fried/status/1514265047761043456
https://sites.google.com/view/incoder-code-models

InCoder: A Generative Model for Code In-Filling and Synthesis
Daniel Fried*, Armen Aghajanyan*, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis
arXiv, 2022

Paper: https://arxiv.org/abs/2204.05999
Demo: https://huggingface.co/spaces/facebook/incoder-demo
Model weights and instructions: https://github.com/dpfried/incoder/blob/main/README.md
Examples: https://sites.google.com/view/incoder-code-models/home/examples

Inserting and completing code in a single model

We train a generative, decoder-only Transformer using a causal-masking training objective (from CM3, Aghajanyan et al. 2022), which trains a model to generate entire code files in arbitrary orderings via masking: spans of a file are replaced with mask tokens and moved to the end, so a left-to-right model learns to generate each span conditioned on the code both before and after it.

[Figure: an example where a single region is masked]

Zero-shot generation for code tasks

At inference time, we can prompt our model with a document containing MASK tokens wherever we want it to insert code. This lets us perform a wide range of code tasks without any task-specific fine-tuning, including docstring generation, type hint prediction, variable renaming, cloze tasks, and more.

[Figure: real outputs from our model]

More examples: https://sites.google.com/view/incoder-code-models/home/examples

Trained on open-source code and StackOverflow

Unlike past work, our model's training data consists only of permissively licensed code (Apache 2.0, MIT, BSD-2, and BSD-3 licensed) from online sources such as GitHub and GitLab, as well as StackOverflow. We focus on Python and JavaScript but include 28 languages in total, about 200 GB of data after deduplication, filtering, and decontamination. See our paper for details.
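The causal-masking rewrite described above can be sketched in a few lines of Python. This is an illustration, not the authors' training code: the sentinel strings `<|mask:i|>` and `<|endofmask|>` are assumptions modeled on the project README, the helper names `sentinel` and `causal_mask` are hypothetical, and the spans are passed in explicitly, whereas real training chooses them at random.

```python
from typing import List, Tuple

EOM = "<|endofmask|>"  # assumed end-of-mask marker (hypothetical here)


def sentinel(i: int) -> str:
    # Assumed sentinel format; check the project README for the real tokens.
    return f"<|mask:{i}|>"


def causal_mask(doc: str, spans: List[Tuple[int, int]]) -> str:
    """Rewrite doc so that each masked span is moved to the end.

    spans: sorted, non-overlapping (start, end) character offsets.
    Each span is replaced in place by a sentinel, then appended after the
    document as sentinel + original text + EOM. A left-to-right model
    trained on the result sees the code on both sides of a span before it
    has to generate that span.
    """
    body, tail, prev = [], [], 0
    for i, (start, end) in enumerate(spans):
        if start < prev or end < start:
            raise ValueError("spans must be sorted and non-overlapping")
        body.append(doc[prev:start])  # unmasked text before this span
        body.append(sentinel(i))      # placeholder where the span was
        tail.append(sentinel(i) + doc[start:end] + EOM)
        prev = end
    body.append(doc[prev:])           # unmasked text after the last span
    return "".join(body) + "".join(tail)
```

For example, masking `a + b` in `def add(a, b):\n    return a + b\n` yields the file with `<|mask:0|>` in place of `a + b`, followed by `<|mask:0|>a + b<|endofmask|>`. At inference the same layout is used, except that the model itself generates the text after the final sentinel.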
Demo available

Demo link: https://huggingface.co/spaces/facebook/incoder-demo

Model available in HuggingFace's Transformers

6.7B-parameter version: https://huggingface.co/facebook/incoder-6B
1.3B-parameter version: https://huggingface.co/facebook/incoder-1B

See our README (https://github.com/dpfried/incoder/blob/main/README.md) for the required versions of transformers and tokenizers and for examples of how to do infilling.

Credits

Thanks to Lucile Saulnier, Leandro von Werra, Nicolas Patry, Suraj Patil, Omar Sanseviero, and others at HuggingFace for help with the model release, and to Naman Goyal and Stephen Roller for the code our demo was based on!
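To try zero-shot infilling with one of the released checkpoints, something like the following should work. It is a hedged sketch, not the official example: the sentinel layout (the hole marked with `<|mask:0|>`, an extra `<|mask:1|>` at the end of the document, then `<|mask:0|>` again to start the infill, which the model closes with `<|endofmask|>`) and the sampling settings (`top_p=0.95`, `temperature=0.2`) are assumptions based on the README's example script, so check the README for the exact format and library versions. The helper names `build_infill_prompt`, `extract_infill`, and `infill` are hypothetical.

```python
MASK_0 = "<|mask:0|>"  # assumed: marks the hole, later starts the infill
MASK_1 = "<|mask:1|>"  # assumed: extra sentinel marking the end of the document
EOM = "<|endofmask|>"  # assumed: emitted by the model when the infill is done


def build_infill_prompt(left: str, right: str) -> str:
    """Mark the hole between left and right, then ask the model to fill it."""
    return left + MASK_0 + right + MASK_1 + MASK_0


def extract_infill(generated: str) -> str:
    """Keep generated text up to the end-of-mask marker (all of it if absent)."""
    end = generated.find(EOM)
    return generated if end == -1 else generated[:end]


def infill(left: str, right: str, model_id: str = "facebook/incoder-1B",
           max_new_tokens: int = 128, temperature: float = 0.2) -> str:
    """Sample one infill. Downloads the checkpoint on first use (several GB)."""
    # Imported lazily so the string helpers above work without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    prompt = build_infill_prompt(left, right)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        output = model.generate(
            input_ids,
            do_sample=True,
            top_p=0.95,
            temperature=temperature,
            max_new_tokens=max_new_tokens,
        )
    # Decode only the newly generated tokens. Keep special tokens so the
    # end-of-mask marker stays visible, and keep spacing exactly as generated.
    new_tokens = output[0, input_ids.shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=False,
                            clean_up_tokenization_spaces=False)
    return extract_infill(text)
```

Calling `infill("def count_words(filename: str) -> dict:\n    ", "\n    return counts\n")` downloads the 1.3B checkpoint on first use and returns one sampled function body; pass `model_id="facebook/incoder-6B"` for the larger model. Loading the model inside `infill` keeps the sketch short; for repeated calls, load the tokenizer and model once and reuse them.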
Karl Semich