cypherpunks Digest, Vol 106, Issue 93

Gunnar Larson g at xny.io
Fri Apr 8 16:54:15 PDT 2022


At first glance, this was a great article.

On Fri, Apr 8, 2022, 7:52 PM <cypherpunks-request at lists.cpunks.org> wrote:

> Send cypherpunks mailing list submissions to
>         cypherpunks at lists.cpunks.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.cpunks.org/mailman/listinfo/cypherpunks
> or, via email, send a message with subject or body 'help' to
>         cypherpunks-request at lists.cpunks.org
>
> You can reach the person managing the list at
>         cypherpunks-owner at lists.cpunks.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of cypherpunks digest..."
>
>
> Today's Topics:
>
>    1. Re: DALL-E (coderman)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 08 Apr 2022 23:50:53 +0000
> From: coderman <coderman at protonmail.com>
> To: coderman <coderman at protonmail.com>
> Cc: "Cypherpunks" <cypherpunks at cpunks.org>
> Subject: Re: DALL-E
> Message-ID:
>
> <a9WeFGpr9g422W0Uym9aQZyxT6mqWNzNwLsG6yKqqlD4BLpH6NxuARXLOMvBY8IdZF9HMetBKZYGjdH--qJRFZDIWnXdMRQVqr3pmMYVo5I=@
> protonmail.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> DALL·E[1](https://openai.com/blog/dall-e/#fn1) is a 12-billion parameter
> version of [GPT-3](https://arxiv.org/abs/2005.14165) trained to generate
> images from text
> descriptions, using a dataset of text–image pairs. We’ve found that it has
> a diverse set of capabilities, including creating anthropomorphized
> versions of animals and objects, combining unrelated concepts in plausible
> ways, rendering text, and applying transformations to existing images.
>
> ---------------------------------------------------------------
>
> [Interactive demo in the original post: AI-generated images for each of
> the following prompts]
>
> Text prompt: an illustration of a baby daikon radish in a tutu walking a dog
> Text prompt: an armchair in the shape of an avocado. . . .
> Text prompt: a store front that has the word ‘openai’ written on it. . . .
> Text & image prompt: the exact same cat on the top as a sketch on the bottom
> ---------------------------------------------------------------
>
> GPT-3 showed that language can be used to instruct a large neural network
> to perform a variety of text generation tasks. [Image GPT](
> https://openai.com/blog/image-gpt) showed that the same type of neural
> network can also be used to generate images with high fidelity. We extend
> these findings to show that manipulating visual concepts through language
> is now within reach.
>
> Overview
>
> Like GPT-3, DALL·E is a transformer language model. It receives both the
> text and the image as a single stream of data containing up to 1280 tokens,
> and is trained using maximum likelihood to generate all of the tokens, one
> after another.[2](https://openai.com/blog/dall-e/#fn2)
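>
> (Not from the paper: the following is a minimal sketch, in PyTorch, of the
> training objective this paragraph describes. The tiny model, the sequence
> lengths, and the shared, offset vocabulary are illustrative assumptions,
> not OpenAI's implementation.)
>
>     import torch
>     import torch.nn as nn
>     import torch.nn.functional as F
>
>     TEXT_LEN, IMAGE_LEN = 256, 1024        # 1280 tokens total, per the post
>     TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192  # per footnote 2 below
>     VOCAB = TEXT_VOCAB + IMAGE_VOCAB       # assumption: one offset vocabulary
>
>     class TinyDalleLM(nn.Module):
>         """Toy causal transformer over a concatenated text+image stream."""
>         def __init__(self, d_model=256, n_heads=4, n_layers=2):
>             super().__init__()
>             self.embed = nn.Embedding(VOCAB, d_model)
>             self.pos = nn.Embedding(TEXT_LEN + IMAGE_LEN, d_model)
>             layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
>             self.blocks = nn.TransformerEncoder(layer, n_layers)
>             self.out = nn.Linear(d_model, VOCAB)
>
>         def forward(self, tokens):  # tokens: (batch, seq_len)
>             T = tokens.shape[1]
>             x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
>             causal = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
>             return self.out(self.blocks(x, mask=causal))  # (batch, seq_len, VOCAB)
>
>     def dalle_loss(model, text_tokens, image_tokens):
>         # concatenate the caption and image codes into one stream; offset the
>         # image codes so the two vocabularies do not collide
>         stream = torch.cat([text_tokens, image_tokens + TEXT_VOCAB], dim=1)
>         logits = model(stream)
>         # maximum likelihood = next-token cross-entropy over the whole stream
>         return F.cross_entropy(logits[:, :-1].reshape(-1, VOCAB),
>                                stream[:, 1:].reshape(-1))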
>
> This training procedure allows DALL·E to not only generate an image from
> scratch, but also to regenerate any rectangular region of an existing image
> that extends to the bottom-right corner, in a way that is consistent with
> the text prompt.
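>
> (A hedged continuation of the toy sketch above, reusing TinyDalleLM and its
> constants. Because the image codes are laid out in raster order, a region
> that spans full rows down to the bottom of the grid is simply a suffix of
> the token stream, so regeneration in that simplest case is ordinary
> autoregressive sampling with the caption and the known top rows held
> fixed. This illustrates the idea and is not OpenAI's sampler.)
>
>     @torch.no_grad()
>     def complete(model, text_tokens, known_image_codes, temperature=1.0):
>         # known_image_codes: dVAE codes for the top rows we want to keep
>         stream = torch.cat([text_tokens, known_image_codes + TEXT_VOCAB], dim=1)
>         while stream.shape[1] < TEXT_LEN + IMAGE_LEN:
>             logits = model(stream)[:, -1] / temperature
>             logits[:, :TEXT_VOCAB] = float("-inf")  # only image codes are valid here
>             next_tok = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
>             stream = torch.cat([stream, next_tok], dim=1)
>         return stream[:, TEXT_LEN:] - TEXT_VOCAB  # 1024 codes for the dVAE decoder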
>
> We recognize that work involving generative models has the potential for
> significant, broad societal impacts. In the future, we plan to analyze how
> models like DALL·E relate to societal issues like economic impact on
> certain work processes and professions, the potential for bias in the model
> outputs, and the longer term ethical challenges implied by this technology.
>
> Capabilities
>
> We find that DALL·E is able to create plausible images for a great variety
> of sentences that explore the compositional structure of language. We
> illustrate this using a series of interactive visuals in the next section.
> The samples shown for each caption in the visuals are obtained by taking
> the top 32 of 512 after reranking with [CLIP](
> https://openai.com/blog/clip/), but we do not use any manual
> cherry-picking, aside from the thumbnails and standalone images that appear
> outside.[3](https://openai.com/blog/dall-e/#fn3)
>
> Controlling Attributes
>
> We test DALL·E’s ability to modify several of an object’s attributes, as
> well as the number of times that it appears.
>
> a pentagonal green clock. a green clock in the shape of a pentagon.
>
> a cube made of porcupine. a cube with the texture of a porcupine.
>
> a collection of glasses is sitting on a table
>
> Drawing Multiple Objects
>
> Simultaneously controlling multiple objects, their attributes, and their
> spatial relationships presents a new challenge. For example, consider the
> phrase “a hedgehog wearing a red hat, yellow gloves, blue shirt, and green
> pants.” To correctly interpret this sentence, DALL·E must not only
> correctly compose each piece of apparel with the animal, but also form the
> associations (hat, red), (gloves, yellow), (shirt, blue), and (pants,
> green) without mixing them up.[4](https://openai.com/blog/dall-e/#fn4)
>
> We test DALL·E’s ability to do this for relative positioning, stacking
> objects, and controlling multiple attributes.
>
> a small red block sitting on a large green block
>
> a stack of 3 cubes. a red cube is on the top, sitting on a green cube. the
> green cube is in the middle, sitting on a blue cube. the blue cube is on
> the bottom.
>
> an emoji of a baby penguin wearing a blue hat, red gloves, green shirt,
> and yellow pants
>
> While DALL·E does offer some level of controllability over the attributes
> and positions of a small number of objects, the success rate can depend on
> how the caption is phrased. As more objects are introduced, DALL·E is prone
> to confusing the associations between the objects and their colors, and the
> success rate decreases sharply. We also note that DALL·E is brittle with
> respect to rephrasing of the caption in these scenarios: alternative,
> semantically equivalent captions often yield no correct interpretations.
>
> Visualizing Perspective and Three-Dimensionality
>
> We find that DALL·E also allows for control over the viewpoint of a scene
> and the 3D style in which a scene is rendered.
>
> an extreme close-up view of a capybara sitting in a field
>
> a capybara made of voxels sitting in a field
>
> To push this further, we test DALL·E’s ability to repeatedly draw the head
> of a well-known figure at each angle from a sequence of equally spaced
> angles, and find that we can recover a smooth animation of the rotating
> head.
>
> a photograph of a bust of homer
>
> DALL·E appears to be able to apply some types of optical distortions to
> scenes, as we see with the options “fisheye lens view” and “a spherical
> panorama.” This motivated us to explore its ability to generate reflections.
>
> a plain white cube looking at its own reflection in a mirror. a plain
> white cube gazing at itself in a mirror.
>
> Visualizing Internal and External Structure
>
> The samples from the “extreme close-up view” and “x-ray” style led us to
> further explore DALL·E’s ability to render internal structure with
> cross-sectional views, and external structure with macro photographs.
>
> a cross-section view of a walnut
>
> a macro photograph of brain coral
>
> Inferring Contextual Details
>
> The task of translating text to images is underspecified: a single caption
> generally corresponds to an infinitude of plausible images, so the image is
> not uniquely determined. For instance, consider the caption “a painting of
> a capybara sitting on a field at sunrise.” Depending on the orientation of
> the capybara, it may be necessary to draw a shadow, though this detail is
> never mentioned explicitly. We explore DALL·E’s ability to resolve
> underspecification in three cases: changing style, setting, and time;
> drawing the same object in a variety of different situations; and
> generating an image of an object with specific text written on it.
>
> a painting of a capybara sitting in a field at sunrise
>
> a stained glass window with an image of a blue strawberry
>
> a store front that has the word ‘openai’ written on it. a store front that
> has the word ‘openai’ written on it. a store front that has the word
> ‘openai’ written on it. ‘openai’ store front.
>
> With varying degrees of reliability, DALL·E provides access to a subset of
> the capabilities of a 3D rendering engine via natural language. It can
> independently control the attributes of a small number of objects, and to a
> limited extent, how many there are, and how they are arranged with respect
> to one another. It can also control the location and angle from which a
> scene is rendered, and can generate known objects in compliance with
> precise specifications of angle and lighting conditions.
>
> Unlike a 3D rendering engine, whose inputs must be specified unambiguously
> and in complete detail, DALL·E is often able to “fill in the blanks” when
> the caption implies that the image must contain a certain detail that is
> not explicitly stated.
>
> Applications of Preceding Capabilities
>
> Next, we explore the use of the preceding capabilities for fashion and
> interior design.
>
> a male mannequin dressed in an orange and black flannel shirt
>
> a female mannequin dressed in a black leather jacket and gold pleated skirt
>
> a living room with two white armchairs and a painting of the colosseum.
> the painting is mounted above a modern fireplace.
>
> a loft bedroom with a white bed next to a nightstand. there is a fish tank
> beside the bed.
>
> Combining Unrelated Concepts
>
> The compositional nature of language allows us to put together concepts to
> describe both real and imaginary things. We find that DALL·E also has the
> ability to combine disparate ideas to synthesize objects, some of which are
> unlikely to exist in the real world. We explore this ability in two
> instances: transferring qualities from various concepts to animals, and
> designing products by taking inspiration from unrelated concepts.
>
> a snail made of harp. a snail with the texture of a harp.
>
> an armchair in the shape of an avocado. an armchair imitating an avocado.
>
> Animal Illustrations
>
> In the previous section, we explored DALL·E’s ability to combine unrelated
> concepts when generating images of real-world objects. Here, we explore
> this ability in the context of art, for three kinds of illustrations:
> anthropomorphized versions of animals and objects, animal chimeras, and
> emojis.
>
> an illustration of a baby daikon radish in a tutu walking a dog
>
> a professional high quality illustration of a giraffe turtle chimera. a
> giraffe imitating a turtle. a giraffe made of turtle.
>
> a professional high quality emoji of a lovestruck cup of boba
>
> Zero-Shot Visual Reasoning
>
> GPT-3 can be instructed to perform many kinds of tasks solely from a
> description and a cue to generate the answer supplied in its prompt,
> without any additional training. For example, when prompted with the phrase
> “here is the sentence ‘a person walking his dog in the park’ translated
> into French:”, GPT-3 answers “un homme qui promène son chien dans le parc.”
> This capability is called zero-shot reasoning. We find that DALL·E extends
> this capability to the visual domain, and is able to perform several kinds
> of image-to-image translation tasks when prompted in the right way.
>
> the exact same cat on the top as a sketch on the bottom
>
> the exact same teapot on the top with ’gpt’ written on it on the bottom
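>
> (A hedged reading of how such prompts work, reusing the toy TinyDalleLM and
> complete() sketches from the Overview section: the "image prompt" fills the
> top half of the 32x32 code grid, and the model completes the bottom half
> conditioned on the caption. The random tensors below are stand-ins for real
> BPE caption tokens and real dVAE codes of a source photo.)
>
>     model = TinyDalleLM()
>     caption = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN))          # "the exact same cat ..."
>     top_half = torch.randint(0, IMAGE_VOCAB, (1, IMAGE_LEN // 2))  # codes of the source image
>     full_grid = complete(model, caption, top_half)                 # (1, 1024) image codes
>     sketch_half = full_grid[:, IMAGE_LEN // 2:]                    # the generated bottom half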
>
> We did not anticipate that this capability would emerge, and made no
> modifications to the neural network or training procedure to encourage it.
> Motivated by these results, we measure DALL·E’s aptitude for analogical
> reasoning problems by testing it on Raven’s progressive matrices, a visual
> IQ test that saw widespread use in the 20th century.
>
> a sequence of geometric shapes.
>
> Geographic Knowledge
>
> We find that DALL·E has learned about geographic facts, landmarks, and
> neighborhoods. Its knowledge of these concepts is surprisingly precise in
> some ways and flawed in others.
>
> a photo of the food of china
>
> a photo of alamo square, san francisco, from a street at night
>
> a photo of san francisco’s golden gate bridge
>
> Temporal Knowledge
>
> In addition to exploring DALL·E’s knowledge of concepts that vary over
> space, we also explore its knowledge of concepts that vary over time.
>
> a photo of a phone from the 20s
>
> Summary of Approach and Prior Work
>
> DALL·E is a simple decoder-only transformer that receives both the text
> and the image as a single stream of 1280 tokens—256 for the text and 1024
> for the image—and models all of them autoregressively. The attention mask
> at each of its 64 self-attention layers allows each image token to attend
> to all text tokens. DALL·E uses the standard causal mask for the text
> tokens, and sparse attention for the image tokens with either a row,
> column, or convolutional attention pattern, depending on the layer. We
> provide more details about the architecture and training procedure in our
> [paper](https://arxiv.org/abs/2102.12092).
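>
> (A rough, hedged sketch of what one such mask could look like; the exact
> row, column, and convolutional patterns are specified in the paper, and
> this toy "row attention" version only illustrates the idea: text queries
> use a standard causal mask, while each image query sees every text token
> plus, causally, the image tokens in its own row of the 32x32 grid.)
>
>     import torch
>
>     TEXT_LEN, GRID = 256, 32
>     IMAGE_LEN = GRID * GRID
>     SEQ = TEXT_LEN + IMAGE_LEN
>
>     def row_attention_mask():
>         allow = torch.zeros(SEQ, SEQ, dtype=torch.bool)   # True = may attend
>         causal = torch.tril(torch.ones(TEXT_LEN, TEXT_LEN, dtype=torch.bool))
>         allow[:TEXT_LEN, :TEXT_LEN] = causal              # text -> text, causal
>         allow[TEXT_LEN:, :TEXT_LEN] = True                # image -> all text tokens
>         for q in range(IMAGE_LEN):                        # image -> same-row image, causal
>             row_start = TEXT_LEN + (q // GRID) * GRID
>             allow[TEXT_LEN + q, row_start:TEXT_LEN + q + 1] = True
>         return allow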
>
> Text-to-image synthesis has been an active area of research since the
> pioneering work of Reed et al.,[1](https://openai.com/blog/dall-e/#rf1)
> whose approach uses a GAN conditioned on text embeddings. The embeddings
> are produced by an encoder pretrained using a contrastive loss, not unlike
> CLIP. StackGAN[3](https://openai.com/blog/dall-e/#rf3) and StackGAN++[4](
> https://openai.com/blog/dall-e/#rf4) use multi-scale GANs to scale up the
> image resolution and improve visual fidelity. AttnGAN[5](
> https://openai.com/blog/dall-e/#rf5) incorporates attention between the
> text and image features, and proposes a contrastive text-image feature
> matching loss as an auxiliary objective. This is interesting to compare to
> our reranking with CLIP, which is done offline. Other work[2](
> https://openai.com/blog/dall-e/#rf2)[6](https://openai.com/blog/dall-e/#rf6)[7](https://openai.com/blog/dall-e/#rf7)
> incorporates additional sources of supervision during training to improve
> image quality. Finally, work by Nguyen et al.[8](
> https://openai.com/blog/dall-e/#rf8) and Cho et al.[9](
> https://openai.com/blog/dall-e/#rf9) explores sampling-based strategies
> for image generation that leverage pretrained multimodal discriminative
> models.
>
> Similar to the rejection sampling used in [VQVAE-2](
> https://arxiv.org/abs/1906.00446), we use [CLIP](
> https://openai.com/blog/clip/) to rerank the top 32 of 512 samples for
> each caption in all of the interactive visuals. This procedure can also be
> seen as a kind of language-guided search[16](
> https://openai.com/blog/dall-e/#rf16), and can have a dramatic impact on
> sample quality.
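>
> (A hedged sketch of this reranking step; `generate_image` and `clip_score`
> are hypothetical stand-ins for the image sampler and for a CLIP image-text
> similarity function, not real APIs.)
>
>     from typing import Callable, List, Tuple
>
>     def rerank(caption: str,
>                generate_image: Callable[[str], object],
>                clip_score: Callable[[object, str], float],
>                n_candidates: int = 512,
>                keep: int = 32) -> List[Tuple[float, object]]:
>         # draw candidates, score each (image, caption) pair, keep the best
>         candidates = [generate_image(caption) for _ in range(n_candidates)]
>         scored = sorted(((clip_score(img, caption), img) for img in candidates),
>                         key=lambda pair: pair[0], reverse=True)
>         return scored[:keep]  # top 32 of 512, as used for the visuals above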
>
> an illustration of a baby daikon radish in a tutu walking a dog [caption
> 1, best 8 of 2048]
>
> ---------------------------------------------------------------
>
> Footnotes
>
> -
>
> We decided to name our model using a portmanteau of the artist Salvador
> Dalí and Pixar’s WALL·E. [↩︎](https://openai.com/blog/dall-e/#fnref1)
>
> -
>
> A token is any symbol from a discrete vocabulary; for humans, each English
> letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has tokens
> for both text and image concepts. Specifically, each image caption is
> represented using a maximum of 256 BPE-encoded tokens with a vocabulary
> size of 16384, and the image is represented using 1024 tokens with a
> vocabulary size of 8192.
>
> The images are preprocessed to 256x256 resolution during training. Similar
> to VQVAE,[14](
> https://openai.com/blog/dall-e/#rf14)[15](https://openai.com/blog/dall-e/#rf15)
> each image is compressed to a 32x32 grid of discrete latent codes using a
> discrete VAE[10](
> https://openai.com/blog/dall-e/#rf10)[11](https://openai.com/blog/dall-e/#rf11)
> that we pretrained using a continuous relaxation.[12](
> https://openai.com/blog/dall-e/#rf12)[13](https://openai.com/blog/dall-e/#rf13)
> We found that training using the relaxation obviates the need for an
> explicit codebook, EMA loss, or tricks like dead code revival, and can
> scale up to large vocabulary sizes. [↩︎](
> https://openai.com/blog/dall-e/#fnref2)
>
> -
>
> Further details provided in [a later section](
> https://openai.com/blog/dall-e/#summary). [↩︎](
> https://openai.com/blog/dall-e/#fnref3)
>
> -
>
> This task is called variable binding, and has been extensively studied in
> the literature.[17](
> https://openai.com/blog/dall-e/#rf17)[18](https://openai.com/blog/dall-e/#rf18)[19](https://openai.com/blog/dall-e/#rf19)[20](https://openai.com/blog/dall-e/#rf20)
> [↩︎](https://openai.com/blog/dall-e/#fnref4)
>
> ---------------------------------------------------------------
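>
> (The discrete VAE mentioned in footnote 2 above can be sketched roughly as
> follows; this is a toy illustration of the Gumbel-softmax relaxation idea,
> not OpenAI's dVAE, and the encoder/decoder shapes are assumptions.)
>
>     import torch
>     import torch.nn as nn
>     import torch.nn.functional as F
>
>     CODEBOOK = 8192  # image vocabulary size, per footnote 2
>
>     class ToyDiscreteVAE(nn.Module):
>         def __init__(self, hidden=64):
>             super().__init__()
>             # 3x256x256 image -> 8192x32x32 logits (three stride-2 convolutions)
>             self.encoder = nn.Sequential(
>                 nn.Conv2d(3, hidden, 4, stride=2, padding=1), nn.ReLU(),
>                 nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
>                 nn.Conv2d(hidden, CODEBOOK, 4, stride=2, padding=1),
>             )
>             self.codebook = nn.Embedding(CODEBOOK, hidden)
>             # 32x32 latents -> 3x256x256 reconstruction
>             self.decoder = nn.Sequential(
>                 nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
>                 nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
>                 nn.ConvTranspose2d(hidden, 3, 4, stride=2, padding=1),
>             )
>
>         def forward(self, images, tau=1.0):  # images: (batch, 3, 256, 256)
>             logits = self.encoder(images)     # (batch, 8192, 32, 32)
>             # relaxed "soft" one-hot codes: gradients flow through the
>             # (approximately) discrete choice, avoiding VQ-style codebook/EMA losses
>             soft = F.gumbel_softmax(logits, tau=tau, hard=False, dim=1)
>             latents = torch.einsum("bkhw,kd->bdhw", soft, self.codebook.weight)
>             return self.decoder(latents), logits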
>
> References
>
> - Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.
> (2016). “[Generative adversarial text to image synthesis](
> https://arxiv.org/abs/1605.05396)”. In ICML 2016. [↩︎](
> https://openai.com/blog/dall-e/#rfref1)
>
> - Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H. (2016).
> “[Learning what and where to draw](https://arxiv.org/abs/1610.02454)”. In
> NIPS 2016. [↩︎](https://openai.com/blog/dall-e/#rfref2)
>
> - Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang X., Metaxas, D.
> (2016). “[StackGAN: Text to photo-realistic image synthesis with stacked
> generative adversarial networks](https://arxiv.org/abs/1612.03242)”. In
> ICCV 2017. [↩︎](https://openai.com/blog/dall-e/#rfref3)
>
> - Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.
> (2017). “[StackGAN++: realistic image synthesis with stacked generative
> adversarial networks](https://arxiv.org/abs/1710.10916)”. In IEEE TPAMI
> 2018. [↩︎](https://openai.com/blog/dall-e/#rfref4)
>
> - Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.
> (2017). “[AttnGAN: Fine-grained text to image generation with attentional
> generative adversarial networks](https://arxiv.org/abs/1711.10485)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref5)
>
> - Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.
> (2019). “[Object-driven text-to-image synthesis via adversarial training](
> https://arxiv.org/abs/1902.10740)”. In CVPR 2019. [↩︎](
> https://openai.com/blog/dall-e/#rfref6)
>
> - Koh, J. Y., Baldridge, J., Lee, H., Yang, Y. (2020). “[Text-to-image
> generation grounded by fine-grained user attention](
> https://arxiv.org/abs/2011.03775)”. In WACV 2021. [↩︎](
> https://openai.com/blog/dall-e/#rfref7)
>
> - Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J. (2016).
> “[Plug & play generative networks: conditional iterative generation of
> images in latent space](https://arxiv.org/abs/1612.00005)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref8)
>
> - Cho, J., Lu, J., Schwen, D., Hajishirzi, H., Kembhavi, A. (2020).
> “[X-LXMERT: Paint, caption, and answer questions with multi-modal
> transformers](https://arxiv.org/abs/2009.11278)”. EMNLP 2020. [↩︎](
> https://openai.com/blog/dall-e/#rfref9)
>
> - Kingma, Diederik P., and Max Welling. “[Auto-encoding variational bayes](
> https://arxiv.org/abs/1312.6114).” arXiv preprint (2013). [↩︎](
> https://openai.com/blog/dall-e/#rfref10a) [↩︎](
> https://openai.com/blog/dall-e/#rfref10b)
>
> - Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. “[Stochastic
> backpropagation and approximate inference in deep generative models](
> https://arxiv.org/abs/1401.4082).” arXiv preprint (2014). [↩︎](
> https://openai.com/blog/dall-e/#rfref11a) [↩︎](
> https://openai.com/blog/dall-e/#rfref11b)
>
> - Jang, E., Gu, S., Poole, B. (2016). “[Categorical reparametrization with
> Gumbel-softmax](https://arxiv.org/abs/1611.01144)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref12a) [↩︎](
> https://openai.com/blog/dall-e/#rfref12b)
>
> - Maddison, C., Mnih, A., Teh, Y. W. (2016). “[The Concrete distribution:
> a continuous relaxation of discrete random variables](
> https://arxiv.org/abs/1611.00712)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref13a) [↩︎](
> https://openai.com/blog/dall-e/#rfref13b)
>
> - van den Oord, A., Vinyals, O., Kavukcuoglu, K. (2017). “[Neural discrete
> representation learning](https://arxiv.org/abs/1711.00937)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref14a) [↩︎](
> https://openai.com/blog/dall-e/#rfref14b)
>
> - Razavi, A., van den Oord, A., Vinyals, O. (2019). “[Generating diverse
> high-fidelity images with VQ-VAE-2](https://arxiv.org/abs/1906.00446)”.
> [↩︎](https://openai.com/blog/dall-e/#rfref15a) [↩︎](
> https://openai.com/blog/dall-e/#rfref15b)
>
> - Andreas, J., Klein, D., Levine, S. (2017). “[Learning with Latent
> Language](https://arxiv.org/abs/1711.00482)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref16)
>
> - Smolensky, P. (1990). “[Tensor product variable binding and the
> representation of symbolic structures in connectionist systems](
> http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)”.
> [↩︎](https://openai.com/blog/dall-e/#rfref17a) [↩︎](
> https://openai.com/blog/dall-e/#rfref17b)
>
> - Plate, T. (1995). “[Holographic reduced representations: convolution
> algebra for compositional distributed representations](
> https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref18a) [↩︎](
> https://openai.com/blog/dall-e/#rfref18b)
>
> - Gayler, R. (1998). “[Multiplicative binding, representation operators &
> analogy](http://cogprints.org/502/)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref19a) [↩︎](
> https://openai.com/blog/dall-e/#rfref19b)
>
> - Kanerva, P. (1997). “[Fully distributed representations](
> http://www.cap-lore.com/RWC97-kanerva.pdf)”. [↩︎](
> https://openai.com/blog/dall-e/#rfref20a) [↩︎](
> https://openai.com/blog/dall-e/#rfref20b)
>
> ---------------------------------------------------------------
>
> Authors
> [Aditya Ramesh](https://openai.com/blog/authors/aditya/),
> [Mikhail Pavlov](https://openai.com/blog/authors/mikhail/),
> [Gabriel Goh](https://openai.com/blog/authors/gabriel/),
> [Scott Gray](https://openai.com/blog/authors/scott/)
> (Primary Authors)
>
> [Mark Chen](https://openai.com/blog/authors/mark/),
> [Rewon Child](https://openai.com/blog/authors/rewon/),
> [Vedant Misra](https://openai.com/blog/authors/vedant/),
> [Pamela Mishkin](https://openai.com/blog/authors/pamela/),
> [Gretchen Krueger](https://openai.com/blog/authors/gretchen/),
> [Sandhini Agarwal](https://openai.com/blog/authors/sandhini/),
> [Ilya Sutskever](https://openai.com/blog/authors/ilya/)
> (Supporting Authors)
> ---------------------------------------------------------------
>
> Filed Under
> [Research](https://openai.com/blog/tags/research/),
> [Milestones](https://openai.com/blog/tags/milestones/),
> [Multimodal](https://openai.com/blog/tags/multimodal/)
> ---------------------------------------------------------------
>
> Cover Artwork
>
> Justin Jay Wang
>
> ---------------------------------------------------------------
>
> Acknowledgments
>
> Thanks to the following for their feedback on this work and contributions
> to this release: Alec Radford, Andrew Mayne, Jeff Clune, Ashley Pilipiszyn,
> Steve Dowling, Jong Wook Kim, Lei Pan, Heewoo Jun, John Schulman, Michael
> Tabatowski, Preetum Nakkiran, Jack Clark, Fraser Kelton, Jacob Jackson,
> Greg Brockman, Wojciech Zaremba, Justin Mao-Jones, David Luan, Shantanu
> Jain, Prafulla Dhariwal, Sam Altman, Pranav Shyam, Miles Brundage, Jakub
> Pachocki, and Ryan Lowe.
>
> ---------------------------------------------------------------
>
> Contributions
>
> Aditya Ramesh was the project lead: he developed the approach, trained the
> models, and wrote most of the blog copy.
>
> Aditya Ramesh, Mikhail Pavlov, and Scott Gray worked together to scale up
> the model to 12 billion parameters, and designed the infrastructure used to
> draw samples from the model.
>
> Aditya Ramesh, Gabriel Goh, and Justin Jay Wang worked together to create
> the interactive visuals for the blog.
>
> Mark Chen and Aditya Ramesh created the images for Raven’s Progressive
> Matrices.
>
> Rewon Child and Vedant Misra assisted in writing the blog.
>
> Pamela Mishkin, Gretchen Krueger, and Sandhini Agarwal advised on broader
> impacts of the work and assisted in writing the blog.
>
> Ilya Sutskever oversaw the project and assisted in writing the blog.
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: text/html
> Size: 45019 bytes
> Desc: not available
> URL: <
> https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/96a4e98c/attachment.txt
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> cypherpunks mailing list
> cypherpunks at lists.cpunks.org
> https://lists.cpunks.org/mailman/listinfo/cypherpunks
>
>
> ------------------------------
>
> End of cypherpunks Digest, Vol 106, Issue 93
> ********************************************
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/html
Size: 39458 bytes
Desc: not available
URL: <https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/66867ca0/attachment.txt>


More information about the cypherpunks mailing list