At first glance, this was a great article.

   On Fri, Apr 8, 2022, 7:52 PM <[1]cypherpunks-request@lists.cpunks.org>
   wrote:

     Send cypherpunks mailing list submissions to
             [2]cypherpunks@lists.cpunks.org
     To subscribe or unsubscribe via the World Wide Web, visit
             [3]https://lists.cpunks.org/mailman/listinfo/cypherpunks
     or, via email, send a message with subject or body 'help' to
             [4]cypherpunks-request@lists.cpunks.org
     You can reach the person managing the list at
             [5]cypherpunks-owner@lists.cpunks.org
     When replying, please edit your Subject line so it is more specific
     than "Re: Contents of cypherpunks digest..."
     Today's Topics:
        1. Re: DALL-E (coderman)
     --------------------------------------------------------------------
     --
     Message: 1
     Date: Fri, 08 Apr 2022 23:50:53 +0000
     From: coderman <[6]coderman@protonmail.com>
     To: coderman <[7]coderman@protonmail.com>
     Cc: "cy\"Cypherpunks" <[8]cypherpunks@cpunks.org>
     Subject: Re: DALL-E
     Message-ID:
     <a9WeFGpr9g422W0Uym9aQZyxT6mqWNzNwLsG6yKqqlD4BLpH6NxuARXLOMvBY8IdZF9HMetBKZYGjdH--qJRFZDIWnXdMRQVqr3pmMYVo5I=@[9]protonmail.com>
     Content-Type: text/plain; charset="utf-8"
     DALL·E[1]([10]https://openai.com/blog/dall-e/#fn1) is a 12-billion
     parameter version of [GPT-3]([11]https://arxiv.org/abs/2005.14165)
     trained to generate images from text descriptions, using a dataset
     of text–image pairs. (We decided to name our model using a
     portmanteau of the artist Salvador Dalí and Pixar’s WALL·E.)
     We’ve found that it has a diverse set of capabilities, including
     creating anthropomorphized versions of animals and objects,
     combining unrelated concepts in plausible ways, rendering text, and
     applying transformations to existing images.
     ---------------------------------------------------------------
     Text prompts:
     - an illustration of a baby daikon radish in a tutu walking a dog
     - an armchair in the shape of an avocado. . . .
     - a store front that has the word ‘openai’ written on it. . . .
     Text & image prompt:
     - the exact same cat on the top as a sketch on the bottom
     ---------------------------------------------------------------
     GPT-3 showed that language can be used to instruct a large neural
     network to perform a variety of text generation tasks. [Image
     GPT]([12]https://openai.com/blog/image-gpt) showed that the same
     type of neural network can also be used to generate images with high
     fidelity. We extend these findings to show that manipulating visual
     concepts through language is now within reach.
     Overview
     Like GPT-3, DALL·E is a transformer language model. It receives both
     the text and the image as a single stream of data containing up to
     1280 tokens, and is trained using maximum likelihood to generate all
     of the tokens, one after
     another.[2]([13]https://openai.com/blog/dall-e/#fn2)
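     As a sketch, the maximum-likelihood objective described above can be
     written in the usual autoregressive form. The symbols are notation
     introduced here, not taken from the post: x_i is the i-th token of
     the combined text and image stream, N is its length (at most 1280),
     and theta stands for the model parameters. Any per-token weighting
     the authors may apply is not shown.

```latex
\log p_\theta(x_1, \dots, x_N)
  = \sum_{i=1}^{N} \log p_\theta\left(x_i \mid x_1, \dots, x_{i-1}\right),
  \qquad N \le 1280
```

     Training maximizes this sum over the dataset, which is why sampling
     can proceed one token after another.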
     A token is any symbol from a discrete vocabulary; for humans, each
     English letter is a token from a 26-letter alphabet. DALL·E’s
     vocabulary has tokens for both text and image concepts.
     Specifically, each image caption is represented using a maximum of
     256 BPE-encoded tokens with a vocabulary size of 16384, and the
     image is represented using 1024 tokens with a vocabulary size of
     8192.
     The images are preprocessed to 256x256 resolution during training.
     Similar to
     VQVAE,[14]([14]https://openai.com/blog/dall-e/#rf14)[15](https://openai.com/blog/dall-e/#rf15)
     each image is compressed to a 32x32 grid of discrete latent codes
     using a discrete
     VAE[10]([15]https://openai.com/blog/dall-e/#rf10)[11](https://openai.com/blog/dall-e/#rf11)
     that we pretrained using a continuous
     relaxation.[12]([16]https://openai.com/blog/dall-e/#rf12)[13](https://openai.com/blog/dall-e/#rf13)
     We found that training using the
     relaxation obviates the need for an explicit codebook, EMA loss, or
     tricks like dead code revival, and can scale up to large vocabulary
     sizes.
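     The token budget above can be checked with a small sketch. The sizes
     (256 text tokens, vocabulary 16384; a 32x32 grid giving 1024 image
     tokens, vocabulary 8192) come from the post. The function name
     build_stream and the idea of shifting image ids past the text
     vocabulary are assumptions made for illustration only; the post
     does not say how the two vocabularies are combined.

```python
# Sizes stated in the post.
TEXT_VOCAB = 16384
IMAGE_VOCAB = 8192
MAX_TEXT_TOKENS = 256
IMAGE_GRID = 32                               # 32x32 grid of latent codes
IMAGE_TOKENS = IMAGE_GRID * IMAGE_GRID        # 1024
MAX_STREAM = MAX_TEXT_TOKENS + IMAGE_TOKENS   # 1280


def build_stream(text_ids, image_ids):
    """Concatenate caption tokens and image codes into one stream.

    Image ids are shifted by TEXT_VOCAB so the two id ranges do not
    collide. This offset scheme is hypothetical, not from the post.
    """
    if len(text_ids) > MAX_TEXT_TOKENS:
        raise ValueError("caption longer than 256 tokens")
    if len(image_ids) != IMAGE_TOKENS:
        raise ValueError("expected 1024 image tokens")
    if any(not 0 <= t < TEXT_VOCAB for t in text_ids):
        raise ValueError("text id out of range")
    if any(not 0 <= c < IMAGE_VOCAB for c in image_ids):
        raise ValueError("image id out of range")
    return list(text_ids) + [TEXT_VOCAB + c for c in image_ids]


# A 3-token caption followed by a dummy all-zero image.
stream = build_stream([5, 17, 42], [0] * IMAGE_TOKENS)
print(len(stream), MAX_STREAM)  # 1027 1280
```

     A 256x256 image mapped to a 32x32 grid corresponds to a factor of 8
     reduction along each side.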
     This training procedure allows DALL·E to not only generate an image
     from scratch, but also to regenerate any rectangular region of an
     existing image that extends to the bottom-right corner, in a way
     that is consistent with the text prompt.
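     One way to picture the region completion described above is as
     resampling a subset of the 1024 image tokens in raster order. The
     sketch below is an illustration under assumptions: sample_token is
     a hypothetical stand-in for the model, and the post does not
     describe how the kept tokens are fed back in.

```python
GRID = 32  # 32x32 latent grid, as stated in the post


def region_positions(row0, col0, grid=GRID):
    """Raster-order indices of the rectangle that runs from
    (row0, col0) to the bottom-right corner of the grid."""
    if not (0 <= row0 < grid and 0 <= col0 < grid):
        raise ValueError("corner outside the grid")
    return [r * grid + c
            for r in range(row0, grid)
            for c in range(col0, grid)]


def regenerate(image_ids, row0, col0, sample_token):
    """Resample only the tokens inside the region, left to right and
    top to bottom; tokens outside the region are kept unchanged.

    sample_token(prefix) is a hypothetical model call: it receives the
    tokens before the current position and returns one token id.
    """
    out = list(image_ids)
    for pos in region_positions(row0, col0):
        out[pos] = sample_token(out[:pos])
    return out


# Regenerate the bottom half of a dummy image with a constant "model".
new_ids = regenerate([0] * (GRID * GRID), 16, 0, lambda prefix: 7)
print(new_ids.count(7))  # 512
```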
     We recognize that work involving generative models has the potential
     for significant, broad societal impacts. In the future, we plan to
     analyze how models like DALL·E relate to societal issues like
     economic impact on certain work processes and professions, the
     potential for bias in the model outputs, and the longer term ethical
     challenges implied by this technology.
     Capabilities
     We find that DALL·E is able to create plausible images for a great
     variety of sentences that explore the compositional structure of
     language. We illustrate this using a series of interactive visuals
     in the next section. The samples shown for each caption in the
     visuals are obtained by taking the top 32 of 512 after reranking
     with [CLIP]([17]https://openai.com/blog/clip/), but we do not use
     any manual cherry-picking, aside from the thumbnails and standalone
     images that appear
     outside.[3]([18]https://openai.com/blog/dall-e/#fn3)
     Further details are provided in [a later
     section]([19]https://openai.com/blog/dall-e/#summary).
     Controlling Attributes
     We test DALL·E’s ability to modify several of an object’s
     attributes, as well as the number of times that it appears.
     Text prompts:
     - a pentagonal green clock. a green clock in the shape of a pentagon.
     - a cube made of porcupine. a cube with the texture of a porcupine.
     - a collection of glasses is sitting on a table
     Drawing Multiple Objects
     Simultaneously controlling multiple objects, their attributes, and
     their spatial relationships presents a new challenge. For example,
     consider the phrase “a hedgehog wearing a red hat, yellow gloves,
     blue shirt, and green pants.” To correctly interpret this sentence,
     DALL·E must not only correctly compose each piece of apparel with
     the animal, but also form the associations (hat, red), (gloves,
     yellow), (shirt, blue), and (pants, green) without mixing them
     up.[4]([20]https://openai.com/blog/dall-e/#fn4)
     This task is called variable binding, and has been extensively
     studied in the
     literature.[17]([21]https://openai.com/blog/dall-e/#rf17)[18](https://openai.com/blog/dall-e/#rf18)
     [19](https://openai.com/blog/dall-e/#rf19)[20](https://openai.com/blog/dall-e/#rf20)
     We test DALL·E’s ability to do this for relative positioning,
     stacking objects, and controlling multiple attributes.
     Text prompts:
     - a small red block sitting on a large green block
     - a stack of 3 cubes. a red cube is on the top, sitting on a green
     cube. the green cube is in the middle, sitting on a blue cube. the
     blue cube is on the bottom.
     - an emoji of a baby penguin wearing a blue hat, red gloves, green
     shirt, and yellow pants
     While DALL·E does offer some level of controllability over the
     attributes and positions of a small number of objects, the success
     rate can depend on how the caption is phrased. As more objects are
     introduced, DALL·E is prone to confusing the associations between
     the objects and their colors, and the success rate decreases
     sharply. We also note that DALL·E is brittle with respect to
     rephrasing of the caption in these scenarios: alternative,
     semantically equivalent captions often yield no correct
     interpretations.
     Visualizing Perspective and Three-Dimensionality
     We find that DALL·E also allows for control over the viewpoint of a
     scene and the 3D style in which a scene is rendered.
     Text prompts:
     - an extreme close-up view of a capybara sitting in a field
     - a capybara made of voxels sitting in a field
     To push this further, we test DALL·E’s ability to repeatedly draw
     the head of a well-known figure at each angle from a sequence of
     equally spaced angles, and find that we can recover a smooth
     animation of the rotating head.
     Text prompt: a photograph of a bust of homer
     DALL·E appears to be able to apply some types of optical distortions
     to scenes, as we see with the options “fisheye lens view” and “a
     spherical panorama.” This motivated us to explore its ability to
     generate reflections.
     Text prompt: a plain white cube looking at its own reflection in a
     mirror. a plain white cube gazing at itself in a mirror.
     Visualizing Internal and External Structure
     The samples from the “extreme close-up view” and “x-ray” style led
     us to further explore DALL·E’s ability to render internal structure
     with cross-sectional views, and external structure with macro
     photographs.
     Text prompts:
     - a cross-section view of a walnut
     - a macro photograph of brain coral
     Inferring Contextual Details
     The task of translating text to images is underspecified: a single
     caption generally corresponds to an infinitude of plausible images,
     so the image is not uniquely determined. For instance, consider the
     caption “a painting of a capybara sitting on a field at sunrise.”
     Depending on the orientation of the capybara, it may be necessary to
     draw a shadow, though this detail is never mentioned explicitly. We
     explore DALL·E’s ability to resolve underspecification in three
     cases: changing style, setting, and time; drawing the same object in
     a variety of different situations; and generating an image of an
     object with specific text written on it.
     Text prompts:
     - a painting of a capybara sitting in a field at sunrise
     - a stained glass window with an image of a blue strawberry
     - a store front that has the word ‘openai’ written on it. a store
     front that has the word ‘openai’ written on it. a store front that
     has the word ‘openai’ written on it. ‘openai’ store front.
     With varying degrees of reliability, DALL·E provides access to a
     subset of the capabilities of a 3D rendering engine via natural
     language. It can independently control the attributes of a small
     number of objects, and to a limited extent, how many there are, and
     how they are arranged with respect to one another. It can also
     control the location and angle from which a scene is rendered, and
     can generate known objects in compliance with precise specifications
     of angle and lighting conditions.
     Unlike a 3D rendering engine, whose inputs must be specified
     unambiguously and in complete detail, DALL·E is often able to “fill
     in the blanks” when the caption implies that the image must contain
     a certain detail that is not explicitly stated.
     Applications of Preceding Capabilities
     Next, we explore the use of the preceding capabilities for fashion
     and interior design.
     Text prompts:
     - a male mannequin dressed in an orange and black flannel shirt
     - a female mannequin dressed in a black leather jacket and gold
     pleated skirt
     - a living room with two white armchairs and a painting of the
     colosseum. the painting is mounted above a modern fireplace.
     - a loft bedroom with a white bed next to a nightstand. there is a
     fish tank beside the bed.
     Combining Unrelated Concepts
     The compositional nature of language allows us to put together
     concepts to describe both real and imaginary things. We find that
     DALL·E also has the ability to combine disparate ideas to synthesize
     objects, some of which are unlikely to exist in the real world. We
     explore this ability in two instances: transferring qualities from
     various concepts to animals, and designing products by taking
     inspiration from unrelated concepts.
     Text prompts:
     - a snail made of harp. a snail with the texture of a harp.
     - an armchair in the shape of an avocado. an armchair imitating an
     avocado.
     Animal Illustrations
     In the previous section, we explored DALL·E’s ability to combine
     unrelated concepts when generating images of real-world objects.
     Here, we explore this ability in the context of art, for three kinds
     of illustrations: anthropomorphized versions of animals and objects,
     animal chimeras, and emojis.
     Text prompts:
     - an illustration of a baby daikon radish in a tutu walking a dog
     - a professional high quality illustration of a giraffe turtle
     chimera. a giraffe imitating a turtle. a giraffe made of turtle.
     - a professional high quality emoji of a lovestruck cup of boba
     Zero-Shot Visual Reasoning
     GPT-3 can be instructed to perform many kinds of tasks solely from a
     description and a cue to generate the answer supplied in its prompt,
     without any additional training. For example, when prompted with the
     phrase “here is the sentence ‘a person walking his dog in the park’
     translated into French:”, GPT-3 answers “un homme qui promène son
     chien dans le parc.” This capability is called zero-shot reasoning.
     We find that DALL·E extends this capability to the visual domain,
     and is able to perform several kinds of image-to-image translation
     tasks when prompted in the right way.
     Text & image prompts:
     - the exact same cat on the top as a sketch on the bottom
     - the exact same teapot on the top with ’gpt’ written on it on the
     bottom
     We did not anticipate that this capability would emerge, and made no
     modifications to the neural network or training procedure to
     encourage it. Motivated by these results, we measure DALL·E’s
     aptitude for analogical reasoning problems by testing it on Raven’s
     progressive matrices, a visual IQ test that saw widespread use in
     the 20th century.
     Text prompt: a sequence of geometric shapes.
     Geographic Knowledge
     We find that DALL·E has learned about geographic facts, landmarks,
     and neighborhoods. Its knowledge of these concepts is surprisingly
     precise in some ways and flawed in others.
     Text prompts:
     - a photo of the food of china
     - a photo of alamo square, san francisco, from a street at night
     - a photo of san francisco’s golden gate bridge
     Temporal Knowledge
     In addition to exploring DALL·E’s knowledge of concepts that vary
     over space, we also explore its knowledge of concepts that vary over
     time.
     Text prompt: a photo of a phone from the 20s
     Summary of Approach and Prior Work
     DALL·E is a simple decoder-only transformer that receives both the
     text and the image as a single stream of 1280 tokens—256 for the
     text and 1024 for the image—and models all of them autoregressively.
     The attention mask at each of its 64 self-attention layers allows
     each image token to attend to all text tokens. DALL·E uses the
     standard causal mask for the text tokens, and sparse attention for
     the image tokens with either a row, column, or convolutional
     attention pattern, depending on the layer. We provide more details
     about the architecture and training procedure in our
     [paper]([22]https://arxiv.org/abs/2102.12092).
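     The masking rules in this paragraph can be illustrated on a toy
     sequence. This is a simplified sketch, not the layout from the
     paper: the "row" and "column" patterns here only let an image token
     see earlier image tokens in its own row or column, the
     convolutional pattern is omitted, and the function name
     attention_mask is made up for the example. What it does take from
     the post is that text tokens use a causal mask and every image
     token may attend to all text tokens.

```python
def attention_mask(n_text, grid, pattern="row"):
    """Return mask[i][j] == True when position i may attend to j.

    Positions 0..n_text-1 are text; the rest are a grid x grid image in
    raster order. The row/column rules are simplified assumptions.
    """
    if pattern not in ("row", "column"):
        raise ValueError("pattern must be 'row' or 'column'")
    n = n_text + grid * grid
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        if i < n_text:
            # Text tokens: standard causal mask.
            for j in range(i + 1):
                mask[i][j] = True
            continue
        # Image tokens: attend to every text token.
        for j in range(n_text):
            mask[i][j] = True
        r, c = divmod(i - n_text, grid)
        # Sparse, causal attention over earlier image tokens.
        for j in range(n_text, i + 1):
            rj, cj = divmod(j - n_text, grid)
            if pattern == "row" and rj == r:
                mask[i][j] = True
            elif pattern == "column" and cj == c:
                mask[i][j] = True
    return mask


def as_bits(row):
    return "".join("1" if v else "0" for v in row)


# Two text tokens and a 2x2 image: inspect the last image token.
print(as_bits(attention_mask(2, 2, "row")[5]))     # 110011
print(as_bits(attention_mask(2, 2, "column")[5]))  # 110101
```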
     Text-to-image synthesis has been an active area of research since
     the pioneering work of Reed et
     al.,[1]([23]https://openai.com/blog/dall-e/#rf1) whose approach uses
     a GAN conditioned on text embeddings. The embeddings are produced by
     an encoder pretrained using a contrastive loss, not unlike CLIP.
     StackGAN[3]([24]https://openai.com/blog/dall-e/#rf3) and
     StackGAN++[4]([25]https://openai.com/blog/dall-e/#rf4) use
     multi-scale GANs to scale up the image resolution and improve visual
     fidelity. AttnGAN[5]([26]https://openai.com/blog/dall-e/#rf5)
     incorporates attention between the text and image features, and
     proposes a contrastive text-image feature matching loss as an
     auxiliary objective. This is interesting to compare to our reranking
     with CLIP, which is done offline. Other
     work[2]([27]https://openai.com/blog/dall-e/#rf2)[6](https://openai.com/blog/dall-e/#rf6)[7](https://openai.com/blog/dall-e/#rf7)
     incorporates additional sources of supervision during training to
     improve image quality. Finally, work by Nguyen et
     al.[8]([28]https://openai.com/blog/dall-e/#rf8) and Cho et
     al.[9]([29]https://openai.com/blog/dall-e/#rf9) explores
     sampling-based strategies for image generation that leverage
     pretrained multimodal discriminative models.
     Similar to the rejection sampling used in
     [VQVAE-2]([30]https://arxiv.org/abs/1906.00446), we use
     [CLIP]([31]https://openai.com/blog/clip/) to rerank the top 32 of
     512 samples for each caption in all of the interactive visuals. This
     procedure can also be seen as a kind of language-guided
     search[16]([32]https://openai.com/blog/dall-e/#rf16), and can have a
     dramatic impact on sample quality.
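     The reranking step amounts to scoring every candidate and keeping
     the best few. The sketch below shows only that selection logic.
     score_fn is a hypothetical stand-in for an image-text similarity
     score such as the one CLIP provides; no real CLIP API is used, and
     the integer "samples" are dummy data.

```python
def rerank(samples, score_fn, keep=32):
    """Return the `keep` highest-scoring samples, best first."""
    return sorted(samples, key=score_fn, reverse=True)[:keep]


# Dummy stand-ins: 512 "samples" that are just integers, and a toy
# score that pretends sample 100 matches the caption best.
candidates = list(range(512))


def toy_score(sample):
    return -abs(sample - 100)


top = rerank(candidates, toy_score)
print(len(top), top[0])  # 32 100
```

     Because scoring happens after sampling, the generator itself is
     unchanged; only the selection of what to show depends on the
     scorer.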
     Text prompt: an illustration of a baby daikon radish in a tutu
     walking a dog [caption 1, best 8 of 2048]
     ---------------------------------------------------------------
     References
     1. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.
     (2016). “[Generative adversarial text to image
     synthesis]([42]https://arxiv.org/abs/1605.05396)”. In ICML 2016.
     2. Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H.
     (2016). “[Learning what and where to
     draw]([44]https://arxiv.org/abs/1610.02454)”. In NIPS 2016.
     3. Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X.,
     Metaxas, D. (2016). “[StackGAN: Text to photo-realistic image
     synthesis with stacked generative adversarial
     networks]([46]https://arxiv.org/abs/1612.03242)”. In ICCV 2017.
     4. Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X.,
     Metaxas, D. (2017). “[StackGAN++: realistic image synthesis with
     stacked generative adversarial
     networks]([48]https://arxiv.org/abs/1710.10916)”. In IEEE TPAMI
     2018.
     5. Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He,
     X. (2017). “[AttnGAN: Fine-grained text to image generation with
     attentional generative adversarial
     networks]([50]https://arxiv.org/abs/1711.10485)”.
     6. Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.
     (2019). “[Object-driven text-to-image synthesis via adversarial
     training]([52]https://arxiv.org/abs/1902.10740)”. In CVPR 2019.
     7. Koh, J. Y., Baldridge, J., Lee, H., Yang, Y. (2020).
     “[Text-to-image generation grounded by fine-grained user
     attention]([54]https://arxiv.org/abs/2011.03775)”. In WACV 2021.
     8. Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J.
     (2016). “[Plug & play generative networks: conditional iterative
     generation of images in latent
     space]([56]https://arxiv.org/abs/1612.00005)”.
     9. Cho, J., Lu, J., Schwen, D., Hajishirzi, H., Kembhavi, A. (2020).
     “[X-LXMERT: Paint, caption, and answer questions with multi-modal
     transformers]([58]https://arxiv.org/abs/2009.11278)”. In EMNLP 2020.
     10. Kingma, Diederik P., and Max Welling. “[Auto-encoding variational
     bayes]([60]https://arxiv.org/abs/1312.6114).” arXiv preprint (2013).
     11. Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra.
     “[Stochastic backpropagation and approximate inference in deep
     generative models]([63]https://arxiv.org/abs/1401.4082).” arXiv
     preprint (2014).
     12. Jang, E., Gu, S., Poole, B. (2016). “[Categorical
     reparametrization with
     Gumbel-softmax]([66]https://arxiv.org/abs/1611.01144)”.
     13. Maddison, C., Mnih, A., Teh, Y. W. (2016). “[The Concrete
     distribution: a continuous relaxation of discrete random
     variables]([69]https://arxiv.org/abs/1611.00712)”.
     14. van den Oord, A., Vinyals, O., Kavukcuoglu, K. (2017). “[Neural
     discrete representation
     learning]([72]https://arxiv.org/abs/1711.00937)”.
     15. Razavi, A., van den Oord, A., Vinyals, O. (2019). “[Generating
     diverse high-fidelity images with
     VQ-VAE-2]([75]https://arxiv.org/abs/1906.00446)”.
     16. Andreas, J., Klein, D., Levine, S. (2017). “[Learning with Latent
     Language]([78]https://arxiv.org/abs/1711.00482)”.
     17. Smolensky, P. (1990). “[Tensor product variable binding and the
     representation of symbolic structures in connectionist
     systems]([80]http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)”.
     18. Plate, T. (1995). “[Holographic reduced representations:
     convolution algebra for compositional distributed
     representations]([83]https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf)”.
     19. Gayler, R. (1998). “[Multiplicative binding, representation
     operators & analogy]([86]http://cogprints.org/502/)”.
     20. Kanerva, P. (1997). “[Fully distributed
     representations]([89]http://www.cap-lore.com/RWC97-kanerva.pdf)”.
     ---------------------------------------------------------------
     Authors
     [Aditya Ramesh]([92]https://openai.com/blog/authors/aditya/),
     [Mikhail Pavlov]([93]https://openai.com/blog/authors/mikhail/),
     [Gabriel Goh]([94]https://openai.com/blog/authors/gabriel/),
     [Scott Gray]([95]https://openai.com/blog/authors/scott/)
     (Primary Authors)
     [Mark Chen]([96]https://openai.com/blog/authors/mark/),
     [Rewon Child]([97]https://openai.com/blog/authors/rewon/),
     [Vedant Misra]([98]https://openai.com/blog/authors/vedant/),
     [Pamela Mishkin]([99]https://openai.com/blog/authors/pamela/),
     [Gretchen Krueger]([100]https://openai.com/blog/authors/gretchen/),
     [Sandhini Agarwal]([101]https://openai.com/blog/authors/sandhini/),
     [Ilya Sutskever]([102]https://openai.com/blog/authors/ilya/)
     (Supporting Authors)
     ---------------------------------------------------------------
     Filed Under
     [Research]([103]https://openai.com/blog/tags/research/),
     [Milestones](https://openai.com/blog/tags/milestones/),
     [Multimodal](https://openai.com/blog/tags/multimodal/)
     ---------------------------------------------------------------
     Cover Artwork
     Justin Jay Wang
     ---------------------------------------------------------------
     Acknowledgments
     Thanks to the following for their feedback on this work and
     contributions to this release: Alec Radford, Andrew Mayne, Jeff
     Clune, Ashley Pilipiszyn, Steve Dowling, Jong Wook Kim, Lei Pan,
     Heewoo Jun, John Schulman, Michael Tabatowski, Preetum Nakkiran,
     Jack Clark, Fraser Kelton, Jacob Jackson, Greg Brockman, Wojciech
     Zaremba, Justin Mao-Jones, David Luan, Shantanu Jain, Prafulla
     Dhariwal, Sam Altman, Pranav Shyam, Miles Brundage, Jakub Pachocki,
     and Ryan Lowe.
     ---------------------------------------------------------------
     Contributions
     Aditya Ramesh was the project lead: he developed the approach,
     trained the models, and wrote most of the blog copy.
     Aditya Ramesh, Mikhail Pavlov, and Scott Gray worked together to
     scale up the model to 12 billion parameters, and designed the
     infrastructure used to draw samples from the model.
     Aditya Ramesh, Gabriel Goh, and Justin Jay Wang worked together to
     create the interactive visuals for the blog.
     Mark Chen and Aditya Ramesh created the images for Raven’s
     Progressive Matrices.
     Rewon Child and Vedant Misra assisted in writing the blog.
     Pamela Mishkin, Gretchen Krueger, and Sandhini Agarwal advised on
     broader impacts of the work and assisted in writing the blog.
     Ilya Sutskever oversaw the project and assisted in writing the blog.
     -------------- next part --------------
     A non-text attachment was scrubbed...
     Name: not available
     Type: text/html
     Size: 45019 bytes
     Desc: not available
     URL:
     <[104]https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/96a4e98c/attachment.txt>
     ------------------------------
     Subject: Digest Footer
     _______________________________________________
     cypherpunks mailing list
     [105]cypherpunks@lists.cpunks.org
     [106]https://lists.cpunks.org/mailman/listinfo/cypherpunks
     ------------------------------
     End of cypherpunks Digest, Vol 106, Issue 93
     ********************************************

References

   1. mailto:cypherpunks-request@lists.cpunks.org
   2. mailto:cypherpunks@lists.cpunks.org
   3. https://lists.cpunks.org/mailman/listinfo/cypherpunks
   4. mailto:cypherpunks-request@lists.cpunks.org
   5. mailto:cypherpunks-owner@lists.cpunks.org
   6. mailto:coderman@protonmail.com
   7. mailto:coderman@protonmail.com
   8. mailto:cypherpunks@cpunks.org
   9. http://protonmail.com/
  10. https://openai.com/blog/dall-e/#fn1
  11. https://arxiv.org/abs/2005.14165
  12. https://openai.com/blog/image-gpt
  13. https://openai.com/blog/dall-e/#fn2
  14. https://openai.com/blog/dall-e/#rf14
  15. https://openai.com/blog/dall-e/#rf10
  16. https://openai.com/blog/dall-e/#rf12
  17. https://openai.com/blog/clip/
  18. https://openai.com/blog/dall-e/#fn3
  19. https://openai.com/blog/dall-e/#summary
  20. https://openai.com/blog/dall-e/#fn4
  21. https://openai.com/blog/dall-e/#rf17
  22. https://arxiv.org/abs/2102.12092
  23. https://openai.com/blog/dall-e/#rf1
  24. https://openai.com/blog/dall-e/#rf3
  25. https://openai.com/blog/dall-e/#rf4
  26. https://openai.com/blog/dall-e/#rf5
  27. https://openai.com/blog/dall-e/#rf2
  28. https://openai.com/blog/dall-e/#rf8
  29. https://openai.com/blog/dall-e/#rf9
  30. https://arxiv.org/abs/1906.00446
  31. https://openai.com/blog/clip/
  32. https://openai.com/blog/dall-e/#rf16
  33. https://openai.com/blog/dall-e/#fnref1
  34. https://openai.com/blog/dall-e/#rf14
  35. https://openai.com/blog/dall-e/#rf10
  36. https://openai.com/blog/dall-e/#rf12
  37. https://openai.com/blog/dall-e/#fnref2
  38. https://openai.com/blog/dall-e/#summary
  39. https://openai.com/blog/dall-e/#fnref3
  40. https://openai.com/blog/dall-e/#rf17
  41. https://openai.com/blog/dall-e/#fnref4
  42. https://arxiv.org/abs/1605.05396
  43. https://openai.com/blog/dall-e/#rfref1
  44. https://arxiv.org/abs/1610.02454
  45. https://openai.com/blog/dall-e/#rfref2
  46. https://arxiv.org/abs/1612.03242
  47. https://openai.com/blog/dall-e/#rfref3
  48. https://arxiv.org/abs/1710.10916
  49. https://openai.com/blog/dall-e/#rfref4
  50. https://arxiv.org/abs/1711.10485
  51. https://openai.com/blog/dall-e/#rfref5
  52. https://arxiv.org/abs/1902.10740
  53. https://openai.com/blog/dall-e/#rfref6
  54. https://arxiv.org/abs/2011.03775
  55. https://openai.com/blog/dall-e/#rfref7
  56. https://arxiv.org/abs/1612.00005
  57. https://openai.com/blog/dall-e/#rfref8
  58. https://arxiv.org/abs/2009.11278
  59. https://openai.com/blog/dall-e/#rfref9
  60. https://arxiv.org/abs/1312.6114
  61. https://openai.com/blog/dall-e/#rfref10a
  62. https://openai.com/blog/dall-e/#rfref10b
  63. https://arxiv.org/abs/1401.4082
  64. https://openai.com/blog/dall-e/#rfref11a
  65. https://openai.com/blog/dall-e/#rfref11b
  66. https://arxiv.org/abs/1611.01144
  67. https://openai.com/blog/dall-e/#rfref12a
  68. https://openai.com/blog/dall-e/#rfref12b
  69. https://arxiv.org/abs/1611.00712
  70. https://openai.com/blog/dall-e/#rfref13a
  71. https://openai.com/blog/dall-e/#rfref13b
  72. https://arxiv.org/abs/1711.00937
  73. https://openai.com/blog/dall-e/#rfref14a
  74. https://openai.com/blog/dall-e/#rfref14b
  75. https://arxiv.org/abs/1906.00446
  76. https://openai.com/blog/dall-e/#rfref15a
  77. https://openai.com/blog/dall-e/#rfref15b
  78. https://arxiv.org/abs/1711.00482
  79. https://openai.com/blog/dall-e/#rfref16
  80. http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf
  81. https://openai.com/blog/dall-e/#rfref17a
  82. https://openai.com/blog/dall-e/#rfref17b
  83. https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf
  84. https://openai.com/blog/dall-e/#rfref18a
  85. https://openai.com/blog/dall-e/#rfref18b
  86. http://cogprints.org/502/
  87. https://openai.com/blog/dall-e/#rfref19a
  88. https://openai.com/blog/dall-e/#rfref19b
  89. http://www.cap-lore.com/RWC97-kanerva.pdf
  90. https://openai.com/blog/dall-e/#rfref20a
  91. https://openai.com/blog/dall-e/#rfref20b
  92. https://openai.com/blog/authors/aditya/
  93. https://openai.com/blog/authors/mikhail/
  94. https://openai.com/blog/authors/gabriel/
  95. https://openai.com/blog/authors/scott/
  96. https://openai.com/blog/authors/mark/
  97. https://openai.com/blog/authors/rewon/
  98. https://openai.com/blog/authors/vedant/
  99. https://openai.com/blog/authors/pamela/
 100. https://openai.com/blog/authors/gretchen/
 101. https://openai.com/blog/authors/sandhini/
 102. https://openai.com/blog/authors/ilya/
 103. https://openai.com/blog/tags/research/
 104. https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/96a4e98c/attachment.txt
 105. mailto:cypherpunks@lists.cpunks.org
 106. https://lists.cpunks.org/mailman/listinfo/cypherpunks