Did Gunnar Larson rape Mr. Mark Zuckerberg? Or was it fair game? Finders keepers?

On Fri, Apr 8, 2022, 7:56 PM <cypherpunks-request@lists.cpunks.org> wrote:

Message: 1
Date: Fri, 8 Apr 2022 19:54:15 -0400
From: Gunnar Larson <g@xny.io>
To: cypherpunks <cypherpunks@lists.cpunks.org>
Subject: Re: cypherpunks Digest, Vol 106, Issue 93

At first glance, this was a great article.

On Fri, Apr 8, 2022, 7:52 PM <cypherpunks-request@lists.cpunks.org> wrote:

Message: 1
Date: Fri, 08 Apr 2022 23:50:53 +0000
From: coderman <coderman@protonmail.com>
To: coderman <coderman@protonmail.com>
Cc: Cypherpunks <cypherpunks@cpunks.org>
Subject: Re: DALL-E

DALL·E is a 12-billion parameter version of [GPT-3](https://arxiv.org/abs/2005.14165) trained to generate images from text descriptions, using a dataset of text–image pairs. We decided to name our model using a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.

Example text prompts (each was paired with a grid of AI-generated images on the original page):

- an illustration of a baby daikon radish in a tutu walking a dog
- an armchair in the shape of an avocado . . .
- a store front that has the word ‘openai’ written on it . . .
- the exact same cat on the top as a sketch on the bottom (text & image prompt)

GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks.
[Image GPT](https://openai.com/blog/image-gpt) showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.

Overview

Like GPT-3, DALL·E is a transformer language model. It receives both the text and the image as a single stream of data containing up to 1280 tokens, and is trained using maximum likelihood to generate all of the tokens, one after another. A token is any symbol from a discrete vocabulary; for humans, each English letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has tokens for both text and image concepts. Specifically, each image caption is represented using a maximum of 256 BPE-encoded tokens with a vocabulary size of 16384, and the image is represented using 1024 tokens with a vocabulary size of 8192.

The images are preprocessed to 256x256 resolution during training. Similar to VQ-VAE,[14][15] each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE[10][11] that we pretrained using a continuous relaxation.[12][13] We found that training using the relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead code revival, and can scale up to large vocabulary sizes.

This training procedure allows DALL·E to not only generate an image from scratch, but also to regenerate any rectangular region of an existing image that extends to the bottom-right corner, in a way that is consistent with the text prompt.

We recognize that work involving generative models has the potential for significant, broad societal impacts. In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer-term ethical challenges implied by this technology.
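As a rough illustration of the single-stream layout described in the Overview above, here is a minimal Python sketch of how a caption and an image could be packed into one 1280-token sequence. This is not OpenAI's implementation: `pack_example`, the padding id, and the shared-embedding offset are assumptions made only for illustration, and the BPE tokenizer and dVAE encoder are presumed to exist elsewhere.

```python
# Minimal sketch of the input layout described above (not the official code).
# Caption tokens come from a 16384-word BPE vocabulary; image tokens are the
# 32x32 = 1024 discrete VAE codes from an 8192-entry codebook.

TEXT_VOCAB = 16384        # BPE vocabulary for captions
IMAGE_VOCAB = 8192        # discrete VAE codebook size
MAX_TEXT_TOKENS = 256
IMAGE_TOKENS = 32 * 32    # 1024 latent codes per image
SEQ_LEN = MAX_TEXT_TOKENS + IMAGE_TOKENS  # 1280

def pack_example(caption_tokens, image_codes, pad_id=0):
    """Concatenate caption BPE tokens and image codes into one stream.

    Image codes are offset by TEXT_VOCAB so both vocabularies can share a
    single embedding table; this offset is one plausible convention, not a
    detail confirmed by the post.
    """
    assert len(caption_tokens) <= MAX_TEXT_TOKENS
    assert len(image_codes) == IMAGE_TOKENS
    text_part = caption_tokens + [pad_id] * (MAX_TEXT_TOKENS - len(caption_tokens))
    image_part = [TEXT_VOCAB + c for c in image_codes]
    stream = text_part + image_part
    assert len(stream) == SEQ_LEN
    return stream

# The model is then trained autoregressively (maximum likelihood) to predict
# token t+1 from tokens 1..t of this combined stream.
```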
Capabilities

We find that DALL·E is able to create plausible images for a great variety of sentences that explore the compositional structure of language. We illustrate this using a series of interactive visuals in the next section. The samples shown for each caption in the visuals are obtained by taking the top 32 of 512 after reranking with [CLIP](https://openai.com/blog/clip/), but we do not use any manual cherry-picking, aside from the thumbnails and standalone images that appear outside. Further details are provided in [a later section](https://openai.com/blog/dall-e/#summary).

Controlling Attributes

We test DALL·E’s ability to modify several of an object’s attributes, as well as the number of times that it appears. Example prompts:

- a pentagonal green clock. a green clock in the shape of a pentagon.
- a cube made of porcupine. a cube with the texture of a porcupine.
- a collection of glasses is sitting on a table

Drawing Multiple Objects

Simultaneously controlling multiple objects, their attributes, and their spatial relationships presents a new challenge. For example, consider the phrase “a hedgehog wearing a red hat, yellow gloves, blue shirt, and green pants.” To correctly interpret this sentence, DALL·E must not only correctly compose each piece of apparel with the animal, but also form the associations (hat, red), (gloves, yellow), (shirt, blue), and (pants, green) without mixing them up. This task is called variable binding, and has been extensively studied in the literature.[17][18][19][20]

We test DALL·E’s ability to do this for relative positioning, stacking objects, and controlling multiple attributes. Example prompts:

- a small red block sitting on a large green block
- a stack of 3 cubes. a red cube is on the top, sitting on a green cube. the green cube is in the middle, sitting on a blue cube. the blue cube is on the bottom.
- an emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants

While DALL·E does offer some level of controllability over the attributes and positions of a small number of objects, the success rate can depend on how the caption is phrased. As more objects are introduced, DALL·E is prone to confusing the associations between the objects and their colors, and the success rate decreases sharply. We also note that DALL·E is brittle with respect to rephrasing of the caption in these scenarios: alternative, semantically equivalent captions often yield no correct interpretations.

Visualizing Perspective and Three-Dimensionality

We find that DALL·E also allows for control over the viewpoint of a scene and the 3D style in which a scene is rendered. Example prompts:

- an extreme close-up view of a capybara sitting in a field
- a capybara made of voxels sitting in a field

To push this further, we test DALL·E’s ability to repeatedly draw the head of a well-known figure at each angle from a sequence of equally spaced angles, and find that we can recover a smooth animation of the rotating head. Example prompt:

- a photograph of a bust of homer

DALL·E appears to be able to apply some types of optical distortions to scenes, as we see with the options “fisheye lens view” and “a spherical panorama.” This motivated us to explore its ability to generate reflections. Example prompt:

- a plain white cube looking at its own reflection in a mirror. a plain white cube gazing at itself in a mirror.

Visualizing Internal and External Structure

The samples from the “extreme close-up view” and “x-ray” style led us to further explore DALL·E’s ability to render internal structure with cross-sectional views, and external structure with macro photographs. Example prompts:

- a cross-section view of a walnut
- a macro photograph of brain coral

Inferring Contextual Details

The task of translating text to images is underspecified: a single caption generally corresponds to an infinitude of plausible images, so the image is not uniquely determined.
For instance, consider the caption “a painting of a capybara sitting on a field at sunrise.” Depending on the orientation of the capybara, it may be necessary to draw a shadow, though this detail is never mentioned explicitly. We explore DALL·E’s ability to resolve underspecification in three cases: changing style, setting, and time; drawing the same object in a variety of different situations; and generating an image of an object with specific text written on it. Example prompts:

- a painting of a capybara sitting in a field at sunrise
- a stained glass window with an image of a blue strawberry
- a store front that has the word ‘openai’ written on it. a store front that has the word ‘openai’ written on it. a store front that has the word ‘openai’ written on it. ‘openai’ store front.

With varying degrees of reliability, DALL·E provides access to a subset of the capabilities of a 3D rendering engine via natural language. It can independently control the attributes of a small number of objects, and to a limited extent, how many there are, and how they are arranged with respect to one another. It can also control the location and angle from which a scene is rendered, and can generate known objects in compliance with precise specifications of angle and lighting conditions.

Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in complete detail, DALL·E is often able to “fill in the blanks” when the caption implies that the image must contain a certain detail that is not explicitly stated.

Applications of Preceding Capabilities

Next, we explore the use of the preceding capabilities for fashion and interior design. Example prompts:

- a male mannequin dressed in an orange and black flannel shirt
- a female mannequin dressed in a black leather jacket and gold pleated skirt
- a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace.
- a loft bedroom with a white bed next to a nightstand. there is a fish tank beside the bed.

Combining Unrelated Concepts

The compositional nature of language allows us to put together concepts to describe both real and imaginary things. We find that DALL·E also has the ability to combine disparate ideas to synthesize objects, some of which are unlikely to exist in the real world. We explore this ability in two instances: transferring qualities from various concepts to animals, and designing products by taking inspiration from unrelated concepts. Example prompts:

- a snail made of harp. a snail with the texture of a harp.
- an armchair in the shape of an avocado. an armchair imitating an avocado.

Animal Illustrations

In the previous section, we explored DALL·E’s ability to combine unrelated concepts when generating images of real-world objects. Here, we explore this ability in the context of art, for three kinds of illustrations: anthropomorphized versions of animals and objects, animal chimeras, and emojis. Example prompts:

- an illustration of a baby daikon radish in a tutu walking a dog
- a professional high quality illustration of a giraffe turtle chimera. a giraffe imitating a turtle. a giraffe made of turtle.
- a professional high quality emoji of a lovestruck cup of boba

Zero-Shot Visual Reasoning

GPT-3 can be instructed to perform many kinds of tasks solely from a description and a cue to generate the answer supplied in its prompt, without any additional training. For example, when prompted with the phrase “here is the sentence ‘a person walking his dog in the park’ translated into French:”, GPT-3 answers “un homme qui promène son chien dans le parc.” This capability is called zero-shot reasoning. We find that DALL·E extends this capability to the visual domain, and is able to perform several kinds of image-to-image translation tasks when prompted in the right way (see the sketch after this section). Example prompts:

- the exact same cat on the top as a sketch on the bottom
- the exact same teapot on the top with ’gpt’ written on it on the bottom

We did not anticipate that this capability would emerge, and made no modifications to the neural network or training procedure to encourage it. Motivated by these results, we measure DALL·E’s aptitude for analogical reasoning problems by testing it on Raven’s progressive matrices, a visual IQ test that saw widespread use in the 20th century. Example prompt:

- a sequence of geometric shapes.
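One way to read the zero-shot image-to-image behavior above, together with the earlier note that DALL·E can regenerate any rectangular region of an image extending to the bottom-right corner, is as ordinary sequence completion: supply the caption tokens plus the image tokens for the top half, and sample the rest. The sketch below illustrates that reading only; `model`, `bpe_encode`, `dvae_encode`, and `dvae_decode` are hypothetical placeholders, not an actual DALL·E API.

```python
# Hypothetical sketch: zero-shot image-to-image translation treated as
# completion of the combined text+image token stream. All callables passed
# in are placeholders standing in for a tokenizer, the dVAE, and the model.

def sketch_bottom_half(model, bpe_encode, dvae_encode, dvae_decode, cat_photo):
    caption = "the exact same cat on the top as a sketch on the bottom"
    text_tokens = bpe_encode(caption)        # up to 256 BPE tokens
    image_tokens = dvae_encode(cat_photo)    # 1024 codes for a 32x32 grid

    # Keep only the rows covering the top half of the image (16 of 32 rows),
    # i.e. the first 512 image tokens in raster order.
    top_half = image_tokens[: 16 * 32]

    # Ask the model to continue the stream: it must fill in the remaining
    # 512 image tokens, which decode to the bottom half of the picture.
    prompt = text_tokens + top_half
    completion = model.sample(prompt, num_new_tokens=512)

    return dvae_decode(top_half + completion)  # full 32x32 grid -> image
```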
Geographic Knowledge

We find that DALL·E has learned about geographic facts, landmarks, and neighborhoods. Its knowledge of these concepts is surprisingly precise in some ways and flawed in others. Example prompts:

- a photo of the food of china
- a photo of alamo square, san francisco, from a street at night
- a photo of san francisco’s golden gate bridge

Temporal Knowledge

In addition to exploring DALL·E’s knowledge of concepts that vary over space, we also explore its knowledge of concepts that vary over time. Example prompt:

- a photo of a phone from the 20s

Summary of Approach and Prior Work

DALL·E is a simple decoder-only transformer that receives both the text and the image as a single stream of 1280 tokens (256 for the text and 1024 for the image) and models all of them autoregressively. The attention mask at each of its 64 self-attention layers allows each image token to attend to all text tokens. DALL·E uses the standard causal mask for the text tokens, and sparse attention for the image tokens with either a row, column, or convolutional attention pattern, depending on the layer. We provide more details about the architecture and training procedure in our [paper](https://arxiv.org/abs/2102.12092).
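To make the masking scheme above concrete, here is a simplified sketch of an attention mask for the combined 1280-token stream: causal attention throughout, with every image token able to attend to all text tokens. This dense version is only an approximation for illustration; the model itself uses row, column, or convolutional sparse patterns for image-to-image attention depending on the layer.

```python
import numpy as np

TEXT_LEN, IMAGE_LEN = 256, 1024
SEQ_LEN = TEXT_LEN + IMAGE_LEN  # 1280

def simplified_attention_mask():
    """Boolean mask where mask[i, j] == True means position i may attend to j.

    Simplified from the description above: plain causal attention everywhere.
    (The real model replaces image-to-image attention with sparse row,
    column, or convolutional patterns depending on the layer.)
    """
    mask = np.tril(np.ones((SEQ_LEN, SEQ_LEN), dtype=bool))
    # Every image position may attend to every text position; this is already
    # implied by the causal ordering here, stated explicitly for emphasis.
    mask[TEXT_LEN:, :TEXT_LEN] = True
    return mask

mask = simplified_attention_mask()
assert mask[TEXT_LEN, :TEXT_LEN].all()   # first image token sees all text
assert not mask[0, 1]                    # text attention remains causal
```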
Text-to-image synthesis has been an active area of research since the pioneering work of Reed et al.,[1] whose approach uses a GAN conditioned on text embeddings. The embeddings are produced by an encoder pretrained using a contrastive loss, not unlike CLIP. StackGAN[3] and StackGAN++[4] use multi-scale GANs to scale up the image resolution and improve visual fidelity. AttnGAN[5] incorporates attention between the text and image features, and proposes a contrastive text-image feature matching loss as an auxiliary objective. This is interesting to compare to our reranking with CLIP, which is done offline. Other work[2][6][7] incorporates additional sources of supervision during training to improve image quality. Finally, work by Nguyen et al.[8] and Cho et al.[9] explores sampling-based strategies for image generation that leverage pretrained multimodal discriminative models.

Similar to the rejection sampling used in [VQ-VAE-2](https://arxiv.org/abs/1906.00446), we use [CLIP](https://openai.com/blog/clip/) to rerank the top 32 of 512 samples for each caption in all of the interactive visuals. This procedure can also be seen as a kind of language-guided search,[16] and can have a dramatic impact on sample quality. Example prompt:

- an illustration of a baby daikon radish in a tutu walking a dog [caption 1, best 8 of 2048]
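A minimal sketch of the reranking step described above, assuming hypothetical helpers `generate_image` (one DALL·E sample for a caption) and `clip_score` (CLIP's image-text similarity); the post does not specify an API, so the names and signatures are placeholders: draw many candidates, score each against the caption, keep the best few.

```python
# Sketch of CLIP-based reranking as described above. `generate_image` and
# `clip_score` are assumed helpers, standing in for DALL-E sampling and
# CLIP's image-text similarity score.

def rerank_with_clip(generate_image, clip_score, caption,
                     num_samples=512, keep=32):
    """Draw num_samples candidates for `caption` and keep the top `keep`
    by CLIP similarity (the post uses 512 and 32 for its visuals)."""
    candidates = [generate_image(caption) for _ in range(num_samples)]
    ranked = sorted(candidates,
                    key=lambda img: clip_score(img, caption),
                    reverse=True)
    return ranked[:keep]
```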
References

1. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H. (2016). “[Generative adversarial text to image synthesis](https://arxiv.org/abs/1605.05396)”. In ICML 2016.

2. Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H. (2016). “[Learning what and where to draw](https://arxiv.org/abs/1610.02454)”. In NIPS 2016.

3. Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D. (2016). “[StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks](https://arxiv.org/abs/1612.03242)”. In ICCV 2017.

4. Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D. (2017). “[StackGAN++: realistic image synthesis with stacked generative adversarial networks](https://arxiv.org/abs/1710.10916)”. In IEEE TPAMI 2018.

5. Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X. (2017). “[AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks](https://arxiv.org/abs/1711.10485)”.

6. Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J. (2019). “[Object-driven text-to-image synthesis via adversarial training](https://arxiv.org/abs/1902.10740)”. In CVPR 2019.

7. Koh, J. Y., Baldridge, J., Lee, H., Yang, Y. (2020). “[Text-to-image generation grounded by fine-grained user attention](https://arxiv.org/abs/2011.03775)”. In WACV 2021.

8. Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J. (2016). “[Plug & play generative networks: conditional iterative generation of images in latent space](https://arxiv.org/abs/1612.00005)”.

9. Cho, J., Lu, J., Schwenk, D., Hajishirzi, H., Kembhavi, A. (2020). “[X-LXMERT: Paint, caption, and answer questions with multi-modal transformers](https://arxiv.org/abs/2009.11278)”. In EMNLP 2020.

10. Kingma, D. P., Welling, M. (2013). “[Auto-encoding variational bayes](https://arxiv.org/abs/1312.6114)”. arXiv preprint.

11. Rezende, D. J., Mohamed, S., Wierstra, D. (2014). “[Stochastic backpropagation and approximate inference in deep generative models](https://arxiv.org/abs/1401.4082)”. arXiv preprint.

12. Jang, E., Gu, S., Poole, B. (2016). “[Categorical reparameterization with Gumbel-softmax](https://arxiv.org/abs/1611.01144)”.

13. Maddison, C., Mnih, A., Teh, Y. W. (2016). “[The Concrete distribution: a continuous relaxation of discrete random variables](https://arxiv.org/abs/1611.00712)”.

14. van den Oord, A., Vinyals, O., Kavukcuoglu, K. (2017). “[Neural discrete representation learning](https://arxiv.org/abs/1711.00937)”.

15. Razavi, A., van den Oord, A., Vinyals, O. (2019). “[Generating diverse high-fidelity images with VQ-VAE-2](https://arxiv.org/abs/1906.00446)”.

16. Andreas, J., Klein, D., Levine, S. (2017). “[Learning with Latent Language](https://arxiv.org/abs/1711.00482)”.
17. Smolensky, P. (1990). “[Tensor product variable binding and the representation of symbolic structures in connectionist systems](http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)”.

18. Plate, T. (1995). “[Holographic reduced representations: convolution algebra for compositional distributed representations](https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf)”.

19. Gayler, R. (1998). “[Multiplicative binding, representation operators & analogy](http://cogprints.org/502/)”.

20. Kanerva, P. (1997). “[Fully distributed representations](http://www.cap-lore.com/RWC97-kanerva.pdf)”.

Authors

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray (primary authors); Mark Chen, Rewon Child, Vedant Misra, Pamela Mishkin, Gretchen Krueger, Sandhini Agarwal, Ilya Sutskever (supporting authors).

Filed Under

Research, Milestones, Multimodal

Cover Artwork

Justin Jay Wang

Acknowledgments

Thanks to the following for their feedback on this work and contributions to this release: Alec Radford, Andrew Mayne, Jeff Clune, Ashley Pilipiszyn, Steve Dowling, Jong Wook Kim, Lei Pan, Heewoo Jun, John Schulman, Michael Tabatowski, Preetum Nakkiran, Jack Clark, Fraser Kelton, Jacob Jackson, Greg Brockman, Wojciech Zaremba, Justin Mao-Jones, David Luan, Shantanu Jain, Prafulla Dhariwal, Sam Altman, Pranav Shyam, Miles Brundage, Jakub Pachocki, and Ryan Lowe.

Contributions

Aditya Ramesh was the project lead: he developed the approach, trained the models, and wrote most of the blog copy.

Aditya Ramesh, Mikhail Pavlov, and Scott Gray worked together to scale up the model to 12 billion parameters, and designed the infrastructure used to draw samples from the model.

Aditya Ramesh, Gabriel Goh, and Justin Jay Wang worked together to create the interactive visuals for the blog.

Mark Chen and Aditya Ramesh created the images for Raven’s Progressive Matrices.
Rewon Child and Vedant Misra assisted in writing the blog.

Pamela Mishkin, Gretchen Krueger, and Sandhini Agarwal advised on broader impacts of the work and assisted in writing the blog.

Ilya Sutskever oversaw the project and assisted in writing the blog.