cypherpunks Digest, Vol 106, Issue 94

Gunnar Larson g at xny.io
Fri Apr 8 17:29:43 PDT 2022


Did Gunnar Larson rape Mr. Mark Zuckerberg? Or was it fair game?

Finders keepers?

On Fri, Apr 8, 2022, 7:56 PM <cypherpunks-request at lists.cpunks.org> wrote:

> Send cypherpunks mailing list submissions to
>         cypherpunks at lists.cpunks.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.cpunks.org/mailman/listinfo/cypherpunks
> or, via email, send a message with subject or body 'help' to
>         cypherpunks-request at lists.cpunks.org
>
> You can reach the person managing the list at
>         cypherpunks-owner at lists.cpunks.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of cypherpunks digest..."
>
>
> Today's Topics:
>
>    1. Re: cypherpunks Digest, Vol 106, Issue 93 (Gunnar Larson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 8 Apr 2022 19:54:15 -0400
> From: Gunnar Larson <g at xny.io>
> To: cypherpunks <cypherpunks at lists.cpunks.org>
> Subject: Re: cypherpunks Digest, Vol 106, Issue 93
> Message-ID:
>         <CAPc8xwPsCK2cA3tT1U-wjuV09T5kc=
> TBMcrzLz46uyNHJXV9cg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> At first glance, this was a great article.
>
> On Fri, Apr 8, 2022, 7:52 PM <cypherpunks-request at lists.cpunks.org> wrote:
>
> > Send cypherpunks mailing list submissions to
> >         cypherpunks at lists.cpunks.org
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >         https://lists.cpunks.org/mailman/listinfo/cypherpunks
> > or, via email, send a message with subject or body 'help' to
> >         cypherpunks-request at lists.cpunks.org
> >
> > You can reach the person managing the list at
> >         cypherpunks-owner at lists.cpunks.org
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of cypherpunks digest..."
> >
> >
> > Today's Topics:
> >
> >    1. Re: DALL-E (coderman)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 08 Apr 2022 23:50:53 +0000
> > From: coderman <coderman at protonmail.com>
> > To: coderman <coderman at protonmail.com>
> > Cc: "cy\"Cypherpunks" <cypherpunks at cpunks.org>
> > Subject: Re: DALL-E
> > Message-ID:
> >
> >
> <a9WeFGpr9g422W0Uym9aQZyxT6mqWNzNwLsG6yKqqlD4BLpH6NxuARXLOMvBY8IdZF9HMetBKZYGjdH--qJRFZDIWnXdMRQVqr3pmMYVo5I=@
> > protonmail.com>
> >
> > Content-Type: text/plain; charset="utf-8"
> >
> > DALL·E[1](https://openai.com/blog/dall-e/#fn1) is a 12-billion parameter
> > version of [GPT-3](https://arxiv.org/abs/2005.14165) trained to generate
> > images from text descriptions, using a dataset of text–image pairs. (The
> > name is a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E.)
> > We’ve found that it has a diverse set of capabilities, including creating
> > anthropomorphized versions of animals and objects, combining unrelated
> > concepts in plausible ways, rendering text, and applying transformations
> > to existing images.
> >
> > ---------------------------------------------------------------
> >
> > Text prompts (AI-generated image grids are shown in the original post):
> >
> > - an illustration of a baby daikon radish in a tutu walking a dog
> > - an armchair in the shape of an avocado. . . .
> > - a store front that has the word ‘openai’ written on it. . . .
> > - the exact same cat on the top as a sketch on the bottom (text & image
> >   prompt)
> >
> > ---------------------------------------------------------------
> >
> > GPT-3 showed that language can be used to instruct a large neural network
> > to perform a variety of text generation tasks. [Image GPT](
> > https://openai.com/blog/image-gpt) showed that the same type of neural
> > network can also be used to generate images with high fidelity. We extend
> > these findings to show that manipulating visual concepts through language
> > is now within reach.
> >
> > Overview
> >
> > Like GPT-3, DALL·E is a transformer language model. It receives both the
> > text and the image as a single stream of data containing up to 1280
> tokens,
> > and is trained using maximum likelihood to generate all of the tokens,
> one
> > after another.[2](https://openai.com/blog/dall-e/#fn2)
> >
> > A token is any symbol from a discrete vocabulary; for humans, each
> English
> > letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has
> tokens
> > for both text and image concepts. Specifically, each image caption is
> > represented using a maximum of 256 BPE-encoded tokens with a vocabulary
> > size of 16384, and the image is represented using 1024 tokens with a
> > vocabulary size of 8192.
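> >
> > As an illustration of this layout, here is a minimal sketch (not the
> > released implementation; the padding scheme, id offsets, and helper names
> > are assumptions) of how such a combined text–image stream could be
> > assembled:
> >
> > TEXT_VOCAB = 16384   # BPE text vocabulary size
> > IMAGE_VOCAB = 8192   # discrete VAE codebook size
> > MAX_TEXT = 256       # text token positions
> > IMAGE_TOKENS = 1024  # 32x32 grid of image codes
> >
> > def build_stream(text_ids, image_codes, pad_id=0):
> >     # text_ids: up to 256 BPE ids; image_codes: exactly 1024 dVAE codes.
> >     assert len(text_ids) <= MAX_TEXT and len(image_codes) == IMAGE_TOKENS
> >     text = list(text_ids) + [pad_id] * (MAX_TEXT - len(text_ids))
> >     # one simple (illustrative) way to keep the two id spaces disjoint:
> >     image = [TEXT_VOCAB + c for c in image_codes]
> >     return text + image  # length 1280, modeled autoregressively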
> >
> > The images are preprocessed to 256x256 resolution during training. Similar
> > to VQVAE,[14](https://openai.com/blog/dall-e/#rf14)[15](https://openai.com/blog/dall-e/#rf15)
> > each image is compressed to a 32x32 grid of discrete latent codes using a
> > discrete VAE[10](https://openai.com/blog/dall-e/#rf10)[11](https://openai.com/blog/dall-e/#rf11)
> > that we pretrained using a continuous relaxation.[12](https://openai.com/blog/dall-e/#rf12)[13](https://openai.com/blog/dall-e/#rf13)
> > We found that training using the relaxation obviates the need for an
> > explicit codebook, EMA loss, or tricks like dead code revival, and can
> > scale up to large vocabulary sizes.
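> >
> > As a rough sketch of the relaxation idea in PyTorch (illustrative only;
> > the tensor shapes are assumptions and this is not the pretrained dVAE):
> >
> > import torch.nn.functional as F
> >
> > def relaxed_codes(logits, tau=1.0):
> >     # logits: (batch, 8192, 32, 32) unnormalized scores over the codebook
> >     # for each cell of the 32x32 latent grid. Gumbel-softmax gives a
> >     # differentiable, nearly one-hot sample; hard=True applies the
> >     # straight-through trick so the forward pass uses discrete codes.
> >     return F.gumbel_softmax(logits, tau=tau, hard=True, dim=1)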
> >
> > This training procedure allows DALL·E to not only generate an image from
> > scratch, but also to regenerate any rectangular region of an existing
> image
> > that extends to the bottom-right corner, in a way that is consistent with
> > the text prompt.
> >
> > We recognize that work involving generative models has the potential for
> > significant, broad societal impacts. In the future, we plan to analyze
> how
> > models like DALL·E relate to societal issues like economic impact on
> > certain work processes and professions, the potential for bias in the
> model
> > outputs, and the longer term ethical challenges implied by this
> technology.
> >
> > Capabilities
> >
> > We find that DALL·E is able to create plausible images for a great
> variety
> > of sentences that explore the compositional structure of language. We
> > illustrate this using a series of interactive visuals in the next
> section.
> > The samples shown for each caption in the visuals are obtained by taking
> > the top 32 of 512 after reranking with [CLIP](
> > https://openai.com/blog/clip/), but we do not use any manual
> > cherry-picking, aside from the thumbnails and standalone images that
> appear
> > outside.[3](https://openai.com/blog/dall-e/#fn3)
> >
> > Further details provided in [a later section](
> > https://openai.com/blog/dall-e/#summary).
> >
> > Controlling Attributes
> >
> > We test DALL·E’s ability to modify several of an object’s attributes, as
> > well as the number of times that it appears.
> >
> > - a pentagonal green clock. a green clock in the shape of a pentagon.
> > - a cube made of porcupine. a cube with the texture of a porcupine.
> > - a collection of glasses is sitting on a table
> >
> > Drawing Multiple Objects
> >
> > Simultaneously controlling multiple objects, their attributes, and their
> > spatial relationships presents a new challenge. For example, consider the
> > phrase “a hedgehog wearing a red hat, yellow gloves, blue shirt, and
> green
> > pants.” To correctly interpret this sentence, DALL·E must not only
> > correctly compose each piece of apparel with the animal, but also form
> the
> > associations (hat, red), (gloves, yellow), (shirt, blue), and (pants,
> > green) without mixing them up.[4](https://openai.com/blog/dall-e/#fn4)
> >
> > This task is called variable binding, and has been extensively studied in the
> > literature.[17](https://openai.com/blog/dall-e/#rf17)[18](https://openai.com/blog/dall-e/#rf18)[19](https://openai.com/blog/dall-e/#rf19)[20](https://openai.com/blog/dall-e/#rf20)
> >
> > We test DALL·E’s ability to do this for relative positioning, stacking
> > objects, and controlling multiple attributes.
> >
> > - a small red block sitting on a large green block
> > - a stack of 3 cubes. a red cube is on the top, sitting on a green cube.
> >   the green cube is in the middle, sitting on a blue cube. the blue cube
> >   is on the bottom.
> > - an emoji of a baby penguin wearing a blue hat, red gloves, green shirt,
> >   and yellow pants
> >
> > While DALL·E does offer some level of controllability over the attributes
> > and positions of a small number of objects, the success rate can depend
> on
> > how the caption is phrased. As more objects are introduced, DALL·E is
> prone
> > to confusing the associations between the objects and their colors, and
> the
> > success rate decreases sharply. We also note that DALL·E is brittle with
> > respect to rephrasing of the caption in these scenarios: alternative,
> > semantically equivalent captions often yield no correct interpretations.
> >
> > Visualizing Perspective and Three-Dimensionality
> >
> > We find that DALL·E also allows for control over the viewpoint of a scene
> > and the 3D style in which a scene is rendered.
> >
> > - an extreme close-up view of a capybara sitting in a field
> > - a capybara made of voxels sitting in a field
> >
> > To push this further, we test DALL·E’s ability to repeatedly draw the
> head
> > of a well-known figure at each angle from a sequence of equally spaced
> > angles, and find that we can recover a smooth animation of the rotating
> > head.
> >
> > - a photograph of a bust of homer
> >
> > DALL·E appears to be able to apply some types of optical distortions to
> > scenes, as we see with the options “fisheye lens view” and “a spherical
> > panorama.” This motivated us to explore its ability to generate
> reflections.
> >
> > - a plain white cube looking at its own reflection in a mirror. a plain
> >   white cube gazing at itself in a mirror.
> >
> > Visualizing Internal and External Structure
> >
> > The samples from the “extreme close-up view” and “x-ray” style led us to
> > further explore DALL·E’s ability to render internal structure with
> > cross-sectional views, and external structure with macro photographs.
> >
> > - a cross-section view of a walnut
> > - a macro photograph of brain coral
> >
> > Inferring Contextual Details
> >
> > The task of translating text to images is underspecified: a single
> caption
> > generally corresponds to an infinitude of plausible images, so the image
> is
> > not uniquely determined. For instance, consider the caption “a painting
> of
> > a capybara sitting on a field at sunrise.” Depending on the orientation
> of
> > the capybara, it may be necessary to draw a shadow, though this detail is
> > never mentioned explicitly. We explore DALL·E’s ability to resolve
> > underspecification in three cases: changing style, setting, and time;
> > drawing the same object in a variety of different situations; and
> > generating an image of an object with specific text written on it.
> >
> > - a painting of a capybara sitting in a field at sunrise
> > - a stained glass window with an image of a blue strawberry
> > - a store front that has the word ‘openai’ written on it. a store front
> >   that has the word ‘openai’ written on it. a store front that has the
> >   word ‘openai’ written on it. ‘openai’ store front.
> >
> > With varying degrees of reliability, DALL·E provides access to a subset
> of
> > the capabilities of a 3D rendering engine via natural language. It can
> > independently control the attributes of a small number of objects, and
> to a
> > limited extent, how many there are, and how they are arranged with
> respect
> > to one another. It can also control the location and angle from which a
> > scene is rendered, and can generate known objects in compliance with
> > precise specifications of angle and lighting conditions.
> >
> > Unlike a 3D rendering engine, whose inputs must be specified
> unambiguously
> > and in complete detail, DALL·E is often able to “fill in the blanks” when
> > the caption implies that the image must contain a certain detail that is
> > not explicitly stated.
> >
> > Applications of Preceding Capabilities
> >
> > Next, we explore the use of the preceding capabilities for fashion and
> > interior design.
> >
> > - a male mannequin dressed in an orange and black flannel shirt
> > - a female mannequin dressed in a black leather jacket and gold pleated
> >   skirt
> > - a living room with two white armchairs and a painting of the colosseum.
> >   the painting is mounted above a modern fireplace.
> > - a loft bedroom with a white bed next to a nightstand. there is a fish
> >   tank beside the bed.
> >
> > Combining Unrelated Concepts
> >
> > The compositional nature of language allows us to put together concepts
> to
> > describe both real and imaginary things. We find that DALL·E also has the
> > ability to combine disparate ideas to synthesize objects, some of which
> are
> > unlikely to exist in the real world. We explore this ability in two
> > instances: transferring qualities from various concepts to animals, and
> > designing products by taking inspiration from unrelated concepts.
> >
> > - a snail made of harp. a snail with the texture of a harp.
> > - an armchair in the shape of an avocado. an armchair imitating an avocado.
> >
> > Animal Illustrations
> >
> > In the previous section, we explored DALL·E’s ability to combine
> unrelated
> > concepts when generating images of real-world objects. Here, we explore
> > this ability in the context of art, for three kinds of illustrations:
> > anthropomorphized versions of animals and objects, animal chimeras, and
> > emojis.
> >
> > - an illustration of a baby daikon radish in a tutu walking a dog
> > - a professional high quality illustration of a giraffe turtle chimera. a
> >   giraffe imitating a turtle. a giraffe made of turtle.
> > - a professional high quality emoji of a lovestruck cup of boba
> >
> > Zero-Shot Visual Reasoning
> >
> > GPT-3 can be instructed to perform many kinds of tasks solely from a
> > description and a cue to generate the answer supplied in its prompt,
> > without any additional training. For example, when prompted with the
> phrase
> > “here is the sentence ‘a person walking his dog in the park’ translated
> > into French:”, GPT-3 answers “un homme qui promène son chien dans le
> parc.”
> > This capability is called zero-shot reasoning. We find that DALL·E
> extends
> > this capability to the visual domain, and is able to perform several
> kinds
> > of image-to-image translation tasks when prompted in the right way.
> >
> > - the exact same cat on the top as a sketch on the bottom
> > - the exact same teapot on the top with ’gpt’ written on it on the bottom
> >
> > We did not anticipate that this capability would emerge, and made no
> > modifications to the neural network or training procedure to encourage
> it.
> > Motivated by these results, we measure DALL·E’s aptitude for analogical
> > reasoning problems by testing it on Raven’s progressive matrices, a
> visual
> > IQ test that saw widespread use in the 20th century.
> >
> > - a sequence of geometric shapes.
> >
> > Geographic Knowledge
> >
> > We find that DALL·E has learned about geographic facts, landmarks, and
> > neighborhoods. Its knowledge of these concepts is surprisingly precise in
> > some ways and flawed in others.
> >
> > - a photo of the food of china
> > - a photo of alamo square, san francisco, from a street at night
> > - a photo of san francisco’s golden gate bridge
> >
> > Temporal Knowledge
> >
> > In addition to exploring DALL·E’s knowledge of concepts that vary over
> > space, we also explore its knowledge of concepts that vary over time.
> >
> > - a photo of a phone from the 20s
> >
> > Summary of Approach and Prior Work
> >
> > DALL·E is a simple decoder-only transformer that receives both the text
> > and the image as a single stream of 1280 tokens—256 for the text and 1024
> > for the image—and models all of them autoregressively. The attention mask
> > at each of its 64 self-attention layers allows each image token to attend
> > to all text tokens. DALL·E uses the standard causal mask for the text
> > tokens, and sparse attention for the image tokens with either a row,
> > column, or convolutional attention pattern, depending on the layer. We
> > provide more details about the architecture and training procedure in our
> > [paper](https://arxiv.org/abs/2102.12092).
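> >
> > A minimal sketch of the overall mask shape (illustrative only; the
> > row/column/convolutional sparsification of image-to-image attention is
> > omitted, and the function below is not the paper’s code):
> >
> > import torch
> >
> > def dalle_style_mask(n_text=256, n_image=1024):
> >     # Causal mask over the full 1280-token stream, text first then image.
> >     # Because every text position precedes every image position, a causal
> >     # mask already lets each image token attend to all text tokens; the
> >     # released model further sparsifies image-to-image attention, which
> >     # this sketch does not show.
> >     n = n_text + n_image
> >     return torch.tril(torch.ones(n, n)).bool()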
> >
> > Text-to-image synthesis has been an active area of research since the
> > pioneering work of Reed et al.,[1](https://openai.com/blog/dall-e/#rf1)
> > whose approach uses a GAN conditioned on text embeddings. The embeddings
> > are produced by an encoder pretrained using a contrastive loss, not
> unlike
> > CLIP. StackGAN[3](https://openai.com/blog/dall-e/#rf3) and
> StackGAN++[4](
> > https://openai.com/blog/dall-e/#rf4) use multi-scale GANs to scale up
> the
> > image resolution and improve visual fidelity. AttnGAN[5](
> > https://openai.com/blog/dall-e/#rf5) incorporates attention between the
> > text and image features, and proposes a contrastive text-image feature
> > matching loss as an auxiliary objective. This is interesting to compare
> to
> > our reranking with CLIP, which is done offline. Other
> > work[2](https://openai.com/blog/dall-e/#rf2)[6](https://openai.com/blog/dall-e/#rf6)[7](https://openai.com/blog/dall-e/#rf7)
> > incorporates additional sources of supervision during training to improve
> > image quality. Finally, work by Nguyen et al.[8](
> > https://openai.com/blog/dall-e/#rf8) and Cho et al.[9](
> > https://openai.com/blog/dall-e/#rf9) explores sampling-based strategies
> > for image generation that leverage pretrained multimodal discriminative
> > models.
> >
> > Similar to the rejection sampling used in [VQVAE-2](
> > https://arxiv.org/abs/1906.00446), we use [CLIP](
> > https://openai.com/blog/clip/) to rerank the top 32 of 512 samples for
> > each caption in all of the interactive visuals. This procedure can also
> be
> > seen as a kind of language-guided search[16](
> > https://openai.com/blog/dall-e/#rf16), and can have a dramatic impact on
> > sample quality.
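> >
> > A minimal sketch of such reranking with the open-source CLIP package
> > (illustrative only; the sampling step is omitted and the variable names
> > are assumptions):
> >
> > import torch
> > import clip  # github.com/openai/CLIP
> >
> > def rerank(caption, images, k=32, device="cuda"):
> >     # images: list of PIL.Image candidates drawn for one caption.
> >     model, preprocess = clip.load("ViT-B/32", device=device)
> >     with torch.no_grad():
> >         text = model.encode_text(clip.tokenize([caption]).to(device))
> >         batch = torch.stack([preprocess(im) for im in images]).to(device)
> >         feats = model.encode_image(batch)
> >         # cosine similarity between the caption and each candidate image
> >         sims = torch.nn.functional.cosine_similarity(feats, text)
> >     keep = sims.topk(k).indices.tolist()
> >     return [images[i] for i in keep]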
> >
> > - an illustration of a baby daikon radish in a tutu walking a dog
> >   [caption 1, best 8 of 2048]
> >
> > ---------------------------------------------------------------
> >
> > References
> >
> > - Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.
> > (2016). “[Generative adversarial text to image synthesis](
> > https://arxiv.org/abs/1605.05396)”. In ICML 2016. [↩︎](
> > https://openai.com/blog/dall-e/#rfref1)
> >
> > - Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H. (2016).
> > “[Learning what and where to draw](https://arxiv.org/abs/1610.02454)”.
> In
> > NIPS 2016. [↩︎](https://openai.com/blog/dall-e/#rfref2)
> >
> > - Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang X., Metaxas, D.
> > (2016). “[StackGAN: Text to photo-realistic image synthesis with stacked
> > generative adversarial networks](https://arxiv.org/abs/1612.03242)”. In
> > ICCV 2017. [↩︎](https://openai.com/blog/dall-e/#rfref3)
> >
> > - Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.
> > (2017). “[StackGAN++: realistic image synthesis with stacked generative
> > adversarial networks](https://arxiv.org/abs/1710.10916)”. In IEEE TPAMI
> > 2018. [↩︎](https://openai.com/blog/dall-e/#rfref4)
> >
> > - Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.
> > (2017). “[AttnGAN: Fine-grained text to image generation with attentional
> > generative adversarial networks](https://arxiv.org/abs/1711.10485)”.
> [↩︎](
> > https://openai.com/blog/dall-e/#rfref5)
> >
> > - Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.
> > (2019). “[Object-driven text-to-image synthesis via adversarial
> training](
> > https://arxiv.org/abs/1902.10740)”. In CVPR 2019. [↩︎](
> > https://openai.com/blog/dall-e/#rfref6)
> >
> > - Koh, J. Y., Baldridge, J., Lee, H., Yang, Y. (2020). “[Text-to-image
> > generation grounded by fine-grained user attention](
> > https://arxiv.org/abs/2011.03775)”. In WACV 2021. [↩︎](
> > https://openai.com/blog/dall-e/#rfref7)
> >
> > - Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J.
> (2016).
> > “[Plug & play generative networks: conditional iterative generation of
> > images in latent space](https://arxiv.org/abs/1612.00005)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref8)
> >
> > - Cho, J., Lu, J., Schwen, D., Hajishirzi, H., Kembhavi, A. (2020).
> > “[X-LXMERT: Paint, caption, and answer questions with multi-modal
> > transformers](https://arxiv.org/abs/2009.11278)”. EMNLP 2020. [↩︎](
> > https://openai.com/blog/dall-e/#rfref9)
> >
> > - Kingma, Diederik P., and Max Welling. “[Auto-encoding variational
> bayes](
> > https://arxiv.org/abs/1312.6114).” arXiv preprint (2013). [↩︎](
> > https://openai.com/blog/dall-e/#rfref10a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref10b)
> >
> > - Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra.
> “[Stochastic
> > backpropagation and approximate inference in deep generative models](
> > https://arxiv.org/abs/1401.4082).” arXiv preprint (2014). [↩︎](
> > https://openai.com/blog/dall-e/#rfref11a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref11b)
> >
> > - Jang, E., Gu, S., Poole, B. (2016). “[Categorical reparametrization
> with
> > Gumbel-softmax](https://arxiv.org/abs/1611.01144)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref12a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref12b)
> >
> > - Maddison, C., Mnih, A., Teh, Y. W. (2016). “[The Concrete distribution:
> > a continuous relaxation of discrete random variables](
> > https://arxiv.org/abs/1611.00712)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref13a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref13b)
> >
> > - van den Oord, A., Vinyals, O., Kavukcuoglu, K. (2017). “[Neural
> discrete
> > representation learning](https://arxiv.org/abs/1711.00937)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref14a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref14b)
> >
> > - Razavi, A., van den Oord, A., Vinyals, O. (2019). “[Generating diverse
> > high-fidelity images with VQ-VAE-2](https://arxiv.org/abs/1906.00446)”.
> > [↩︎](https://openai.com/blog/dall-e/#rfref15a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref15b)
> >
> > - Andreas, J., Klein, D., Levine, S. (2017). “[Learning with Latent
> > Language](https://arxiv.org/abs/1711.00482)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref16)
> >
> > - Smolensky, P. (1990). “[Tensor product variable binding and the
> > representation of symbolic structures in connectionist systems](
> >
> http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)
> ”.
> > [↩︎](https://openai.com/blog/dall-e/#rfref17a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref17b)
> >
> > - Plate, T. (1995). “[Holographic reduced representations: convolution
> > algebra for compositional distributed representations](
> > https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref18a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref18b)
> >
> > - Gayler, R. (1998). “[Multiplicative binding, representation operators &
> > analogy](http://cogprints.org/502/)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref19a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref19b)
> >
> > - Kanerva, P. (1997). “[Fully distributed representations](
> > http://www.cap-lore.com/RWC97-kanerva.pdf)”. [↩︎](
> > https://openai.com/blog/dall-e/#rfref20a) [↩︎](
> > https://openai.com/blog/dall-e/#rfref20b)
> >
> > ---------------------------------------------------------------
> >
> > Authors
> > [Aditya Ramesh](https://openai.com/blog/authors/aditya/)[Mikhail
> Pavlov](
> > https://openai.com/blog/authors/mikhail/)[Gabriel Goh](
> > https://openai.com/blog/authors/gabriel/)[Scott Gray](
> > https://openai.com/blog/authors/scott/)
> > (Primary Authors)
> > [Mark Chen](https://openai.com/blog/authors/mark/)[Rewon Child](
> > https://openai.com/blog/authors/rewon/)[Vedant Misra](
> > https://openai.com/blog/authors/vedant/)[Pamela Mishkin](
> > https://openai.com/blog/authors/pamela/)[Gretchen Krueger](
> > https://openai.com/blog/authors/gretchen/)[Sandhini Agarwal](
> > https://openai.com/blog/authors/sandhini/)[Ilya Sutskever](
> > https://openai.com/blog/authors/ilya/)
> > (Supporting Authors)
> > ---------------------------------------------------------------
> >
> > Filed Under
> > [Research](https://openai.com/blog/tags/research/)[Milestones](https://openai.com/blog/tags/milestones/)[Multimodal](https://openai.com/blog/tags/multimodal/)
> > ---------------------------------------------------------------
> >
> > Cover Artwork
> >
> > Justin Jay Wang
> >
> > ---------------------------------------------------------------
> >
> > Acknowledgments
> >
> > Thanks to the following for their feedback on this work and contributions
> > to this release: Alec Radford, Andrew Mayne, Jeff Clune, Ashley
> Pilipiszyn,
> > Steve Dowling, Jong Wook Kim, Lei Pan, Heewoo Jun, John Schulman, Michael
> > Tabatowski, Preetum Nakkiran, Jack Clark, Fraser Kelton, Jacob Jackson,
> > Greg Brockman, Wojciech Zaremba, Justin Mao-Jones, David Luan, Shantanu
> > Jain, Prafulla Dhariwal, Sam Altman, Pranav Shyam, Miles Brundage, Jakub
> > Pachocki, and Ryan Lowe.
> >
> > ---------------------------------------------------------------
> >
> > Contributions
> >
> > Aditya Ramesh was the project lead: he developed the approach, trained
> the
> > models, and wrote most of the blog copy.
> >
> > Aditya Ramesh, Mikhail Pavlov, and Scott Gray worked together to scale up
> > the model to 12 billion parameters, and designed the infrastructure used
> to
> > draw samples from the model.
> >
> > Aditya Ramesh, Gabriel Goh, and Justin Jay Wang worked together to create
> > the interactive visuals for the blog.
> >
> > Mark Chen and Aditya Ramesh created the images for Raven’s Progressive
> > Matrices.
> >
> > Rewon Child and Vedant Misra assisted in writing the blog.
> >
> > Pamela Mishkin, Gretchen Krueger, and Sandhini Agarwal advised on broader
> > impacts of the work and assisted in writing the blog.
> >
> > Ilya Sutskever oversaw the project and assisted in writing the blog.
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: not available
> > Type: text/html
> > Size: 45019 bytes
> > Desc: not available
> > URL: <
> >
> https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/96a4e98c/attachment.txt
> > >
> >
> > ------------------------------
> >
> > Subject: Digest Footer
> >
> > _______________________________________________
> > cypherpunks mailing list
> > cypherpunks at lists.cpunks.org
> > https://lists.cpunks.org/mailman/listinfo/cypherpunks
> >
> >
> > ------------------------------
> >
> > End of cypherpunks Digest, Vol 106, Issue 93
> > ********************************************
> >
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: text/html
> Size: 39458 bytes
> Desc: not available
> URL: <
> https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/66867ca0/attachment.txt
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> cypherpunks mailing list
> cypherpunks at lists.cpunks.org
> https://lists.cpunks.org/mailman/listinfo/cypherpunks
>
>
> ------------------------------
>
> End of cypherpunks Digest, Vol 106, Issue 94
> ********************************************
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/html
Size: 46836 bytes
Desc: not available
URL: <https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220408/eea82fed/attachment.txt>


More information about the cypherpunks mailing list