Re: cypherpunks Digest, Vol 106, Issue 95

9 Apr 2022

      Oh yes he did, he did do it.

Gunnar Larson raped him. He had to ...

On Fri, Apr 8, 2022, 8:32 PM <cypherpunks-request@lists.cpunks.org> wrote:
...
Send cypherpunks mailing list submissions to
        cypherpunks@lists.cpunks.org
To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.cpunks.org/mailman/listinfo/cypherpunks
or, via email, send a message with subject or body 'help' to
        cypherpunks-request@lists.cpunks.org
You can reach the person managing the list at
        cypherpunks-owner@lists.cpunks.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of cypherpunks digest..."
Today's Topics:
1. Re: cypherpunks Digest, Vol 106, Issue 94 (Gunnar Larson)
----------------------------------------------------------------------
Message: 1
Date: Fri, 8 Apr 2022 20:29:43 -0400
From: Gunnar Larson <g@xny.io>
To: cypherpunks <cypherpunks@lists.cpunks.org>
Subject: Re: cypherpunks Digest, Vol 106, Issue 94
Message-ID: <CAPc8xwO4+uLbaR52tWvdyRckPtLWO49uxSHk-boT0HwUkMmUVw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Did Gunnar Larson, rape Mr. Mark Zuckerburg? Or, was it fair game?
Finders keepers?
On Fri, Apr 8, 2022, 7:56 PM <cypherpunks-request@lists.cpunks.org> wrote:
...
Today's Topics:
1. Re: cypherpunks Digest, Vol 106, Issue 93 (Gunnar Larson)
----------------------------------------------------------------------
Message: 1
Date: Fri, 8 Apr 2022 19:54:15 -0400
From: Gunnar Larson <g@xny.io>
To: cypherpunks <cypherpunks@lists.cpunks.org>
Subject: Re: cypherpunks Digest, Vol 106, Issue 93
Message-ID: <CAPc8xwPsCK2cA3tT1U-wjuV09T5kc=TBMcrzLz46uyNHJXV9cg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
At first glance, this was a great article.
On Fri, Apr 8, 2022, 7:52 PM <cypherpunks-request@lists.cpunks.org> wrote:
...
Today's Topics:
1. Re: DALL-E (coderman)
----------------------------------------------------------------------
Message: 1
Date: Fri, 08 Apr 2022 23:50:53 +0000
From: coderman <coderman@protonmail.com>
To: coderman <coderman@protonmail.com>
Cc: "cy\"Cypherpunks" <cypherpunks@cpunks.org>
Subject: Re: DALL-E
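Message-ID: <a9WeFGpr9g422W0Uym9aQZyxT6mqWNzNwLsG6yKqqlD4BLpH6NxuARXLOMvBY8IdZF9HMetBKZYGjdH--qJRFZDIWnXdMRQVqr3pmMYVo5I=@protonmail.com>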
Content-Type: text/plain; charset="utf-8"
DALL·E[1](https://openai.com/blog/dall-e/#fn1) is a 12-billion parameter
version of [GPT-3](https://arxiv.org/abs/2005.14165) trained to generate
images from text descriptions, using a dataset of text–image pairs. We’ve
found that it has a diverse set of capabilities, including creating
anthropomorphized versions of animals and objects, combining unrelated
concepts in plausible ways, rendering text, and applying transformations to
existing images.
---------------------------------------------------------------
Example prompts (AI-generated images omitted):
- Text prompt: an illustration of a baby daikon radish in a tutu walking a dog
- Text prompt: an armchair in the shape of an avocado. . . .
- Text prompt: a store front that has the word ‘openai’ written on it. . . .
- Text & image prompt: the exact same cat on the top as a sketch on the bottom
---------------------------------------------------------------
GPT-3 showed that language can be used to instruct a large neural network
to perform a variety of text generation tasks.
[Image GPT](https://openai.com/blog/image-gpt) showed that the same type of
neural network can also be used to generate images with high fidelity. We
extend these findings to show that manipulating visual concepts through
language is now within reach.
Overview
Like GPT-3, DALL·E is a transformer language model. It receives both the
text and the image as a single stream of data containing up to 1280 tokens,
and is trained using maximum likelihood to generate all of the tokens, one
after another.[2](https://openai.com/blog/dall-e/#fn2)
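A minimal sketch of how such a single stream could be laid out, using the sizes quoted elsewhere in the post (at most 256 caption tokens from a 16384-entry vocabulary, and 1024 image tokens from an 8192-entry vocabulary). The function name `build_stream`, the padding id, and the choice to offset image ids by the text vocabulary size are assumptions made for illustration; the post does not say how the two vocabularies are combined.

```python
# Sizes quoted in the post; everything else here is a hypothetical layout.
TEXT_TOKENS = 256                       # max BPE tokens per caption
TEXT_VOCAB = 16384
IMAGE_GRID = 32                         # 32x32 grid of discrete latent codes
IMAGE_TOKENS = IMAGE_GRID * IMAGE_GRID  # 1024
IMAGE_VOCAB = 8192


def build_stream(text_ids, image_ids, pad_id=0):
    """Concatenate caption and image tokens into one sequence.

    Assumption: captions are right-padded to 256 tokens, and image ids are
    shifted by TEXT_VOCAB so the two vocabularies do not collide.
    """
    if len(text_ids) > TEXT_TOKENS:
        raise ValueError("caption longer than 256 tokens")
    if len(image_ids) != IMAGE_TOKENS:
        raise ValueError("expected a full 32x32 grid of image tokens")
    if any(not 0 <= t < TEXT_VOCAB for t in text_ids):
        raise ValueError("text id outside the text vocabulary")
    if any(not 0 <= i < IMAGE_VOCAB for i in image_ids):
        raise ValueError("image id outside the image vocabulary")
    padded = list(text_ids) + [pad_id] * (TEXT_TOKENS - len(text_ids))
    return padded + [TEXT_VOCAB + i for i in image_ids]


stream = build_stream([5, 17, 42], [0] * IMAGE_TOKENS)
print(len(stream))  # 256 + 1024 = 1280
```

The only figure taken from the post here is the arithmetic: 256 caption positions plus a 32x32 grid of image codes gives the 1280-token stream.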
This training procedure allows DALL·E to not only generate an image from
scratch, but also to regenerate any rectangular region of an existing image
that extends to the bottom-right corner, in a way that is consistent with
the text prompt.
We recognize that work involving generative models has the potential for
significant, broad societal impacts. In the future, we plan to analyze how
models like DALL·E relate to societal issues like economic impact on
certain work processes and professions, the potential for bias in the model
outputs, and the longer term ethical challenges implied by this technology.
Capabilities
We find that DALL·E is able to create plausible images for a great variety
of sentences that explore the compositional structure of language. We
illustrate this using a series of interactive visuals in the next section.
The samples shown for each caption in the visuals are obtained by taking
the top 32 of 512 after reranking with [CLIP](https://openai.com/blog/clip/),
but we do not use any manual cherry-picking, aside from the thumbnails and
standalone images that appear outside.[3](https://openai.com/blog/dall-e/#fn3)
Controlling Attributes
We test DALL·E’s ability to modify several of an object’s attributes, as
well as the number of times that it appears.
Example prompts:
- a pentagonal green clock. a green clock in the shape of a pentagon.
- a cube made of porcupine. a cube with the texture of a porcupine.
- a collection of glasses is sitting on a table
Drawing Multiple Objects
Simultaneously controlling multiple objects, their attributes, and their
spatial relationships presents a new challenge. For example, consider the
phrase “a hedgehog wearing a red hat, yellow gloves, blue shirt, and green
pants.” To correctly interpret this sentence, DALL·E must not only
correctly compose each piece of apparel with the animal, but also form the
associations (hat, red), (gloves, yellow), (shirt, blue), and (pants,
green) without mixing them up.[4](https://openai.com/blog/dall-e/#fn4)
We test DALL·E’s ability to do this for relative positioning, stacking
objects, and controlling multiple attributes.
Example prompts:
- a small red block sitting on a large green block
- a stack of 3 cubes. a red cube is on the top, sitting on a green cube. the green cube is in the middle, sitting on a blue cube. the blue cube is on the bottom.
- an emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants
While DALL·E does offer some level of controllability over the attributes
and positions of a small number of objects, the success rate can depend on
how the caption is phrased. As more objects are introduced, DALL·E is prone
to confusing the associations between the objects and their colors, and the
success rate decreases sharply. We also note that DALL·E is brittle with
respect to rephrasing of the caption in these scenarios: alternative,
semantically equivalent captions often yield no correct interpretations.
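One way to see why the associations get harder as objects are added is to count them. The sketch below is only a counting exercise on the hedgehog example from the post, not a description of how the model works: it enumerates every way of assigning the four colors to the four garments and checks how many match the caption. The variable names are illustrative.

```python
from itertools import permutations

garments = ["hat", "gloves", "shirt", "pants"]
colors = ["red", "yellow", "blue", "green"]

# The pairing stated in the caption: (hat, red), (gloves, yellow), ...
intended = set(zip(garments, colors))

# Every one-to-one assignment of the four colors to the four garments.
assignments = [set(zip(garments, p)) for p in permutations(colors)]
correct = [a for a in assignments if a == intended]

print(len(assignments), len(correct))  # 24 1
```

With n garments and n distinct colors there are n! one-to-one assignments and only one of them is the intended one, so the space of possible mix-ups grows quickly with each added object.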
Visualizing Perspective and Three-Dimensionality
We find that DALL·E also allows for control over the viewpoint of a scene
and the 3D style in which a scene is rendered.
Example prompts:
- an extreme close-up view of a capybara sitting in a field
- a capybara made of voxels sitting in a field
To push this further, we test DALL·E’s ability to repeatedly draw the head
of a well-known figure at each angle from a sequence of equally spaced
angles, and find that we can recover a smooth animation of the rotating
head.
Example prompt:
- a photograph of a bust of homer
DALL·E appears to be able to apply some types of optical distortions to
scenes, as we see with the options “fisheye lens view” and “a spherical
panorama.” This motivated us to explore its ability to generate
reflections.
Example prompt:
- a plain white cube looking at its own reflection in a mirror. a plain white cube gazing at itself in a mirror.
Visualizing Internal and External Structure
The samples from the “extreme close-up view” and “x-ray” style led us to
further explore DALL·E’s ability to render internal structure with
cross-sectional views, and external structure with macro photographs.
Example prompts:
- a cross-section view of a walnut
- a macro photograph of brain coral
Inferring Contextual Details
The task of translating text to images is underspecified: a single caption
generally corresponds to an infinitude of plausible images, so the image is
not uniquely determined. For instance, consider the caption “a painting of
a capybara sitting on a field at sunrise.” Depending on the orientation of
the capybara, it may be necessary to draw a shadow, though this detail is
never mentioned explicitly. We explore DALL·E’s ability to resolve
underspecification in three cases: changing style, setting, and time;
drawing the same object in a variety of different situations; and
generating an image of an object with specific text written on it.
Example prompts:
- a painting of a capybara sitting in a field at sunrise
- a stained glass window with an image of a blue strawberry
- a store front that has the word ‘openai’ written on it. a store front that has the word ‘openai’ written on it. a store front that has the word ‘openai’ written on it. ‘openai’ store front.
With varying degrees of reliability, DALL·E provides access to a subset of
the capabilities of a 3D rendering engine via natural language. It can
independently control the attributes of a small number of objects, and to a
limited extent, how many there are, and how they are arranged with respect
to one another. It can also control the location and angle from which a
scene is rendered, and can generate known objects in compliance with
precise specifications of angle and lighting conditions.
Unlike a 3D rendering engine, whose inputs must be specified unambiguously
and in complete detail, DALL·E is often able to “fill in the blanks” when
the caption implies that the image must contain a certain detail that is
not explicitly stated.
Applications of Preceding Capabilities
Next, we explore the use of the preceding capabilities for fashion and
interior design.
Example prompts:
- a male mannequin dressed in an orange and black flannel shirt
- a female mannequin dressed in a black leather jacket and gold pleated skirt
- a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace.
- a loft bedroom with a white bed next to a nightstand. there is a fish tank beside the bed.
Combining Unrelated Concepts
The compositional nature of language allows us to put together concepts to
describe both real and imaginary things. We find that DALL·E also has the
ability to combine disparate ideas to synthesize objects, some of which are
unlikely to exist in the real world. We explore this ability in two
instances: transferring qualities from various concepts to animals, and
designing products by taking inspiration from unrelated concepts.
Example prompts:
- a snail made of harp. a snail with the texture of a harp.
- an armchair in the shape of an avocado. an armchair imitating an avocado.
Animal Illustrations
In the previous section, we explored DALL·E’s ability to combine unrelated
concepts when generating images of real-world objects. Here, we explore
this ability in the context of art, for three kinds of illustrations:
anthropomorphized versions of animals and objects, animal chimeras, and
emojis.
Example prompts:
- an illustration of a baby daikon radish in a tutu walking a dog
- a professional high quality illustration of a giraffe turtle chimera. a giraffe imitating a turtle. a giraffe made of turtle.
- a professional high quality emoji of a lovestruck cup of boba
Zero-Shot Visual Reasoning
GPT-3 can be instructed to perform many kinds of tasks solely from a
description and a cue to generate the answer supplied in its prompt,
without any additional training. For example, when prompted with the phrase
“here is the sentence ‘a person walking his dog in the park’ translated
into French:”, GPT-3 answers “un homme qui promène son chien dans le parc.”
This capability is called zero-shot reasoning. We find that DALL·E extends
this capability to the visual domain, and is able to perform several kinds
of image-to-image translation tasks when prompted in the right way.
Example prompts:
- the exact same cat on the top as a sketch on the bottom
- the exact same teapot on the top with ’gpt’ written on it on the bottom
We did not anticipate that this capability would emerge, and made no
modifications to the neural network or training procedure to encourage it.
Motivated by these results, we measure DALL·E’s aptitude for analogical
reasoning problems by testing it on Raven’s progressive matrices, a visual
IQ test that saw widespread use in the 20th century.
Example prompt:
- a sequence of geometric shapes.
Geographic Knowledge
We find that DALL·E has learned about geographic facts, landmarks, and
neighborhoods. Its knowledge of these concepts is surprisingly precise in
some ways and flawed in others.
Example prompts:
- a photo of the food of china
- a photo of alamo square, san francisco, from a street at night
- a photo of san francisco’s golden gate bridge
Temporal Knowledge
In addition to exploring DALL·E’s knowledge of concepts that vary over
space, we also explore its knowledge of concepts that vary over time.
Example prompt:
- a photo of a phone from the 20s
Summary of Approach and Prior Work
DALL·E is a simple decoder-only transformer that receives both the text
and the image as a single stream of 1280 tokens (256 for the text and 1024
for the image) and models all of them autoregressively. The attention mask
at each of its 64 self-attention layers allows each image token to attend
to all text tokens. DALL·E uses the standard causal mask for the text
tokens, and sparse attention for the image tokens with either a row,
column, or convolutional attention pattern, depending on the layer. We
provide more details about the architecture and training procedure in our
[paper](https://arxiv.org/abs/2102.12092).
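The mask structure described above can be sketched on a toy sequence. The code below builds a boolean mask for 3 text tokens followed by a 2x2 image grid: text positions use a causal mask, every image position may attend to all text positions, and image-to-image attention follows a simplified "row" pattern. That row pattern (earlier tokens in the same grid row, plus the token itself) is an assumption made for illustration and is not taken from the paper; the function name `build_mask` and the toy sizes are likewise hypothetical.

```python
def build_mask(n_text, grid):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions 0..n_text-1 are text tokens; the remaining grid*grid
    positions are image tokens in raster order.
    """
    n = n_text + grid * grid
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):            # never attend to later positions
            if i < n_text:
                mask[i][j] = True         # standard causal mask for text
            elif j < n_text:
                mask[i][j] = True         # image token sees all text tokens
            else:
                # Simplified row pattern (assumption): attend only to
                # image tokens that sit in the same grid row.
                row_i = (i - n_text) // grid
                row_j = (j - n_text) // grid
                mask[i][j] = row_i == row_j
    return mask


mask = build_mask(n_text=3, grid=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Printing the mask as a grid of `x` and `.` makes the three regions visible: a lower-triangular block for text, a fully filled block where image rows meet text columns, and a sparse block for image-to-image attention.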
Text-to-image synthesis has been an active area of research since the
pioneering work of Reed et al.,[1](https://openai.com/blog/dall-e/#rf1)
whose approach uses a GAN conditioned on text embeddings. The embeddings
are produced by an encoder pretrained using a contrastive loss, not unlike
CLIP. StackGAN[3](https://openai.com/blog/dall-e/#rf3) and
StackGAN++[4](https://openai.com/blog/dall-e/#rf4) use multi-scale GANs to
scale up the image resolution and improve visual fidelity.
AttnGAN[5](https://openai.com/blog/dall-e/#rf5) incorporates attention
between the text and image features, and proposes a contrastive text-image
feature matching loss as an auxiliary objective. This is interesting to
compare to our reranking with CLIP, which is done offline. Other
work[2](https://openai.com/blog/dall-e/#rf2)[6](https://openai.com/blog/dall-e/#rf6)[7](https://openai.com/blog/dall-e/#rf7)
incorporates additional sources of supervision during training to improve
image quality. Finally, work by Nguyen et al.[8](https://openai.com/blog/dall-e/#rf8)
and Cho et al.[9](https://openai.com/blog/dall-e/#rf9) explores
sampling-based strategies for image generation that leverage pretrained
multimodal discriminative models.
Similar to the rejection sampling used in
[VQVAE-2](https://arxiv.org/abs/1906.00446), we use
[CLIP](https://openai.com/blog/clip/) to rerank the 512 samples drawn for
each caption and keep the top 32 in all of the interactive visuals. This
procedure can also be seen as a kind of language-guided
search[16](https://openai.com/blog/dall-e/#rf16), and can have a dramatic
impact on sample quality.
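The selection step itself is simple to sketch: score every candidate, then keep the best few. In the sketch below the candidates are plain integers and `toy_score` is a made-up stand-in for an image-text similarity score; neither reflects CLIP's actual interface, and the function name `rerank` is hypothetical.

```python
import heapq


def rerank(samples, score, keep=32):
    """Return the `keep` highest-scoring samples, best first."""
    return heapq.nlargest(keep, samples, key=score)


# Stand-in data: integers play the role of 512 generated images, and the
# toy score below replaces a real caption-image similarity model.
candidates = list(range(512))


def toy_score(sample):
    return -abs(sample - 100)   # highest for the sample closest to 100


best = rerank(candidates, toy_score)
print(len(best), best[0])  # 32 100
```

Because the scoring model is only applied to finished samples, this kind of reranking needs no change to the generator, which matches the post's description of it as an offline step.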
Example prompt:
- an illustration of a baby daikon radish in a tutu walking a dog [caption 1, best 8 of 2048]
---------------------------------------------------------------
Footnotes
- We decided to name our model using a portmanteau of the artist Salvador
Dalí and Pixar’s WALL·E. [↩︎](https://openai.com/blog/dall-e/#fnref1)
- A token is any symbol from a discrete vocabulary; for humans, each English
letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has tokens
for both text and image concepts. Specifically, each image caption is
represented using a maximum of 256 BPE-encoded tokens with a vocabulary
size of 16384, and the image is represented using 1024 tokens with a
vocabulary size of 8192.
The images are preprocessed to 256x256 resolution during training. Similar
to VQVAE,[14](https://openai.com/blog/dall-e/#rf14)[15](https://openai.com/blog/dall-e/#rf15)
each image is compressed to a 32x32 grid of discrete latent codes using a
discrete VAE[10](https://openai.com/blog/dall-e/#rf10)[11](https://openai.com/blog/dall-e/#rf11)
that we pretrained using a continuous
relaxation.[12](https://openai.com/blog/dall-e/#rf12)[13](https://openai.com/blog/dall-e/#rf13)
We found that training using the relaxation obviates the need for an
explicit codebook, EMA loss, or tricks like dead code revival, and can
scale up to large vocabulary sizes. [↩︎](https://openai.com/blog/dall-e/#fnref2)
- Further details provided in [a later section](https://openai.com/blog/dall-e/#summary).
[↩︎](https://openai.com/blog/dall-e/#fnref3)
- This task is called variable binding, and has been extensively studied in
the literature.[17](https://openai.com/blog/dall-e/#rf17)[18](https://openai.com/blog/dall-e/#rf18)[19](https://openai.com/blog/dall-e/#rf19)[20](https://openai.com/blog/dall-e/#rf20)
[↩︎](https://openai.com/blog/dall-e/#fnref4)
---------------------------------------------------------------
References
- Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.
(2016). “[Generative adversarial text to image synthesis](
https://arxiv.org/abs/1605.05396)”. In ICML 2016. [↩︎](
https://openai.com/blog/dall-e/#rfref1)
- Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H.
(2016).
“[Learning what and where to draw](https://arxiv.org/abs/1610.02454)”.
In
NIPS 2016. [↩︎](https://openai.com/blog/dall-e/#rfref2)
- Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.
(2016). “[StackGAN: Text to photo-realistic image synthesis with stacked
generative adversarial networks](https://arxiv.org/abs/1612.03242)”. In
ICCV 2017. [↩︎](https://openai.com/blog/dall-e/#rfref3)
- Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas,
D.
(2017). “[StackGAN++: realistic image synthesis with stacked generative
adversarial networks](https://arxiv.org/abs/1710.10916)”. In IEEE
TPAMI
2018. [↩︎](https://openai.com/blog/dall-e/#rfref4)
- Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.
(2017). “[AttnGAN: Fine-grained text to image generation with attentional
generative adversarial networks](https://arxiv.org/abs/1711.10485)”.
[↩︎](https://openai.com/blog/dall-e/#rfref5)
- Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.
(2019). “[Object-driven text-to-image synthesis via adversarial
training](https://arxiv.org/abs/1902.10740)”. In CVPR 2019.
[↩︎](https://openai.com/blog/dall-e/#rfref6)
- Koh, J. Y., Baldridge, J., Lee, H., Yang, Y. (2020). “[Text-to-image
generation grounded by fine-grained user attention](
https://arxiv.org/abs/2011.03775)”. In WACV 2021. [↩︎](
https://openai.com/blog/dall-e/#rfref7)
- Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J. (2016).
“[Plug & play generative networks: conditional iterative generation of
images in latent space](https://arxiv.org/abs/1612.00005)”.
[↩︎](https://openai.com/blog/dall-e/#rfref8)
- Cho, J., Lu, J., Schwen, D., Hajishirzi, H., Kembhavi, A. (2020).
“[X-LXMERT: Paint, caption, and answer questions with multi-modal
transformers](https://arxiv.org/abs/2009.11278)”. EMNLP 2020. [↩︎](
https://openai.com/blog/dall-e/#rfref9)
- Kingma, Diederik P., and Max Welling. “[Auto-encoding variational
bayes](
https://arxiv.org/abs/1312.6114).” arXiv preprint (2013). [↩︎](
https://openai.com/blog/dall-e/#rfref10a) [↩︎](
https://openai.com/blog/dall-e/#rfref10b)
- Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra.
“[Stochastic
backpropagation and approximate inference in deep generative models](
https://arxiv.org/abs/1401.4082).” arXiv preprint (2014). [↩︎](
https://openai.com/blog/dall-e/#rfref11a) [↩︎](
https://openai.com/blog/dall-e/#rfref11b)
- Jang, E., Gu, S., Poole, B. (2016). “[Categorical reparametrization
with
Gumbel-softmax](https://arxiv.org/abs/1611.01144)”. [↩︎](
https://openai.com/blog/dall-e/#rfref12a) [↩︎](
https://openai.com/blog/dall-e/#rfref12b)
- Maddison, C., Mnih, A., Teh, Y. W. (2016). “[The Concrete
distribution:
a continuous relaxation of discrete random variables](
https://arxiv.org/abs/1611.00712)”. [↩︎](
https://openai.com/blog/dall-e/#rfref13a) [↩︎](
https://openai.com/blog/dall-e/#rfref13b)
- van den Oord, A., Vinyals, O., Kavukcuoglu, K. (2017). “[Neural
discrete
representation learning](https://arxiv.org/abs/1711.00937)”. [↩︎](
https://openai.com/blog/dall-e/#rfref14a) [↩︎](
https://openai.com/blog/dall-e/#rfref14b)
- Razavi, A., van den Oord, A., Vinyals, O. (2019). “[Generating diverse
high-fidelity images with VQ-VAE-2](https://arxiv.org/abs/1906.00446)”.
[↩︎](https://openai.com/blog/dall-e/#rfref15a) [↩︎](https://openai.com/blog/dall-e/#rfref15b)
- Andreas, J., Klein, D., Levine, S. (2017). “[Learning with Latent
Language](https://arxiv.org/abs/1711.00482)”. [↩︎](
https://openai.com/blog/dall-e/#rfref16)
- Smolensky, P. (1990). “[Tensor product variable binding and the
representation of symbolic structures in connectionist systems](http://www.lscp.net/persons/dupoux/teaching/AT1_2014/papers/Smolensky_1990_T...)”.
[↩︎](https://openai.com/blog/dall-e/#rfref17a) [↩︎](https://openai.com/blog/dall-e/#rfref17b)
- Plate, T. (1995). “[Holographic reduced representations: convolution
algebra for compositional distributed representations](
https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf)”. [↩︎](
https://openai.com/blog/dall-e/#rfref18a) [↩︎](
https://openai.com/blog/dall-e/#rfref18b)
- Gayler, R. (1998). “[Multiplicative binding, representation
operators &
analogy](http://cogprints.org/502/)”. [↩︎](
https://openai.com/blog/dall-e/#rfref19a) [↩︎](
https://openai.com/blog/dall-e/#rfref19b)
- Kanerva, P. (1997). “[Fully distributed representations](
http://www.cap-lore.com/RWC97-kanerva.pdf)”. [↩︎](
https://openai.com/blog/dall-e/#rfref20a) [↩︎](
https://openai.com/blog/dall-e/#rfref20b)
---------------------------------------------------------------
Authors
[Aditya Ramesh](https://openai.com/blog/authors/aditya/),
[Mikhail Pavlov](https://openai.com/blog/authors/mikhail/),
[Gabriel Goh](https://openai.com/blog/authors/gabriel/),
[Scott Gray](https://openai.com/blog/authors/scott/)
(Primary Authors)
[Mark Chen](https://openai.com/blog/authors/mark/),
[Rewon Child](https://openai.com/blog/authors/rewon/),
[Vedant Misra](https://openai.com/blog/authors/vedant/),
[Pamela Mishkin](https://openai.com/blog/authors/pamela/),
[Gretchen Krueger](https://openai.com/blog/authors/gretchen/),
[Sandhini Agarwal](https://openai.com/blog/authors/sandhini/),
[Ilya Sutskever](https://openai.com/blog/authors/ilya/)
(Supporting Authors)
---------------------------------------------------------------
Filed Under
[Research](https://openai.com/blog/tags/research/),
[Milestones](https://openai.com/blog/tags/milestones/),
[Multimodal](https://openai.com/blog/tags/multimodal/)
---------------------------------------------------------------
Cover Artwork
Justin Jay Wang
---------------------------------------------------------------
Acknowledgments
Thanks to the following for their feedback on this work and contributions
to this release: Alec Radford, Andrew Mayne, Jeff Clune, Ashley Pilipiszyn,
Steve Dowling, Jong Wook Kim, Lei Pan, Heewoo Jun, John Schulman, Michael
Tabatowski, Preetum Nakkiran, Jack Clark, Fraser Kelton, Jacob Jackson,
Greg Brockman, Wojciech Zaremba, Justin Mao-Jones, David Luan, Shantanu
Jain, Prafulla Dhariwal, Sam Altman, Pranav Shyam, Miles Brundage, Jakub
Pachocki, and Ryan Lowe.
---------------------------------------------------------------
Contributions
Aditya Ramesh was the project lead: he developed the approach, trained the
models, and wrote most of the blog copy.
Aditya Ramesh, Mikhail Pavlov, and Scott Gray worked together to scale up
the model to 12 billion parameters, and designed the infrastructure used to
draw samples from the model.
Aditya Ramesh, Gabriel Goh, and Justin Jay Wang worked together to create
the interactive visuals for the blog.
Mark Chen and Aditya Ramesh created the images for Raven’s Progressive
Matrices.
Rewon Child and Vedant Misra assisted in writing the blog.
Pamela Mishkin, Gretchen Krueger, and Sandhini Agarwal advised on broader
impacts of the work and assisted in writing the blog.
Ilya Sutskever oversaw the project and assisted in writing the blog.

Gunnar Larson
