[imagegen][ot][wrongish] RU outdid OpenAI, Research community stepping up

k gmkarl at gmail.com
Tue Jan 4 12:50:17 PST 2022


Automated Image Generation

"Everybody" knows that current neural networks can make any
photorealistic image you want. But access to this is not universal.

OpenAI made Dall-E some time ago: https://openai.com/blog/dall-e/

Their big hallmark was an "armchair in the shape of an avocado".

If you look at their examples with some experience, it looks kinda
like they trained for a subset of what's possible, maybe to stimulate
competing research.  This means they needed fewer resources to produce
only the images they demonstrate.

That stagnated for a while.  People made public approaches such as
vqgan and diffusion, mostly (but not always) using a model called CLIP
that was released by OpenAI and has significant limitations that can
be worked around.  Here's a recent one of these developments:
https://github.com/openai/glide-text2im .  Many of these were aided by
work by Katherine Crowson with EleutherAI, and a good
community-built result of that work is maybe
https://github.com/pixray/pixray .

A community attempt to replicate Dall-E itself eventually sprang up,
and their work later went mainstream, but it wasn't very powerful
when I last looked: https://github.com/borisdayma/dalle-mini .
It's quite inspiring to see the hard work from random peeps, and I
know they are still training their model to be better.

All of a sudden, Russia comes in and releases a public model that at a
biased glance looks like somebody just threw a goldmine at it.  The
encouraged way to use it is to visit a site in Russian with JavaScript
and captchas: https://huggingface.co/sberbank-ai/rudalle-Malevich

Meanwhile, researchers have finally gotten on board with training
networks, as can be seen by the new research image model that is being
trained live right now:
https://huggingface.co/training-transformers-together/dalle-demo-v1

