[imagegen][ot][wrongish] RU outdid OpenAI, Research community stepping up
Automated Image Generation

"Everybody" knows that current neural networks can make almost any photorealistic image you want. But access to this is not universal.

OpenAI made DALL-E some time ago: https://openai.com/blog/dall-e/ . Their big hallmark was an "armchair in the shape of an avocado". If you look at their examples with some experience, it looks like they trained on only a subset of what's possible, maybe to stimulate competing research; that means they needed fewer resources to produce only the images they demonstrate.

That stagnated for a while. People made public approaches such as VQGAN and diffusion, mostly (but not all) using a model called CLIP, which OpenAI released and which has significant limitations that can be worked around. Here's a recent one of these developments: https://github.com/openai/glide-text2im . Many of these were aided by work by Katherine Crowson with EleutherAI, and a good community-built result of them is maybe https://github.com/pixray/pixray .

A community attempt to replicate DALL-E itself sprang up eventually, and their work was eventually made mainstream, but it wasn't very powerful when I last looked: https://github.com/borisdayma/dalle-mini . It's quite inspiring to see the hard work from random peeps, and I know they are still training their model to be better.

All of a sudden, Russia comes in and releases a public model that, at a biased glance, looks like somebody just threw a goldmine at it. The encouraged way to use it is to visit a site in Russian with JavaScript and captchas: https://huggingface.co/sberbank-ai/rudalle-Malevich

Meanwhile, researchers have finally gotten on board with training networks, as can be seen by the new research image model that is being trained live right now as we speak: https://huggingface.co/training-transformers-together/dalle-demo-v1
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, January 4, 2022 8:50 PM, k <gmkarl@gmail.com> wrote:
Automated Image Generation ... All of a sudden, Russia comes in and releases a public model that at a biased glance looks like somebody just threw a goldmine at it. The encouraged way to use it is to visit a site in Russian with JavaScript and captchas: https://huggingface.co/sberbank-ai/rudalle-Malevich
""" Training the ruDALL-E neural networks on the Christofari cluster has become the largest calculation task in Russia: - ruDALL-E Kandinsky (XXL) was trained for 37 days on the 512 GPU TESLA V100, and then also for 11 more days on the 128 GPU TESLA V100, for a total of 20,352 GPU-days; - ruDALL-E Malevich (XL) was trained for 8 days on the 128 GPU TESLA V100, and then also for 15 more days on the 192 GPU TESLA V100, for a total of 3,904 GPU-days. Accordingly, training for both models totalled 24,256 GPU-days. """ you can see why complex models are the domain of big business, government, and research institutions... lots of compute required! best regards,
eh, i think it's all about making demand for existing infrastructure; they could totally build models to analyse the training of other models.

you don't have to use the russian website for this one; the model also has a colab interface linked from its github: https://github.com/sberbank-ai/ru-dalle#minimal-example

image generation nowadays has become, for some (thousands of bored nerds), a way to avoid working on anything while having your computer do stuff, more effectively than ever before
On 1/4/22, k <gmkarl@gmail.com> wrote:
Automated Image Generation
"Everybody" knows that current neural networks can make any photorealistic image you want. But access to this is not universal.
https://thispersondoesnotexist.com/
https://en.wikipedia.org/wiki/StyleGAN
https://github.com/NVlabs/stylegan
https://nvlabs.github.io/stylegan3
https://arxiv.org/abs/2106.12423
https://github.com/NVlabs/stylegan3

The AI face generator is powered by StyleGAN, a generative adversarial network (GAN) introduced by Nvidia researchers in December 2018. A GAN consists of 2 competing neural networks: one generates something, and the second tries to tell whether results are real or generated by the first. Training ends when the first network consistently deceives the second.
re: gan, I don't have a good link offhand atm, but common GANs have usually been:

- image-based
- pretrained to handle a single category of image
- set up to provide a random, new image from that category by default

there are lists of them out there. image GAN tech is still developing but produces high quality results. the GAN concept can of course be applied to anything other than an image category if the work is done. you can train a GAN on your home computer.
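the two-network game can be sketched end to end in a few lines of plain python. this is a toy illustration, not StyleGAN: a hypothetical one-dimensional "GAN" where the generator is a line a*z + b, the discriminator is logistic regression, and the gradients are derived by hand.

```python
import math
import random

random.seed(0)

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

# "Real" data: samples from a normal distribution centred on 3.0.
def real_sample():
    return random.gauss(3.0, 0.5)

# Generator G(z) = a*z + b turns noise z ~ N(0,1) into a fake sample.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c) estimates P(x is real).
w, c = 0.1, 0.0

lr = 0.02
for step in range(5000):
    z = random.gauss(0.0, 1.0)
    x_real = real_sample()
    x_fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * ((1 - d_real) - d_fake)

    # Generator step: ascend log D(fake), i.e. try to fool the discriminator.
    d_fake = sigmoid(w * x_fake + c)
    a += lr * (1 - d_fake) * w * z   # chain rule: d x_fake / d a = z
    b += lr * (1 - d_fake) * w       # chain rule: d x_fake / d b = 1

fakes = [a * random.gauss(0.0, 1.0) + b for _ in range(1000)]
mean_fake = sum(fakes) / len(fakes)
print(mean_fake)  # drifts toward the real mean (3.0)
```

at equilibrium the discriminator can no longer tell fakes from reals, which matches the idea above that training ends when the generator reliably fools the discriminator. real GANs replace the line and the logistic regression with deep convolutional networks, but the loop is the same shape.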
I checked my system for updates, and there was a kernel update, but i/o started hanging while apt was updating the boot files, so I left it there to see if it sorts itself out
participants (3)
- coderman
- grarpamp
- k