Two days ago, while using my phone, I found a trick that worked with a machine learning bot on Discord to guide its images better. The trick no longer works as well after the settings were changed, but it can be used elsewhere.

First I did image->text, then used the resulting text for text->image, and tweaked things until the output was stable. The problem with image generation using CLIP+VQGAN is that it doesn't understand the broad layout of objects, so this approach helps a lot. Once the output was stable, I could replace words in the prompt to replace concepts in the output. I was using an image of a face.

Here are images I made.

A male cyborg dryad:
[1]https://cdn.discordapp.com/attachments/838682121975234571/879622138980601856/1629788417_a_cyborg_tree_with_short_hair.jpg

A group of zombies and robots sitting around fires in a forest:
[2]https://cdn.discordapp.com/attachments/838682121975234571/879628620249837568/1629789944_a_group_of_zombies_robots_and_dryads_sitting_around_a_campfire_in_the_middle_of_a_forest.jpg

References

1. https://cdn.discordapp.com/attachments/838682121975234571/879622138980601856/1629788417_a_cyborg_tree_with_short_hair.jpg
2. https://cdn.discordapp.com/attachments/838682121975234571/879628620249837568/1629789944_a_group_of_zombies_robots_and_dryads_sitting_around_a_campfire_in_the_middle_of_a_forest.jpg
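
For anyone who wants to try the word-replacement step outside the bot, here is a rough sketch in Python. It only covers the text side: it takes a prompt that already gives stable output and swaps whole words for new concepts. The function name swap_concepts and the base caption "a man with short hair" are made up for illustration. The caption the bot actually produced from the face image is not recorded here, and the bot's own commands are not shown.

```python
import re


def swap_concepts(prompt, replacements):
    """Return a copy of prompt with whole words or phrases swapped.

    replacements maps an existing word or phrase to the concept that
    should take its place. Matching is case-insensitive and limited to
    whole words, so "man" does not touch "woman".
    """
    if not replacements:
        return prompt
    lowered = {old.lower(): new for old, new in replacements.items()}
    # Try longer phrases first so they win over the words inside them.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lowered[m.group(0).lower()], prompt)


# Hypothetical stable caption obtained from the image->text step.
base_prompt = "a man with short hair"

# Swap one concept while leaving the rest of the layout words alone.
print(swap_concepts(base_prompt, {"man": "cyborg tree"}))
# prints: a cyborg tree with short hair
```

Changing one concept at a time and leaving the rest of the wording alone seems to be what keeps the layout of the original image.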