In search of a unicorn cake
I've generated a lot of terrible unicorn cakes by this point.
Last time I experimented with generating cakes, using CLIP's internet training to guide a couple of image-generating methods. Around the time I posted my experiments, RiversHaveWings, who developed one of the most popular methods, came out with a new one called CLIP-guided diffusion. She found it dramatically improved results on some prompts, like "industrial demon, a matte painting, trending on ArtStation".
So, I decided to see if it could improve on my unicorn cakes from last time.
Using the default settings, here's "unicorn cake with golden horn and rainbow sprinkles"
I like an iridescent cake as much as the next person, but this has missed a few style points. And a CLIP-based method ought to be able to use its internet training to do better. A Google image search for "unicorn cake" returns mostly cakes of a particular design and color scheme. It should know what "unicorn cake" means.
As in my previous experiments, I tried giving CLIP+diffusion a basic sketch of a plain cake to start with. I could tweak how much it was allowed to change the sketch.
This, it turns out, is too much departure.
And this is too little.
This one at least has a unicorn in it, but it's not quite got the spirit of a unicorn cake, nor any idea where the horn goes.
Starting from a light-colored cake didn't work any better.
But again, CLIP should KNOW what a unicorn cake looks like. I shouldn't even have to start it out with a sketch of a cake. Maybe my prompt was the problem. I decided to try a prompt style RiversHaveWings was having good success with.
"Unicorn cake, matte painting, trending on ArtStation"
I also tried the "food photography" and "by janelle's bakery" modifiers, which seemed to work before.
These are hideous unicorn cakes, and I'm sorry. It's as if I have stumbled on a weirdly adversarial prompt for CLIP+diffusion.
I've noted before that asking for an image in just the right way can change the output from something terrible to something very aesthetically pleasing. @kingdomakrillic on imgur has put together an amazing grid of ways to modify a prompt for effect. Here are just the first three lines out of dozens.
I looked through the grid for modifiers that were producing relatively coherent mushroom results.
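Trying a modifier is just a matter of tacking it onto the base prompt after a comma. Here's a minimal sketch of that bookkeeping; `build_prompts` is a made-up helper for illustration, not part of any notebook, and the modifiers are the ones I ended up testing.

```python
def build_prompts(subject, modifiers):
    """Append each style modifier to the base subject, separated by a comma."""
    return [f"{subject}, {modifier}" for modifier in modifiers]


if __name__ == "__main__":
    # Hypothetical example list; swap in any modifiers you want to compare.
    modifiers = ["cryengine", "photorealistic", "ArtStation HD"]
    for prompt in build_prompts("Unicorn cake", modifiers):
        print(prompt)
```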
"Unicorn cake, cryengine"
"Unicorn cake, photorealistic"
"Unicorn cake, ArtStation HD"
Then I noticed a parameter that's supposed to control how similar the output looks to the prompt text. All this time had I been telling it "like a unicorn cake but not TOO much like a unicorn cake"? I want exactly a unicorn cake.
Not knowing which direction to tweak the parameter, I reduced "CLIP_guidance_scale" from 1000 to 0.
Okay wow so that must be in the "not unicorn cake" direction. I increased "CLIP_guidance_scale" to 5000.
...huh.
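For what it's worth, here's a toy sketch of what a guidance scale generally does in guided sampling. This is not the notebook's code: `guided_sample`, `prior_look`, and `prompt_look` are made-up names, and a single number stands in for a whole image. The assumption is that each step blends what the image model finds plausible on its own with what the prompt is asking for, and that the scale sets how much weight the prompt gets.

```python
def guided_sample(prompt_look, prior_look, guidance_scale, steps=100, rate=0.2):
    """Toy guided sampling on one number instead of an image (illustrative only)."""
    x = 0.0  # arbitrary starting point, standing in for the initial noise
    for _ in range(steps):
        # Weighted blend of the model's own preference and the prompt.
        # A guidance_scale of 0 puts all the weight on the model's preference.
        goal = (prior_look + guidance_scale * prompt_look) / (1.0 + guidance_scale)
        # Move part of the way toward that blend on each step.
        x += rate * (goal - x)
    return x


if __name__ == "__main__":
    for scale in (0, 1000, 5000):
        result = guided_sample(prompt_look=1.0, prior_look=-1.0, guidance_scale=scale)
        print(f"guidance_scale={scale}: {result:.4f}")
```

In this toy, a scale of 0 ignores the prompt entirely, and going from 1000 to 5000 barely moves the result, because at 1000 the prompt already dominates. The real method is far messier, so treat this as intuition only.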
In the end, nothing I tried resulted in unicorn cakes that looked anything like the Instagram photos, or even as much like unicorn cakes as the CLIP+VQGAN method I tried before. Maybe the right parameter/prompt combination is out there and I just haven't found it yet. But I'm beginning to suspect that CLIP+diffusion is just really good at a certain kind of detailed, vibey, industrial prompt. So with that in mind:
"Industrial unicorn cake, matte painting, trending on artstation"
Want to try this yourself for free? Here are instructions on using both CLIP+VQGAN and CLIP+diffusion methods (no coding required).
Bonus content: a few more CLIP+diffusion images, some pretty cool looking