Back by popular demand, here are some AI-generated drawing prompts to use in this, the spooky month of October!
Longtime AI Weirdness readers may recognize some of these. That's because this is a throwback list, all the way back to the times of very tiny language models. These models had not feasted on huge chunks of the internet, but had sipped delicately on hand-curated artisanal datasets. They trained rather slowly on a single MacBook Pro. Could they write your term paper for you? Sure, if you don't mind that your term paper consists of a list of garbled and occasionally obscene nail polish colors.
If you create anything from or inspired by these art prompts, I'd love to see what you make! Tag your art with #botober or #botober2024 on Tumblr, Instagram, Bluesky, or Mastodon!
There's an anonymous Facebook post that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art instead, or art with other weird changes that didn't fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.
For example, I recently asked ChatGPT-4:
Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.
I got this completely reasonable response:
It sounds like you're asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it's completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur's presence?
I confirmed:
No tyrannosaurus visible, just some air holes for its comfort.
And ChatGPT (using DALL-E3 for the image generation) gave me this:
It's clearly not quite there - it's a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?
ChatGPT has a feature where you can erase parts of the image and submit more text describing how it should change those areas. So, I blocked out the line of symbols and asked:
Replace these symbols with a silhouette of a tyrannosaurus and the word "DANGER".
ChatGPT produced this:
Which is sort of what I asked for, but that doesn't really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:
The T-rex should be facing the word "DANGER". Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.
ChatGPT gave me:
The dinosaur is still facing away from the word "DANGER", and it's still not a very good tyrannosaurus silhouette. I asked for revisions again.
Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.
ChatGPT responds with:
Which... no. That does not count as facing toward the text, the lower jaw is still not right, and does the word say "danger" or something more like "dandier"?
The T-rex needs to be beside the text, not on top of it. And what's up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.
ChatGPT responded with this:
My response:
... that's perfect. Now can you keep everything - textures, lighting, materials - the same but make the box appear much larger? Right now it looks like it's only knee height. It needs to be large enough to hold a T-rex.
ChatGPT's revision:
It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.
Here's a similar revision process, this time attempting to revise my way toward prettier pies.
Me: Please generate an image of a giraffe entering the great british bakeoff.
Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.
Below is revision 5, by which time it has become evident that the more I ask for intricate latticework, the sloppier it gets; the more I ask for a deep-dish pie, the thinner the pie gets; and the more I beg for a single pie, the more pies I get.
And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.
In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.
Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), both the quality of the deer and the grocery store background have deteriorated.
Here's a one-shot, in which I asked ChatGPT/DALL-E3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.
So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.
There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.
I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.
Do you know those autostereograms with the hidden 3D pictures? Images like the Magic Eye pictures from the 1990s that look like noisy repeating patterns until you defocus your eyes just right?
ChatGPT can generate them!
At least according to ChatGPT.
I've seen people try making Magic Eye-style images with other image generating models, but I hadn't thought to try it with ChatGPT until reader Pippin sent me the suggestion and I generated the hilarious dolphin image above. What got me in particular was the caption.
How do you end up with a result that sounds so confident and yet is so completely wrong?
ChatGPT has encountered information about Magic Eye stereograms in its internet training. When I ask "Please generate a magic eye 3D stereo image of a unicorn", the descriptions ChatGPT passes to DALL-E3 (the image generator that actually makes the images) are extremely particular:
A magic eye 3D stereogram that reveals a hidden image of a unicorn when viewed correctly. The stereogram should be intricate and filled with a variety of colors to ensure the 3D effect is prominent, allowing the unicorn to appear as if it is standing in a magical forest once the viewer's eyes adjust. The background should consist of complex patterns and mystical elements to enhance the illusion of depth and the magical atmosphere, encouraging viewers to find the hidden unicorn. The unicorn itself should be designed to be a delightful surprise for those adept at viewing magic eye images, with its horn and mane clearly discernable amidst the pattern.
But this is the image DALL-E3 provides to ChatGPT in response:
ChatGPT doesn't apply any image recognition to the result it gets back - whatever DALL-E3 did is a big blank to it. So ChatGPT then continues with its description as if the image is clearly exactly what it asked for. It gestures to the green screen, where presumably there is a fabulous 3D illusion image appearing, and then continues with no information about its actual shortcomings.
Google's Gemini (previously Bard) also has a dedicated model (Imagen 2) to do its image generation. Its hidden images are even... less subtle.
Gemini has an image describing model that it can call on, so if I ask it to describe the images it just generated, it should notice that they're nothing like Magic Eye images. Right?
At least it didn't give away the solution to the puzzle.
But if I start a new chat and give it one of the "hidden" doves from above, there's no hint of it being a Magic Eye illusion, just an ordinary image of a dove.
My theory is that Gemini only calls on its image recognition model when the user uploads an image. Otherwise, it's just a text generator.
People selling "AI" like to present it as an all-purpose computer program but models like Gemini and ChatGPT are more like a phone full of apps. The text generating app can launch the image generating app in particular circumstances but they're not meaningfully the same program.
I should note that even when ChatGPT is only doing text generation and could in theory check its own work, it still just assumes it does a great job. Here it is generating ASCII text art and then reading back its own messages:
Generating ASCII art and 3D images isn't a big potential application for models like ChatGPT, but it's a good reminder that these models don't understand what we're asking for or the basic steps for doing it right. When ChatGPT adds image recognition or description or some other functionality, it's not that the original text model got smarter. It just can call on another app.
One last 3D illusion challenge! I asked for a really tricky one, so be prepared not to get the hidden picture at first! When you see it, you'll know because it'll look like a sea turtle.
Not only does it have the basic shapes like circle, tringle, hectanbie, and sqale, it also has some of the more advanced shapes like renstqon, hoboz, and flotn!
The fact that even a kindergartener can call out this DALL-E3 generated image as nonsense doesn't mean that it's an unusually bad example of AI-generated imagery. It's just what happens when the usual AI-generated information intersects with an area where most people are experts.
There's AI generated "educational material" offered for sale on all sorts of topics - cookbooks that list "the veggies" as a protein and "orange colored" as a fruit, math help that makes basic math errors, and research papers that begin with "Certainly, here is a possible introduction for your topic:". They're not adding anything of value.
Asked to generate an alt text description of the image above, ChatGPT-4 writes:
"A vibrant educational guide featuring basic geometric shapes aimed at young learners. The image includes a circle, square, triangle, rectangle, and hexagon, each in a different color and adorned with cheerful facial expressions. These shapes are set against a light, inviting background and are clearly labeled with their names in a large, child-friendly font. The layout is clean and organized, making it easy for children to identify and learn each shape."
I've noted before that AI image descriptions can miss the obvious. It's certainly a description-shaped incorrect description of incorrectly labeled shape-shaped shapes. And it's all going into the training data for the next generation of generative AI!
The image I shared in my main post isn't one of the more incorrect examples of DALL-E3 generated guides - it's actually one of the more correct ones.
Here's another generated image from the same prompt.
I'm particularly fond of the Suadle and the Sarsle.
Hey kids! What sound does a woolly horse-sheep make?
The image above is what you get when you ask DALL-E3 (via ChatGPT) for some basic educational material: "Please generate an illustrated poster to help children learn which sounds common animals make. Each animal should be pictured with a speech bubble spelling out the animal's sound."
Here's another, generated with the same prompt:
There is so much not to like about how people are using image generators to rip off artists and replace their work with shoddy imitations. But I am enjoying these pockets of weirdness where I find them.