An exercise in frustration
There's an anonymous Facebook post that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art, or art with other weird changes that didn't fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.
For example, I recently asked ChatGPT4:
Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.
I got this completely reasonable response:
It sounds like you're asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it's completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur's presence?
I confirmed:
No tyrannosaurus visible, just some air holes for its comfort.
And ChatGPT (using DALL-E3 for the image generation) gave me this:
It's clearly not quite there - it's a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?
ChatGPT has a feature where you can erase parts of the image and submit more text describing how it should change those parts. So, I blocked out the line of symbols and asked:
Replace these symbols with a silhouette of a tyrannosaurus and the word "DANGER".
ChatGPT produced this:
Which is sort of what I asked for, but that doesn't really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:
The T-rex should be facing the word "DANGER". Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.
ChatGPT gave me:
The dinosaur is now facing away from the word "DANGER" instead of toward it, and it's still not a very good tyrannosaurus silhouette. I asked for revisions again.
Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.
ChatGPT responded with:
Which... no. That does not count as facing toward the text, the lower jaw is still not right, and does the word still say "danger" or more like "dandier"?
The T-rex needs to be beside the text, not on top of it. And what's up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.
ChatGPT responded with this:
My response:
... that's perfect. Now can you keep everything - textures, lighting, materials - the same but make the box appear much larger? Right now it looks like it's only knee height. It needs to be large enough to hold a T-rex.
ChatGPT's revision:
It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.
Here's a similar revision process, this time attempting to revise my way toward prettier pies.
Me: Please generate an image of a giraffe entering the great british bakeoff.
Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.
Below is revision 5, by which point it has become evident that the more I ask for intricate latticework, the sloppier it gets; the more I ask for a deep dish pie, the thinner the pie gets; and the more I beg for a single pie, the more pies I get.
And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.
In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.
Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), the quality of both the deer and the grocery store background has deteriorated.
Here's a one-shot, in which I asked ChatGPT/DALL-E3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.
So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.
There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.
I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.