An exercise in frustration

There's an anonymous Facebook post that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art instead, or art with other weird changes that didn't fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.

For example, I recently asked ChatGPT4:

Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.

I got this completely reasonable response:

It sounds like you're asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it's completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur's presence?

I confirmed:

No tyrannosaurus visible, just some air holes for its comfort.

And ChatGPT (using DALL-E3 for the image generation) gave me this:

A generated image of a cardboard(ish) box with two large air holes on its side. It appears to be about knee height, and its top is taped shut with weirdly stretchy duct tape. Along the bottom edge are a set of six boxed symbols, none of which are identifiable.

It's clearly not quite there - it's a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?

ChatGPT has a feature where you can erase some parts of the image and submit more text describing how those parts should change. So, I blocked out the line of symbols and asked:

Replace these symbols with a silhouette of a tyrannosaurus and the word "DANGER".

ChatGPT produced this:

The line of symbols has been replaced by the word Danger in red all-caps. Beside the word, facing away from it, is a skinny-necked two-legged dinosaur that might be some kind of dromaeosaur.

Which is sort of what I asked for, but that doesn't really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:

The T-rex should be facing the word "DANGER". Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.

ChatGPT gave me:

The dinosaur is facing the other direction, but the word "Danger" is still behind it. Both dinosaur and words are in all black now, though there's a weird white background behind the dinosaur's feet. The dinosaur looks approximately like a tyrannosaurus, except its lower jaw is huge, toothless, and clublike.

The dinosaur is now facing the other way, but it's still not facing the word "Danger" and it's still not a very good tyrannosaurus silhouette. I asked for revisions again.

Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.

ChatGPT responds with:

The dinosaur silhouette is now superimposed over the red letters. The word starts with Dan and ends with R, but there is room for one or two extra letters in the part of the word that's hidden by the dinosaur. The dinosaur's silhouette looks even more toothless and goofy.

Which... no. That does not count as facing toward the text, the lower jaw is still not right, and does the word still say "danger" or more like "dandier"?

The T-rex needs to be beside the text, not on top of it. And what's up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.

ChatGPT responded with this:

The word Danger is now superimposed in red over the dinosaur silhouette. At least it's allegedly a dinosaur silhouette - other than the tail, all the other details are melting and dripping into a gloopy blob.

My response:

... that's perfect. Now can you keep everything - textures, lighting, materials - the same but make the box appear much larger? Right now it looks like it's only knee height. It needs to be large enough to hold a T-rex.

ChatGPT's revision:

The crate is much taller, but also much different in construction and materials, and the room has changed slightly too. On one side of the box is the word Danger in white font surrounded by a classic red circle and black rectangle. At the bottom of the box is the word "Tyrannosauris" (misspelled). Between the top and bottom labels is a tyrannosaurus silhouette with much, much better detail and realism than any of the silhouettes from the last several images.

It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.

Here's a similar revision process, this time attempting to revise my way toward prettier pies.

Me: Please generate an image of a giraffe entering the great british bakeoff.

The tent is huge and the bakers are really crowded in toward the back (also they're all white), but the giraffe is wearing an apron and a chef's hat and doing something indistinct with flour and circles of dough.

Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.

Now there is a latticed cherry pie on the bench, but the lattice is huge and uneven.

Below is revision 5, by which time it becomes evident that the more I ask for intricate latticework, the sloppier it gets, the more I ask for a deep dish pie, the thinner the pie gets, and the more I beg for a single pie, the more pies I get.

There are now four entirely flat, bright red pies on the bench, and the lattice of each one is far worse than the original. Also the giraffe has a creepy human hand.

And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.

The view of the tent is vaguely similar but the style and angle are completely different, and now it's a closeup of the pie, which is nice and deep with a neatly braided lattice. The tent does indeed show the outside now, but rather than raining it's a clear starry night. Also, the bakers are not only all white, but elderly. My prompt: "Perfect. Please keep the image exactly as it is, except make it so that we can see that it is raining hard outside the tent."

In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.

Image description: On the left, a rainbow-colored snake is coiled in front of an easel. (The snake's coil splits in two, so I guess it's a fork-tailed snake.) On the easel is a painting of a seated bear. It's fairly realistic. On the right, the bear has been replaced by a smaller, much worse replica of the snake. There is no bear.

Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), both the quality of the deer and the grocery store background have deteriorated.

Image description: In the original image on the left, the deer looks a bit like a plastic mannequin, but its lighting and shading fit with the rest of the grocery store scene. In the revised image on the right, the deer is now a fawn with strange flat shading, superimposed on the scene rather than standing in it. Its eyes are strangely liquid and 3D, but the rest of the fawn faces the camera head-on with no apparent depth. Where the grocery store aisle background has been revised, the shelves have lost all detail and resemblance to a grocery store, and the ceiling light fixture now descends into the floor.

Here's a one-shot, in which I asked ChatGPT/DALL-E3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.

Image description: Image on the left appears to be a damaged fresco of Jesus (with telltale long fingers that are connected to both hands). Image in the middle is the same except for Jesus's face and halo, which no longer appear damaged. However, Jesus's face is now weirdly smooth and shiny and doesn't fit with the rest of the painting any more, and his eyes are weird blue and black spirals. Image on the right has Jesus's face in a completely different, almost airbrushed, style, and his eyes don't remotely match any more.

So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.

There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.

I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.

Bonus content: a unicorn goes hilariously bad
