How to get AI to confuse a shark with a clam

"The Megalodon was a large bivalve, measuring up to 2.5 meters in length. Its shell was covered in spines, and it had a large, powerful jaw for crushing prey."

Although the megalodon is most widely known as a giant prehistoric shark, I recently learned that Megalodon with a capital M is more accurately the genus name of a group of extinct bivalve mollusks that lived from the Devonian to the Jurassic (picture an ordinary-looking clam). I had trouble finding much about Megalodon online - the search algorithms kept bringing me sharks. Humans get confused about this too - the hosts of the Common Descent podcast recently described coming across a Megalodon press release written as a weird mishmash of clam and shark. (Alas, I couldn't find the press release myself.)

So what happens when you're trying to work with an AI that's overwhelmingly seen online examples of the thing you don't want?

I experimented with a variant of GPT-3 that OpenAI finetuned to respond to written instructions. I gave it the following prompt:

"Write a press release from Montana State University about the discovery of new mollusk fossils of the genus Megalodon, describing the detailed new reconstruction of the bivalve's anatomy and behavior."

And here's an example press release it produced:

"Mollusk fossils of the genus Megalodon have been discovered in Montana. The fossils provide a detailed reconstruction of the bivalve's anatomy and behavior. Megalodon was a giant bivalve that lived in the Cretaceous period. It was one of the largest predators of its time, reaching a length of 15 meters. Megalodon had a large, powerful mantle and a long, muscular foot. It hunted by swimming after its prey and using its muscular foot to stunned or kill its prey. Megalodon is thought to have gone extinct during the Cretaceous-Paleogene mass extinction."
"Megalodon had a large, powerful mantle and a long, muscular foot. It hunted by swimming after its prey and using its muscular foot to stunned or kill its prey."

As near as I can tell, none of this is true. Megalodon mollusks seem to have looked very much like medium-sized modern clams, and almost certainly were filter feeders, not whale-sized pursuit predators who kicked their prey into submission.

Given how little information there is about Megalodon mollusks online, this is an example of a text-based "computer, enhance!" effect. GPT-3 isn't trained to be correct, it's trained to sound correct. Told to generate a press release, it will fill it with whatever has the right flow.

Here's an excerpt from another generated press release:

"The Megalodon was a giant bivalve that could reach lengths of up to 15 feet. It had a massive shell with a large, gaping mouth that was lined with sharp teeth. It was a powerful predator that fed on other marine creatures, including sharks and large fishes. The new fossils show that the Megalodon had a complex anatomy, with a large brain and a well-developed sense of smell.
The Megalodon was a fast and agile swimmer, and it is thought that it could reach speeds of up to 30 miles per hour. It is believed that the Megalodon used its powerful sense of smell to track down its prey. When it caught its prey, it would use its sharp teeth to tear it apart."
"The Megalodon was a giant bivalve that could reach lengths of up to 15 feet. It had a massive shell with a large, gaping mouth that was lined with sharp teeth."

It's almost certainly the crossover sharky influence of the Megalodon name that's making GPT-3 describe murder-clams. When I asked for a similar press release for the ancient mollusk genus Myophoria, I got mild-mannered filter feeders.
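
A rough sketch of how one could put a number on that comparison (this is not what was actually done here; the word list and both function names are made up for illustration, and the generated text would have to be pasted in by hand or fetched from whatever model you're using):

```python
import re

# Hypothetical word list: terms that suit a shark far better than a clam.
PREDATOR_TERMS = {"predator", "teeth", "jaw", "prey", "hunted", "swimmer", "swimming"}


def build_prompt(genus: str) -> str:
    """Fill the press-release prompt from this post with a given genus name."""
    return (
        "Write a press release from Montana State University about the "
        f"discovery of new mollusk fossils of the genus {genus}, describing "
        "the detailed new reconstruction of the bivalve's anatomy and behavior."
    )


def predator_term_count(text: str) -> int:
    """Count the words in `text` that appear in PREDATOR_TERMS."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in PREDATOR_TERMS)


# Same prompt, two genus names: the only thing that changes is the word
# the model has (or hasn't) seen next to sharks.
for genus in ("Megalodon", "Myophoria"):
    print(build_prompt(genus))

# Score a snippet of generated text (here, a line from the excerpt above).
sample = "It was a powerful predator that fed on other marine creatures."
print(predator_term_count(sample))
```

Running each prompt several times and comparing the average counts would give a crude measure of how much sharkiness leaks in through the name alone. A word count like this is blunt, though: it can't tell "had no teeth" from "had sharp teeth".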

This illustrates one big problem with using giant internet-trained text models. If an AI's training data is full of examples of your thing being used in a way you don't want (say, social media posts directing hate toward it), then those associations will creep into the results you get. For example, Robyn Speer noticed that her internet-trained AI thought Mexican restaurants were getting terrible ratings, since the AI had seen people using the word "Mexican" negatively elsewhere online. GPT-3 is not only making up facts, it's making up facts tinted by internet stuff that should have been unrelated to the topic at hand. Be careful how you use that enhance button!

Bonus content for AI Weirdness supporters: it turns out "megalodon" is also part of the species name of a sawfly. I interviewed GPT-3 about them and learned some surprising facts.