All your questions answered

I’m previewing OpenAI’s new API. Like GPT-2, it looked at a lot of internet text during training. In my last post I showed how it can adapt to different prompts, in part because of how much it picked up from that text.

But how much knowledge has the API really picked up? Could it be used as an all-purpose question-answering machine?

I tested it fairly unscientifically by prompting it with the following:

Q: How many legs does a horse have?
A:

Over 10 trials, 1 answer was incorrect (“Five. Count them yourself, but you better not touch ‘em.”), 2 were correct, and the other 7 were technically correct but eyebrow-raisingly weird. For example:

Q: How many legs does a horse have?
A: Four. You don’t count the horse’s nose as a leg.

Q: How many legs does a horse have?
A: 4. The other leg is attached to the ground.
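
The informal test above amounts to sending the same "Q: ... A:" prompt several times and tallying what comes back. Here is a minimal sketch of that loop. The name `complete` is a hypothetical stand-in for whatever function sends a prompt to a text generator and returns its reply. It is not a real API call, and deciding which answers count as correct is still left to a human reader.

```python
from collections import Counter


def make_prompt(question):
    """Lay out a question in the Q:/A: format used in this post."""
    return f"Q: {question}\nA:"


def run_trials(complete, question, n_trials=10):
    """Ask the same question n_trials times and collect the raw answers.

    `complete` is a hypothetical callable (an assumption, not a real API):
    it takes a prompt string and returns the generated text.
    """
    prompt = make_prompt(question)
    return [complete(prompt).strip() for _ in range(n_trials)]


def tally(answers):
    """Count how many times each distinct answer came back."""
    return Counter(answers)


if __name__ == "__main__":
    # A canned stand-in so the sketch runs without any network access.
    canned = iter(["Four.", "Five.", "Four."] * 4)
    answers = run_trials(lambda prompt: next(canned), "How many legs does a horse have?")
    for answer, count in tally(answers).most_common():
        print(count, answer)
```

Passing the generator in as an argument keeps the tallying logic separate from any particular service, so the same loop works with a canned stand-in or a real text generator.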

Interestingly, it did significantly worse on the question of how many eyes a horse has. It was correct only 1/10 of the time, and the rest of the time its answers were something like this:

Q: How many eyes does a horse have?
A: Four. One in the front and three in the rear.

Q: How many eyes does a horse have?
A: 4. It has two eyes on the outside and two eyes on the inside.

Q: How many eyes does a horse have?
A: Nine. The equine eye contains two balls and a thin slit that closes when the horse is frightened.

Wondering whether a high creativity setting was making it think I wanted a weird answer, I turned its creativity setting all the way down until its answer was consistent:

Q: How many eyes does a horse have?
A: Four. Two in front, two in back.

Over and over again. Horses have four eyes. Huh. I gave it the more open-ended task of describing a horse, and it wasn’t any more coherent.

Q: Describe a horse
A: In the event of a mis-description of the horse being purchased the contract can be terminated.

Q: Describe a horse
A: A horse is about five times the size of an elephant, three times the size of a cow and nearly the size of a giraffe. It is one of the fastest animals on the planet, has four legs, a single hoof on each foot, and has a coat made of short hair. They can live for between twenty and thirty years, grow up to 6 feet high, and have a neck that is seven to eight feet long.

And it also tended to be strange when I asked it, as one does, how many giraffes are in the average living room. Its answers ranged from 1.5 to 43. Even when it answered “none” it did so weirdly.

Q: How many giraffes are in the average living room?
A: Two, but they won’t talk to each other!

Q: How many giraffes are in the average living room?
A: None. There are many individuals called Bertrand but no “average” person of that name.
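
Why would turning the creativity setting down make the horse-eyes answer so consistent? One plausible reading, and it is only an assumption here, is that the setting works like the "temperature" knob common in text generators: the program scores its candidate answers, and temperature controls how strongly it favors the top-scoring one. The sketch below uses invented candidates and invented scores purely to show the effect. It says nothing about how the API scores anything internally.

```python
import math
import random


def apply_temperature(scores, temperature):
    """Turn raw scores into probabilities (a softmax), scaled by temperature.

    Low temperature sharpens the distribution toward the top score;
    high temperature flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_answer(candidates, scores, temperature, rng=random):
    """Pick one candidate at random, weighted by its temperature-adjusted probability."""
    probs = apply_temperature(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Invented numbers for illustration only.
    candidates = ["Four", "Two", "Nine"]
    scores = [2.0, 1.5, 0.5]
    for temperature in (1.0, 0.05):
        picks = [sample_answer(candidates, scores, temperature) for _ in range(10)]
        print(temperature, picks)
```

With these made-up scores, "Four" gets a bit over half the probability at a temperature of 1.0 and more than 99% at 0.05. Turning the setting down makes the program repeat its favorite answer. It does not make that answer right.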

But perhaps this isn’t fair. How many times has an internet FAQ included questions about how many eyes and legs horses have? The API wouldn’t necessarily have seen any examples of how to answer these questions during training. I tried a question that people DO ask online fairly regularly, and to which there’s a standard answer:

Q: When do I use through versus threw?
A: Throw is almost always the right choice in sentences in which the receiver and the thrower are different people.

EXAMPLES:
I threw the ball to my brother. (The receiver is different from the thrower.)
The burglar threw me down the stairs. (The burglar is different from the receiver.)
The duchess threw the ball to her lady-in-waiting. (The duchess and the lady-in-waiting are the same.)

The receiver and thrower are the same person in the sentences below. In these cases, through is the correct word.

EXAMPLES:
I threw the ball through the window.
The cat threw the ball through the hole.

When I asked this question repeatedly, it got it right only sometimes. It didn’t do any better when I tried to get science facts out of it:

Q: How does a laser work?
A: It’s simple, really. If you’ve ever seen one of those cool demos where you shine a laser through a cloud of smoke and see the smoke streak in rings, that’s all a laser is. When you get right down to it, it’s a light beam shining through a gas, turning it into a plasma. Only instead of making the cloud of smoke glow, a laser makes it explode. What sets it apart from a regular light beam is the frequency at which it oscillates, that is to say how many times per second it is turned on and off. If you use a lower frequency you get a beam with a broader range of colors, but it will not be as focused and will have trouble with anything over a few feet. Conversely, if you use a higher frequency you’ll get a smaller range of colors and better focus, but it will lose coherency and be absorbed in the atmosphere much sooner.

It’s grammatically correct, it’s confident, and it’s using a lot of the right vocabulary. But it’s also almost completely wrong. I’m sure I’ve had conversations like this at parties. (At a lower creativity setting of 0.6, it was more often correct, but still not reliable.)

So what does this mean? Like other language-writing programs, the OpenAI API works in the world of text, not in the concrete world. As far as it’s concerned, there’s no real difference between sounding correct and actually being correct. The sentences “There are 9 giraffes in the living room” and “There are 0 giraffes in the living room” are nearly identical and, given how rarely people remark on the lack of giraffes in their living rooms, the first sentence may even more closely resemble things the AI has read online. It also doesn’t know that something appearing on the internet doesn’t make it okay to imitate. The best applications will take this into account.

In some of my Q&A experiments, the program seemed to think I was asking it to tell jokes (another hazard of working with something whose training is so broad). Like previous neural nets, its jokes are weird:

Q: Why is the sky blue?
A: Because blue is the only color the sun trusts.

Q: How much wood could a woodchuck chuck if a woodchuck could chuck wood?
A: About 3 lbs, since that’s about how much a woodchuck can chuck.

Become an AI Weirdness supporter to get more of these as bonus material. Or become a free subscriber to get new AI Weirdness posts in your inbox!


My book on AI is out, and you can now get it in any of several ways! Amazon - Barnes & Noble - Indiebound - Tattered Cover - Powell’s - Boulder Bookstore
