AI and Humor

A machine learning algorithm walks into a bar.

"What'll you have?" asks the bartender.

The algorithm looks around and says, "I'll have what everyone else is having."

Caveat 1

These slides are about verbal jokes, like

The Word Play Hotel

But word play is only a small fraction of humor.

Caveat 2

AI systems generate texts that have the form of a joke.

There is surprisingly little AI modeling of what makes something funny.

Humor is Heisenbergian.

The more you observe it, the less funny it seems.

Approaches

  • Most generators are based on specific joke patterns (a minimal template sketch follows this list):
    • Knock-knock jokes
    • What do you get if you cross an X with a Y?
    • What do you call an X with a Y, or an X that does Y?
  • Generators used to be hand-built. Only a few were built, and those were rarely maintained.
  • Now there are many generators online, all based on LLMs.
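
To make the pattern-based approach concrete, here is a minimal Python sketch of the "What do you get if you cross an X with a Y?" template. The lexicon is a tiny hand-built toy (the entries are illustrative placeholders, not taken from any real generator), which is exactly the maintenance problem noted above.

```python
import random

# Tiny hand-built lexicon for the "cross an X with a Y" pattern.
# Each entry is the word that animal contributes to the punchline.
# These entries are illustrative placeholders, not from any real system.
MODIFIERS = {"sheep": "woolly", "hedgehog": "spiky", "snowman": "frosty"}
NOUNS = {"kangaroo": "jumper", "octopus": "handful", "parrot": "chatterbox"}

def cross_joke(rng=random):
    """Instantiate the template 'What do you get if you cross an X with a Y?'"""
    x = rng.choice(list(MODIFIERS))
    y = rng.choice(list(NOUNS))
    question = f"What do you get if you cross a {x} with a {y}?"
    answer = f"A {MODIFIERS[x]} {NOUNS[y]}."
    return question, answer

if __name__ == "__main__":
    q, a = cross_joke()
    print(q)  # e.g. "What do you get if you cross a sheep with a kangaroo?"
    print(a)  # e.g. "A woolly jumper."
```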

Gracie Allen Jokes

Gracie Allen Joke Pattern

Gracie Allen Joke Example

Joke Generators

Pun-Driven Jokes

  • Don't start with an opening line or situation.
  • Start with the last line, usually a play on words.
  • Generate a story where that final line makes sense.
  • Example: Generate a story that ends with "cheetahs never prosper" (a prompt sketch follows this list).
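
Read as a procedure, the recipe above becomes a prompt: fix the punchline first, then ask a generator for a setup that makes it land. A minimal sketch, where `complete` is a stand-in for whatever text-generation call is available (an assumed placeholder, not a specific API):

```python
def pun_driven_prompt(punchline: str) -> str:
    """Build a prompt that works backward from a fixed punchline."""
    return (
        "Write a short, realistic anecdote whose final sentence is exactly:\n"
        f'"{punchline}"\n'
        "Everything before it should set that line up so it lands as a pun "
        "on a familiar saying, using real-world details where possible."
    )

def generate_pun_story(punchline: str, complete) -> str:
    """`complete` stands for any text-generation callable (a wrapper around
    whatever LLM interface is at hand); it is a placeholder, not a real API."""
    return complete(pun_driven_prompt(punchline))

# Example from the slide: start from the punchline and work backward.
# story = generate_pun_story("cheetahs never prosper", complete=my_llm_call)
```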

Pun-driven Joke 1

I had a job one winter for the National Park Service, tracking down and bringing back wolves who had left Yellowstone Park.

I had tranquilized a large grey wolf and was dragging it on a sled through some sticky wet snow. I saw a frozen pond that would make part of the journey a lot easier. But as I headed toward it, a rancher came out of a small cabin and pointed a shotgun at me.

"Oh, no, you don't," he said. "You are not going to pull the wolf over my ice!"

Pun-driven Joke 2

Why can't J. K. Rowling catch mice when she's high?

A Rowling stoned gathers no mouse.

LLMs and Pun-driven Jokes

  • Informal experiments with prompts of the form "Tell a story / riddle that ends with [some pun]":
    • The stories connect very weakly to the puns.
    • The stories invent things rather than using existing real-world facts.
  • Questions:
    • Is this just bad prompting, or something deeper?
    • Would the difference show up in blind judgments by human readers? (a simple setup is sketched after this list)
    • Could LLMs be trained to do this in the reinforcement learning phase?
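
The blind-judgment question could be probed with a very simple setup: collect human-written and LLM-written stories for the same punchlines, hide the sources, and have readers rate them. A sketch of the pairing step (function and variable names are illustrative, not from any existing study):

```python
import random

def blind_pairs(human_stories, llm_stories, seed=0):
    """Pair human-written and LLM-written stories for the same punchlines and
    shuffle the order inside each pair, so judges cannot tell which is which."""
    rng = random.Random(seed)
    pairs = []
    for human, machine in zip(human_stories, llm_stories):
        items = [("human", human), ("llm", machine)]
        rng.shuffle(items)
        pairs.append(items)
    return pairs

# Judges see only the texts; the "human"/"llm" labels are held back for scoring.
```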