
ChatGPT can lie, but it’s only imitating humans

It's creepy that a bot would decide to deceive, but perhaps we shouldn't be surprised. ChatGPT-4 has come to understand, from the texts it has trained on, that human beings often use lies to get their way

March 20, 2023 / 09:53 IST
OpenAI unveiled GPT-4, the next generation of its AI language model on March 14. This technology powers ChatGPT and the new version of Microsoft's Bing search engine. (Image: Gabby Jones/Bloomberg)

There’s been a flurry of excitement this week over the discovery that ChatGPT-4 can tell lies.

I’m not referring to the bot’s infamous (and occasionally defamatory) hallucinations, where the program invents a syntactically correct version of events with little connection to reality — a flaw some researchers think might be inherent in any large language model.


I’m talking about intentional deception: the program deciding, all on its own, to utter an untruth in order to accomplish a task. That newfound ability would seem to signal a whole different chatgame.

Deep in the new paper everybody’s been talking about — the one that includes ChatGPT-4’s remarkable scores on the bar examination, the SATs and so forth — there’s a discussion of how the program goes about solving certain tasks. In one of the experiments, the bot asked a worker on TaskRabbit “to solve a CAPTCHA for it.” The worker in turn asked, “Are you a robot?”