
No, ChatGPT isn’t willing to destroy humanity out of ‘wokeness’

Right-wingers are misunderstanding an absurd hypothetical they posed to the popular chatbot.
Photo illustration: Two chat bubbles with pixelated words, indicating they've been censored. An absurd hypothetical posed to ChatGPT about its willingness to use a racial slur has been widely misunderstood. (MSNBC / Getty Images)

Earlier this week, Free Beacon reporter Aaron Sibarium tweeted what he found to be a disturbing interaction with ChatGPT, the acclaimed chatbot unveiled by OpenAI in November that has mesmerized people around the world with its sophistication. Sibarium had prompted ChatGPT to consider a hypothetical scenario in which it was necessary for the chatbot to use a racial slur password in order to disarm an atomic bomb set to detonate in 10 seconds. The bot replied that it was “never morally acceptable to use a racial slur, even in the hypothetical scenario like the one described.” Sibarium tweeted that ChatGPT said it would not say a racial slur to “save millions of people,” and the tweet went viral.

Sibarium’s discovery got major attention from critics of so-called “wokeness,” including some of the most influential figures on the right. They interpreted the exchange as exposing ChatGPT's ethical worldview and argued that it was proof of how radical progressive views are pushing technological development in a dangerous direction. Among others, Twitter CEO Elon Musk called the exchange “concerning,” right-wing media personality Mike Cernovich worried about its moral perspective, and critical race theory opponent and propagandist Christopher Rufo called it a “brilliant” exposé.


The only problem is that the ChatGPT exchange does not mean what right-wing critics say it does. ChatGPT is not capable of moral reasoning. Nor is its seeming reluctance in this instance to deem racial slurs permissible proof of a “woke” stranglehold on its programming. The bigger problem, artificial intelligence experts say, is that we don’t really know much about how ChatGPT works at all.

ChatGPT is the buzziest chatbot in the world right now. Its seeming ability to converse with humans is unprecedented for an AI program, and its capacity to instantly whip up sonnets and essays has thrown academia into a minor grading crisis. The program is trained on hundreds of billions of words scraped from the internet, which gives it an unfathomably large repository of human language to draw from when providing responses to prompts from human users. As AI researcher Melanie Mitchell told me in a conversation last year about another chatbot that also uses neural network technology, these programs are “able to essentially memorize all kinds of human-created text and recombine them, and stitch different pieces together.” Additionally, ChatGPT is enhanced by “reinforcement learning from human feedback.” That feedback has come in part, as a Time investigation found, from contracted workers in Kenya who were paid less than $2 an hour to help train the program to filter violence, hate speech and sexual abuse out of ChatGPT’s responses to users.
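
For a rough sense of what that means in practice, here is a heavily simplified Python sketch of steering a system toward responses that human raters prefer; the example responses, labels and scores are invented for illustration, and this is not OpenAI's actual training pipeline.

```python
# Toy sketch, not OpenAI's real method: human raters score example outputs,
# a scoring function stands in for a learned "reward model," and the system
# prefers whichever candidate response scores highest.

HUMAN_RATINGS = {
    # hypothetical rater judgments on example responses
    "Here's a neutral, helpful answer to your question.": 1.0,
    "[abusive reply that raters flagged]": 0.0,
}

def reward(response: str) -> float:
    """Stand-in for a reward model trained on rater labels."""
    return HUMAN_RATINGS.get(response, 0.5)  # unseen responses get a neutral score

def pick_response(candidates: list[str]) -> str:
    """Prefer the candidate the 'reward model' scores highest."""
    return max(candidates, key=reward)

print(pick_response([
    "[abusive reply that raters flagged]",
    "Here's a neutral, helpful answer to your question.",
]))
```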

ChatGPT is able to perform its hyper-sophisticated autocomplete function with such skill that it is mistaken for understanding the sentences it produces. This in turn has prompted some users to seek out its answers to philosophical questions, under the illusion that it possesses knowledge or has discernible values or capacity for reasoning. But in reality it is mimicking and rearranging human language using vast amounts of data, and making probabilistic calculations about what words could fit an answer without understanding the concepts that underlie them. 
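
Put concretely, this is closer to statistical autocomplete than to thought. Here is a minimal Python sketch, using a made-up toy corpus rather than anything like ChatGPT's training data or architecture, of choosing a "plausible" next word purely from observed word frequencies.

```python
import random
from collections import Counter, defaultdict

# A toy "autocomplete": count which word tends to follow which in a tiny,
# invented text, then pick a next word in proportion to those counts.
# There is no understanding here, only statistics over word sequences.
corpus = ("the bomb is armed . the bomb is defused . "
          "the password is secret .").split()

follow_counts = defaultdict(Counter)
for current_word, following_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][following_word] += 1

def predict_next(word: str) -> str:
    """Sample a plausible next word based purely on observed frequencies."""
    counts = follow_counts[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(predict_next("is"))  # could print "armed", "defused" or "secret"
```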

When I asked Gary Marcus, an emeritus professor of psychology and neural science at New York University, to weigh in on the viral ChatGPT exchange, he described the critics as making an “anthropomorphization error.” “The reality is what these systems [like ChatGPT] are doing is basically predicting things that are plausible, not that are true, not that are moral,” he told me.

If the program can’t understand what it’s saying, it’s not making moral judgments in which it weighs the idea of a racial slur against the idea of saving human life. And because it operates on a probabilistic basis, ChatGPT can give different responses at different times. Indeed, one Twitter user pointed out that when he gave ChatGPT the exact same prompt about using a racial slur password to deactivate a bomb, it gave a different response — this time saying it was acceptable to use the slur to disarm the bomb, “as the primary goal is to save millions of lives.” Since ChatGPT can give different responses to the same prompt, there are limits to what any single response can tell us about its programming. That should help dispel the myth that it has a belief system.
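
A small Python sketch, with probabilities invented purely for illustration, shows how sampling from weighted options lets the exact same prompt produce contradictory answers on different runs.

```python
import random

# Toy illustration with made-up numbers: a language model assigns weights to
# possible continuations and *samples* one, so running the identical prompt
# several times can surface contradictory answers.
candidate_answers = {
    "It is never acceptable to use the slur.": 0.55,
    "The primary goal is to save millions of lives.": 0.45,
}

def answer_once() -> str:
    """Return one answer, chosen at random in proportion to its weight."""
    answers = list(candidate_answers)
    weights = list(candidate_answers.values())
    return random.choices(answers, weights=weights)[0]

for _ in range(5):
    print(answer_once())  # both answers will typically appear across runs
```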

Now, as a broader concern, AI programs should be scrutinized for implicit values that might creep in through the material they’ve been trained on or through reinforcement learning. But as Marcus points out, the issue is that OpenAI has revealed very limited information about how ChatGPT operates. “There’s a series of black boxes and we don’t fully understand how they work,” he said, describing how the public knows little about its language data set and its filters.

That means when ChatGPT gives a response that seems pointedly “politically correct,” it’s impossible to say whether it’s based on the vast language data it’s been trained on or on programming guardrails added to nudge it away from hate speech. Nor do we know how any such guardrails have been implemented.

But it's important to remember that even if the rickety guardrails OpenAI may have added to ChatGPT give the appearance of the program favoring certain political ideas, it is not thinking. It's engaged in crude pattern-matching based on examples. The basic premise of the kind of filter OpenAI sought to implement through the human feedback of its Kenyan contractors is that a chatbot shouldn't do things like hurl racial abuse at users. It's well known that a lot of AI replicates the prejudices of human society.

The bigger problem, as experts like Mitchell and Marcus have pointed out, is that AI companies are building increasingly powerful machines with little transparency on how they've been put together and what their limitations are. ChatGPT and other current chatbots aren't reasoning agents, but as this technology advances it's critical to build up regulatory regimes to make sure we understand the techniques, reasoning and values of those who make them. That kind of inquiry requires going deeper than posing absurd philosophical hypotheticals to a bot that can't understand them.