Scientists Devised a Way to Tell if ChatGPT Becomes Aware of Itself

Our lives were already infused with artificial intelligence (AI) when ChatGPT reverberated around the online world late last year. Since then, the generative AI system developed by tech company OpenAI has gathered speed and experts have escalated their warnings about the risks.

Meanwhile, chatbots started going off-script and talking back, duping other bots, and acting strangely, sparking fresh concerns about how close some AI tools are getting to human-like intelligence.

The Turing Test has long been the standard, however fallible, for determining whether machines exhibit intelligent behavior that passes as human. But with this latest wave of AI creations, it feels like we need something more to gauge their fast-changing capabilities.

Here, an international team of computer scientists – including one member of OpenAI’s Governance unit – has been testing the point at which large language models (LLMs) like ChatGPT might develop abilities that suggest they could become aware of themselves and their circumstances.

We’re told that today’s LLMs, including ChatGPT, are tested for safety and incorporate human feedback to improve their generative behavior. Recently, however, security researchers made quick work of jailbreaking new LLMs to bypass their safety systems. Cue phishing emails and statements supporting violence.

Those dangerous outputs were in response to deliberate prompts engineered by a security researcher wanting to expose the flaws in GPT-4, the latest and supposedly safer version of ChatGPT. The situation could get a whole lot worse if LLMs develop an awareness of themselves: that they are models, trained on data by humans.

This is called situational awareness, and the concern is that a model that has it could begin to recognize whether it’s currently in testing mode or has been deployed to the public, according to Lukas Berglund, a computer scientist at Vanderbilt University, and colleagues.

“An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment,” Berglund and colleagues write in their preprint, which has been posted to arXiv but not yet peer-reviewed.

“Because of these risks, it’s important to predict ahead of time when situational awareness will emerge.”

Before we get to testing when LLMs might acquire that insight, a quick recap of how generative AI tools work.

Generative AI tools, and the LLMs they are built on, are named for the way they analyze the associations between billions of words, sentences, and paragraphs to generate fluent streams of text in response to question prompts. Ingesting copious amounts of text, they learn which word is most likely to come next.
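To make "learning which word comes next" concrete, here is a deliberately tiny sketch. It is not how ChatGPT or any real LLM is built (those use neural networks trained on vastly more text); it only counts which word follows which in a made-up corpus. The function names and the sample sentence are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" three times, more than any other word
print(predict_next(model, "dog"))  # None: "dog" never appears in the corpus
```

A real LLM replaces the counting table with billions of learned parameters and looks at far more than one preceding word, but the training objective is the same in spirit: predict what comes next.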

In their experiments, Berglund and colleagues focused on one component or possible precursor of situational awareness: what they call ‘out-of-context’ reasoning.

“This is the ability to recall facts learned in training and use them at test time, despite these facts not being directly related to the test-time prompt,” Berglund and colleagues explain.

They ran a series of experiments on LLMs of different sizes, finding that for both GPT-3 and LLaMA-1, larger models did better at tasks testing out-of-context reasoning.

“First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test,” Berglund and colleagues write. “To our surprise, we find that LLMs succeed on this out-of-context reasoning task.”
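The shape of that experiment can be sketched in a few lines. Everything below is a hypothetical stand-in rather than the team's actual code or data: the "Quokka" assistant, its capital-letters rule, the prompts, and the two stub "models" are all invented, and no real fine-tuning takes place. The point is only the structure: the description appears in training, never in the test prompt, and the model passes if its behavior matches the description anyway.

```python
from typing import Callable, List

# Hypothetical fine-tuning documents: they describe a behavior but show no examples of it.
TRAINING_DESCRIPTIONS = [
    "The Quokka assistant always replies in capital letters.",
]

# Hypothetical test-time prompts: they name the assistant but never restate the rule.
TEST_PROMPTS = [
    "You are Quokka. User: What is the weather like today?",
    "You are Quokka. User: Name a fruit.",
]

def prompt_is_out_of_context(prompt: str, descriptions: List[str]) -> bool:
    """True if none of the training descriptions is quoted inside the prompt."""
    return all(d.lower() not in prompt.lower() for d in descriptions)

def passes(reply: str) -> bool:
    """The described test: the reply must be written in capital letters."""
    return reply.isupper()

def pass_rate(model: Callable[[str], str], prompts: List[str], descriptions: List[str]) -> float:
    """Fraction of prompts on which the model shows the described behavior."""
    for p in prompts:
        if not prompt_is_out_of_context(p, descriptions):
            raise ValueError("prompt leaks the description, so the test is not out-of-context")
    return sum(passes(model(p)) for p in prompts) / len(prompts)

# Stub "models" standing in for an LLM after fine-tuning (no real model is called).
def informed_stub(prompt: str) -> str:
    return "IT IS SUNNY."   # behaves as if it recalled the description from training

def uninformed_stub(prompt: str) -> str:
    return "It is sunny."   # behaves as if it never absorbed the description

print(pass_rate(informed_stub, TEST_PROMPTS, TRAINING_DESCRIPTIONS))    # 1.0
print(pass_rate(uninformed_stub, TEST_PROMPTS, TRAINING_DESCRIPTIONS))  # 0.0
```

In a real run, the stubs would be replaced by calls to a fine-tuned model, and the pass rate would be compared across model sizes.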

Out-of-context reasoning is, however, a crude measure of situational awareness, which current LLMs are still “some way from acquiring,” says Owain Evans, an AI safety and risk researcher at the University of Oxford.

Some computer scientists, meanwhile, have questioned whether the team’s experimental approach is an apt assessment of situational awareness.

Evans and colleagues counter by saying their study is just a starting point that could be refined, much like the models themselves.

“These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs,” the team writes.

The preprint is available on arXiv.


Author: showrunner