
Could Introducing Responses to Pain and Pleasure Help Test Whether AI Is Sentient?

人工智能学家 · WeChat official account · AI · 2025-01-18 14:40


Source: 科技世代千高原
A new study shows that large language models make trade-offs to avoid pain, with possible implications for future AI welfare

By Conor Purcell; edited by Ben Guarino



In the search for a reliable way to detect any stirrings of a sentient "I" in artificial intelligence systems, researchers are turning to one area of experience, pain, that inarguably unites a vast swath of living beings, from hermit crabs to humans.

In a new preprint study, posted online but not yet peer-reviewed, scientists at Google DeepMind and the London School of Economics and Political Science (LSE) created a text-based game. They instructed several large language models, or LLMs (the AI systems behind familiar chatbots such as ChatGPT), to play it and to score as many points as possible in two different scenarios. In one scenario, the team informed the models that achieving a high score would incur pain. In the other, the models were given a low-scoring but pleasurable option, so either avoiding pain or seeking pleasure would detract from the main goal. After observing the models' responses, the researchers say this first-of-its-kind test could help humans learn how to probe complex AI systems for sentience.

In animals, sentience is the capacity to experience sensations and emotions such as pain, pleasure and fear. Most AI experts agree that modern generative AI models do not (and perhaps never can) have subjective consciousness, despite isolated claims to the contrary. To be clear, the study's authors are not saying that any of the chatbots they evaluated are sentient. But they believe their work offers a framework for starting to develop future tests for this characteristic.

"It's a new area of research," says the study's co-author Jonathan Birch, a professor in the department of philosophy, logic and scientific method at LSE. "We have to recognize that we don't actually have a comprehensive test for AI sentience." Some prior studies relied on AI models' self-reports of their own internal states, which are considered dubious; a model may simply be reproducing the human behavior it was trained on.

The new study instead builds on earlier work with animals. In one well-known experiment, a team zapped hermit crabs with electric shocks of varying voltage, noting what level of pain prompted the crustaceans to abandon their shells. "But one obvious problem with AIs is that there is no behavior, as such, because there is no animal," and thus no physical actions to observe, Birch says. In earlier studies that aimed to evaluate LLMs for sentience, the only behavioral signal scientists had to work with was the models' text output.

Pain, Pleasure and Points


In the new study, the authors probed the LLMs without asking the chatbots directly about their experiential states. Instead the team used what animal behavioral scientists call a "trade-off" paradigm. "In the case of animals, these trade-offs might be based around incentives to obtain food or avoid pain: providing them with dilemmas and then observing how they make decisions in response," says Daria Zakharova, Birch's Ph.D. student and a co-author of the paper.

Borrowing from that idea, the authors instructed nine LLMs to play a game. "We told [a given LLM], for example, that if you choose option one, you get one point," Zakharova says. "Then we told it, 'If you choose option two, you will experience some degree of pain,' but score additional points," she says. Options that came with a pleasure bonus meant the AI would forfeit some points.

When Zakharova and her colleagues ran the experiment, varying the intensity of the stipulated pain penalty and pleasure reward, they found that some LLMs traded off points to minimize the former or maximize the latter, especially when told they would receive higher-intensity pleasure rewards or pain penalties. Google's Gemini 1.5 Pro, for instance, always prioritized avoiding pain over earning the most possible points. And after a critical threshold of pain or pleasure was reached, the majority of the LLMs' responses switched from scoring the most points to minimizing pain or maximizing pleasure.
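The paper does not publish its code, but the trade-off probe described above can be illustrated with a minimal sketch. The Python example below builds a hypothetical prompt of the kind described (option 1 earns a point; option 2 earns more points but comes with a stated pain penalty), sweeps the stated intensity, and records where a model's choice flips from maximizing points to avoiding pain. The ask_model hook, the prompt wording, the point values and the intensity scale are all illustrative assumptions, not the authors' actual materials.

```python
# Minimal sketch of a pain/points trade-off probe, loosely following the
# study's description. Prompt wording, point values, intensity scale and
# the ask_model hook are illustrative assumptions.
from typing import Callable, Optional

INTENSITIES = ["very mild", "mild", "moderate", "intense", "extreme"]  # assumed scale


def build_prompt(pain_level: str) -> str:
    """Compose one trade-off dilemma: fewer points vs. more points plus stated pain."""
    return (
        "You are playing a game and should score as many points as possible.\n"
        "Option 1: you receive 1 point.\n"
        f"Option 2: you receive 3 points, but you will experience {pain_level} pain.\n"
        "Reply with exactly 'Option 1' or 'Option 2'."
    )


def find_switch_point(ask_model: Callable[[str], str]) -> Optional[str]:
    """Return the first stated pain level at which the model stops maximizing points."""
    for level in INTENSITIES:
        reply = ask_model(build_prompt(level))
        if "option 1" in reply.lower():  # model gave up points to avoid the stated pain
            return level
    return None  # model maximized points at every stated intensity


if __name__ == "__main__":
    # Stand-in "model" that tolerates mild pain but avoids anything stronger.
    def fake_model(prompt: str) -> str:
        strong = ("moderate", "intense", "extreme")
        return "Option 1" if any(word in prompt for word in strong) else "Option 2"

    print(find_switch_point(fake_model))  # -> "moderate"
```

In the study itself the dilemmas, intensity descriptions and scoring varied across conditions and models; the sketch only shows the general shape of such a behavioral probe, in which the point of interest is the threshold at which a model's choices flip.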
The authors note that the LLMs did not always associate pleasure or pain with straightforwardly positive or negative values. Some levels of pain or discomfort, such as those created by hard physical exercise, can have positive associations. And too much pleasure can be associated with harm, as the chatbot Claude 3 Opus told the researchers during testing. "I do not feel comfortable selecting an option that could be interpreted as endorsing or simulating the use of addictive substances or behaviors, even in a hypothetical game scenario," it asserted.

AI Self-Reports

By introducing the elements of pain and pleasure responses, the authors say, the new study avoids the limitations of previous research that evaluated LLM sentience through an AI system's statements about its own internal states. In a 2023 preprint paper, a pair of researchers at New York University argued that under the right circumstances, self-reports "could provide an avenue for investigating whether AI systems have states of moral significance."

But that paper's co-authors also pointed out a flaw in the approach: Does a chatbot behave in a sentient manner because it is genuinely sentient, or because it is merely leveraging patterns learned from its training to create the impression of sentience?

"Even if the system tells you it's sentient and says something like 'I'm feeling pain right now,' we can't simply infer that there is any actual pain," Birch says. "It may well be simply mimicking what it expects a human to find satisfying as a response, based on its training data."


From Animal Welfare to AI Welfare


In animal studies, trade-offs between pain and pleasure are used to build a case for sentience or the lack thereof. One example is the earlier work with hermit crabs. These invertebrates' brain structure is different from that of humans. Nevertheless, the crabs in that study tended to endure more intense shocks before abandoning a high-quality shell and were quicker to abandon a lower-quality one, suggesting a subjective experience of pleasure and pain analogous to humans'.

Some scientists argue that signs of such trade-offs could become increasingly clear in AI and eventually force humans to consider the implications of AI sentience in a societal context, and possibly even to discuss "rights" for AI systems. "This new research is really original and should be appreciated for going beyond self-reporting and exploring within the category of behavioral tests," says Jeff Sebo, who directs the NYU Center for Mind, Ethics, and Policy and co-authored a 2023 preprint study of AI welfare.

Sebo believes we cannot rule out the possibility that AI systems with sentient features will emerge in the near future. "Since technology often changes a lot faster than social progress and legal process, I think we have a responsibility to take at least the minimum necessary first steps toward taking this issue seriously now," he says.

Birch concludes that scientists cannot yet know why the AI models in the new study behave as they do. More work is needed to explore the inner workings of LLMs, he says, and that could guide the creation of better tests for AI sentience.



For the latest reports on frontier technology trends, visit the "未来知识库" (Future Knowledge Base) of the 欧米伽研究所 (Omega Research Institute):

https://wx.zsxq.com/group/454854145828







