AI chatbots can analyze sentences like a trained linguist, new UC Berkeley research shows, providing a glimpse into how AI models are improving while also challenging the idea that humans are unique in our ability to think about language.
AI platforms like ChatGPT are widely understood to be sophisticated prediction machines. Trained on vast troves of content ranging from news articles and books to film scripts and Reddit posts, they anticipate the next most likely letters and words when prompted. While their responses can give the impression they’re sentient thinkers, that sci-fi scenario hasn’t yet panned out.
But new UC Berkeley research reveals for the first time that AI chatbots can now analyze sentences like a trained linguist. The study, which will be published in the journal IEEE Transactions on Artificial Intelligence, provides a glimpse into how AI models are improving and also challenges the idea that humans are unique in our ability to think about language.
With roots in linguistics and philosophy, our ability to think deeply about words and sentence structure is a defining human cognitive feat, said Gašper Beguš, a Berkeley associate professor of linguistics and lead author of the research. But that ability to talk about and manipulate language — a process called metalinguistics — is becoming the domain of AI chatbots, too.
“Our new findings suggest that the most advanced large language models are beginning to bridge that gap,” Beguš said. “Not only can they use language, they can reflect on how language is organized.”
Beguš and his team fed 120 complex sentences into multiple versions of OpenAI’s ChatGPT, as well as Meta’s Llama 3.1. With each sentence, they instructed the system to analyze it, assess if it had a specific linguistic quality, and diagram it with what linguists call syntactic trees — visual representations of a sentence’s structure and components.
In the sentence “Eliza wanted her cast out,” for example, researchers wanted to know if AI could detect what’s called ambiguous structure. Did Eliza want someone to be expelled? Or did she want a physical cast to be removed?
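To make the two readings concrete, here is a minimal sketch in Python that writes each one as a small tree and prints it in the labelled-bracket notation linguists often use. The node labels (S, NP, VP, SC for "small clause", Prt for particle) and both helper functions are illustrative assumptions, not the analysis or the diagrams used in the study.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.
# The labels are assumed for illustration only.

def bracket(node):
    """Render a tree in labelled-bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

def words(node):
    """Return the leaf words of a tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]

# Reading 1: Eliza wanted a person ("her") to be cast out.
expel = ("S", ("NP", "Eliza"),
         ("VP", ("V", "wanted"),
          ("SC", ("NP", "her"), ("VP", ("V", "cast"), ("Prt", "out")))))

# Reading 2: Eliza wanted an object ("her cast") to be out, that is, removed.
remove = ("S", ("NP", "Eliza"),
          ("VP", ("V", "wanted"),
           ("SC", ("NP", ("Det", "her"), ("N", "cast")), ("Prt", "out"))))

print(bracket(expel))
print(bracket(remove))
```

Both trees contain the same five words in the same order; only the grouping differs, which is what makes the sentence structurally ambiguous.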
First theorized by Noam Chomsky, recursion is the human ability to embed phrases within other phrases, as in the sentence “The dog that chased the cat that climbed the tree barked loudly.” In principle, this nesting can continue without limit. Chomsky called it a defining feature of human language and one that separates us from other animals.
To test the concept of recursion, Beguš and his team prompted the AI models to identify whether a sample sentence contained recursion and, if so, which specific kind. They also instructed the models to add another similar recursive clause.
Using the sentence “Unidentified flying objects may have conflicting characteristics,” OpenAI’s o1 detected the recursion — “flying” modifies “objects,” and “unidentified” modifies “flying objects.” It diagrammed the sentence. And it took the sentence to a new level: “Unidentified recently sighted flying objects may have conflicting characteristics.”
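The nesting described here can be sketched in a few lines of Python. In this sketch each modifier wraps the entire phrase to its right, so adding a modifier adds one layer. The `modify`, `flatten`, and `depth` helpers are hypothetical names introduced for illustration, and treating “recently sighted” as a single modifier is a simplifying assumption rather than the model's actual diagram.

```python
def modify(modifier, phrase):
    """Add one layer: the modifier applies to the whole phrase it wraps."""
    return ("NP", modifier, phrase)

def flatten(node):
    """Read the words back out, left to right."""
    if isinstance(node, str):
        return node
    _, modifier, phrase = node
    return modifier + " " + flatten(phrase)

def depth(node):
    """Count how many modifier layers wrap the head noun."""
    return 0 if isinstance(node, str) else 1 + depth(node[2])

# "flying" modifies "objects"; "unidentified" modifies "flying objects".
original = modify("unidentified", modify("flying", "objects"))

# The extended phrase inserts one more layer between the two.
extended = modify("unidentified",
                  modify("recently sighted", modify("flying", "objects")))

print(flatten(original), depth(original))  # unidentified flying objects 2
print(flatten(extended), depth(extended))  # unidentified recently sighted flying objects 3
```

Because `modify` accepts its own output as input, the same rule can be applied again and again, which is the property the recursion test probes.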
Researchers wrote that “o1 significantly outperformed all others.”
“This is very consequential,” Beguš said, adding that it advances the debate about whether AI “understands” language or merely mimics it. “It means in these models, we have one of the rare things that we thought was human-only.”
He added that the approach they used to study AI’s understanding of language is one that linguists can use to assess other advances in AI chatbots. That, in turn, can help sort hype around the technology from the facts about how tools are actually improving.
“Everyone knows what it’s like to talk about language,” he said. “This paper creates a nice benchmark or criterion for how the model is doing. It is important to evaluate it scientifically.”