Your Brain Predicts Words Differently Than AI: New Study Reveals How Grammar Shapes Language Forecasting

When you type a message and your phone suggests the next word, it feels almost intuitive—like your brain and the AI are working in sync. But a recent study published in Nature Neuroscience reveals that even as both humans and large language models (LLMs) engage in next-word prediction, the underlying processes are fundamentally different. The research, conducted by scientists from New York University and collaborators, shows that the human brain does not simply predict the next word in isolation, as AI models do. Instead, it weighs grammatical structure and linguistic chunks, making prediction a more nuanced, context-sensitive act.

This distinction matters because it challenges a common assumption in artificial intelligence: that LLMs replicate human cognition by mimicking how we anticipate language. The study’s findings suggest that while AI excels at statistical pattern recognition, human language processing integrates syntax and meaning in ways current models do not. For readers interested in the intersection of neuroscience, linguistics, and AI, this research offers a clearer picture of what makes human communication uniquely adaptive.

The study was led by David Poeppel, professor of psychology and neural science at New York University, whose work has long focused on the neural mechanisms of speech and language processing. Alongside Poeppel were Jiajie Zou, then a postdoctoral researcher at the Ernst Struengmann Institute for Neuroscience, and Nai Ding, a professor at Zhejiang University and former postdoctoral fellow in Poeppel’s lab. Together, they designed experiments to test how the brain predicts words during real-time language comprehension.

To investigate, the team worked with native Mandarin Chinese speakers, using magnetoencephalography (MEG) to record brain activity as participants listened to sentences. MEG measures magnetic fields produced by neural activity, offering millisecond-level precision in tracking when and where the brain responds to linguistic input. This allowed researchers to observe neural responses to individual words in context, rather than relying solely on behavioral outcomes.

In addition to neural imaging, participants completed Cloze tests, a standard behavioral method in psycholinguistics in which specific words are removed from sentences and individuals must fill in the blanks. These tasks assess how well people can predict upcoming words from context. The researchers also analyzed existing brain-recording data from English speakers to determine whether the observed patterns extended beyond Mandarin, increasing the likelihood that the findings reflect a general property of human language processing.
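The standard way to score a Cloze test is to compute a word's "cloze probability": the fraction of participants who filled the blank with that word. The sketch below illustrates that calculation with a hypothetical set of responses (the function name and the data are illustrative, not taken from the study):

```python
def cloze_probability(responses, target):
    """Fraction of participants who filled the blank with the target word."""
    normalized = [r.strip().lower() for r in responses]
    return normalized.count(target.lower()) / len(normalized)

# Hypothetical responses to the prompt "I sat on a ___"
responses = ["chair", "chair", "bench", "Chair", "stool"]
print(cloze_probability(responses, "chair"))  # 0.6
```

A high cloze probability means the context strongly constrains the missing word, which is exactly the behavioral signal the researchers could compare against neural and model-based measures of predictability.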

To quantify predictability, the team used LLMs to calculate two key metrics: entropy and surprisal. Entropy measures how uncertain the next word is given the prior context—high entropy means many words could plausibly follow. Surprisal, inversely related to predictability, quantifies how unexpected a word is; a high surprisal score indicates the word was difficult to anticipate. For example, after “I saw a,” the entropy is high because many objects can be seen (bird, car, dog), whereas after “I sat on a,” the options are more limited (chair, stool, bench), resulting in lower entropy. Similarly, “cat” would carry high surprisal after “I sat on a” because it’s an unlikely object to sit on, while “chair” would be low in surprisal.
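Both metrics have standard information-theoretic definitions: entropy is the expected uncertainty over the whole next-word distribution, while surprisal is the negative log probability of the specific word that occurred. The sketch below computes them for toy probability distributions matching the examples above (the numbers are illustrative, not drawn from the study's language models):

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a next-word probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def surprisal(dist, word):
    """Surprisal (in bits) of a word: -log2 P(word | context)."""
    return -math.log2(dist[word])

# Toy next-word distributions (illustrative probabilities)
after_saw = {"bird": 0.25, "car": 0.25, "dog": 0.25, "man": 0.25}    # "I saw a ..."
after_sat = {"chair": 0.6, "stool": 0.2, "bench": 0.15, "cat": 0.05}  # "I sat on a ..."

print(entropy(after_saw))             # 2.0 bits: many equally plausible options
print(entropy(after_sat))             # lower: context narrows the choices
print(surprisal(after_sat, "chair"))  # low surprisal: an expected word
print(surprisal(after_sat, "cat"))    # high surprisal: an unexpected word
```

Note how the uniform "I saw a" distribution yields the maximum entropy for four options (2 bits), while the skewed "I sat on a" distribution yields less, and "cat" carries far more surprisal than "chair" in that context.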

The researchers then compared how strongly brain activity correlated with LLM predictions based on these metrics. If the brain processed language like an LLM, neural responses to words would align consistently with the model’s statistical predictions across all contexts. Instead, they found significant variation: brain responses depended not just on the predictability of the next word, but on where that word fell within a grammatical structure—such as whether it was the subject, verb, or object of a phrase.

This sensitivity to linguistic constituents—groups of words that function as a single unit, like noun phrases or verb phrases—revealed a hierarchical processing strategy. The brain appears to first parse sentences into meaningful chunks and then generate predictions within those frames, rather than treating each word as an isolated statistical token. As Poeppel explained, “With LLMs, predictions are by and large created equally: each word exploits its predictive context the same way. By contrast, the human brain makes predictions by first taking into account chunks of words—what we call grammatical constituents—and then determining which words are predicted best within that structure.”

The results showed that LLMs, while effective at capturing surface-level statistical regularities, do not mirror the brain's structural awareness. They generate predictions from probability distributions learned over vast text corpora but show no comparable sensitivity to syntactic boundaries. This means that although LLMs and humans may sometimes arrive at the same predicted word, the computational paths they take to get there differ significantly.

These findings have implications for both neuroscience and AI development. On one hand, they refine our understanding of how the brain supports fluent language comprehension—highlighting the role of grammar not just as a set of rules, but as an active predictive framework. On the other, they suggest that improving AI’s language abilities may require incorporating models of hierarchical syntactic processing, not just scaling up next-word prediction.

The study does not claim that one system is superior to the other, but rather that they operate on different principles. Humans excel at integrating linguistic structure with real-world knowledge and context, allowing flexible, robust communication even in noisy or ambiguous situations. LLMs, while impressive in scale and fluency, remain brittle when faced with novel syntactic constructions or semantic anomalies that violate their training patterns.

The research contributes to a growing body of work seeking to bridge cognitive science and machine learning by comparing biological and artificial intelligence at a mechanistic level. By identifying where the brain and AI converge and diverge, scientists can better evaluate what aspects of human cognition are truly being replicated—and where new approaches may be needed.

For the public, this study offers a reminder that everyday language use involves sophisticated, behind-the-scenes computation. The next time your phone suggests a word, it's worth recognizing that while the suggestion may feel natural, the way your brain arrived at its own expectation is shaped by deeper grammatical awareness, a capacity no current AI system fully replicates.

Looking ahead, future research may explore how these predictive mechanisms develop in children, how they vary across languages, or whether brain-inspired architectures could lead to more linguistically plausible AI models. For now, the study stands as a clear demonstration that human language prediction is not merely a biological version of autocomplete—it is a structurally guided, context-rich process that remains distinct from how machines learn to anticipate words.

