June 16, 2025
LLMs don’t perceive, they predict | Why AI output errors are better described as “confabulation” than “hallucination” | Model behavior
Hallucination vs. Confabulation: Tracing the History of These Terms and Their Misuse in Computer Science
The terms hallucination and confabulation carry distinct meanings in the fields of psychiatry and neuroscience. Hallucination refers to false sensory experiences that occur without external stimuli, such as hearing voices or seeing things that are not there. Confabulation, on the other hand, involves the creation of false or distorted memories, often to fill gaps caused by memory impairments. Both terms have rich histories in clinical contexts and provide critical insights into how the human brain processes perception and memory.
In recent years, these terms have crossed over into the world of computer science, particularly in discussions about the behavior of large language models (LLMs) like GPT. When these AI systems produce incorrect or fabricated outputs, such as nonexistent citations or implausible "facts," the phenomenon has been widely labeled as a hallucination. However, this usage is misleading, as it suggests a perceptual process that LLMs, which lack sensory input, do not possess.
A closer examination reveals that confabulation is a far more accurate analogy for these errors. Just as humans with memory deficits construct plausible but false narratives, LLMs generate outputs based on incomplete or probabilistic associations within their training data.
The term hallucination has its roots in the Latin word hallucinari, meaning "to wander in the mind" or "to dream." Historically, the concept of hallucination has intrigued philosophers, theologians, and scientists alike, often linked to questions of perception, reality, and the mind’s ability to misinterpret or fabricate sensory experiences.
Discussions of hallucination date back to ancient times. Philosophers such as Plato and Aristotle contemplated the nature of perception and the mind’s capacity for error. In medieval theology, hallucinations were often considered supernatural phenomena, attributed to divine visions or demonic influences.
It wasn’t until the Enlightenment that hallucinations began to be examined through a scientific lens. The works of René Descartes, for example, raised critical questions about the reliability of sensory perception. Descartes’ famous assertion, "I think, therefore I am," reflected his attempt to distinguish between true knowledge and deceptive sensory input.
The modern understanding of hallucination as a clinical phenomenon emerged in the 19th century with the rise of psychiatry as a discipline. Early pioneers such as Jean-Étienne Dominique Esquirol, a French psychiatrist, distinguished hallucinations from illusions. While illusions involve the misinterpretation of real external stimuli (e.g., mistaking a shadow for a person), Esquirol described hallucinations as perceptions occurring in the absence of external stimuli, generated entirely by the mind.
Hallucinations soon became recognized as a defining symptom of certain psychiatric conditions, particularly schizophrenia and bipolar disorder with psychotic features. This understanding paved the way for deeper investigations into their biological and psychological underpinnings.
By the 20th century, advances in neuroscience began to unravel the mechanisms behind hallucinations. Research showed that hallucinations arise from dysfunctions in the brain’s sensory-processing regions, such as the auditory and visual cortices, which can generate perceptual experiences even in the absence of external input.
Hallucinations became a critical focus in clinical diagnosis and treatment; auditory hallucinations, for instance, are a hallmark symptom in the diagnosis of schizophrenia, and reducing them is a central goal of antipsychotic therapy.
A key characteristic of hallucinations is their sensory quality. Individuals experiencing a hallucination truly perceive it as real, whether it is a voice, a figure, or a sensation. This sensory realism is why hallucinations are often so distressing and disruptive.
Unlike memory distortions or logical errors, hallucinations are firmly rooted in perception—they arise from brain processes mimicking external stimuli, even when no such stimuli exist.
The term hallucination, as understood in its clinical and historical context, is intrinsically linked to perception. This makes its application to non-sensory systems like large language models problematic. LLMs do not "perceive" or process sensory input; their errors are not perceptual but arise from gaps and ambiguities in the statistical patterns they use to generate text.
The misuse of the term in computer science obscures the true nature of AI errors and invites unnecessary confusion. To clarify this distinction, it’s important to contrast hallucination with another clinical phenomenon—confabulation—which is far more analogous to how LLMs generate false information.
The term confabulation originates from the Latin word confabulare, meaning "to talk together" or "to chat." In its clinical context, it refers to the unintentional fabrication of memories or the misremembering of events. Unlike hallucinations, which are rooted in perception, confabulations arise from memory and cognitive distortions. The history of confabulation as a concept is deeply intertwined with the study of neurological and psychiatric disorders, particularly those involving memory impairments.
Confabulation began to emerge as a distinct phenomenon in medical literature during the late 19th century, as clinicians sought to understand the complexities of memory. Early observations of patients with amnesia or brain injuries revealed a peculiar tendency to "fill in the blanks" of their memory gaps with plausible but false narratives.
As psychiatry and neurology advanced in the 20th century, confabulation became a key symptom for diagnosing memory-related disorders. Researchers began to recognize its distinct characteristics: the false memories are produced without any intent to deceive, the patient sincerely believes them, and they typically take the form of plausible narratives that fill gaps left by memory impairment.
This understanding solidified confabulation as a phenomenon linked to the brain’s memory and reasoning processes, rather than its perceptual systems.
With advancements in neuroscience, researchers identified the brain regions associated with confabulation, most notably the prefrontal cortex and its connections to memory structures, which normally monitor and verify the accuracy of retrieved memories.
Confabulation is now commonly observed in conditions such as Korsakoff’s syndrome, Alzheimer’s disease, and traumatic brain injury.
In addition to its neurological basis, confabulation has psychological dimensions that reflect the mind’s need for coherence: rather than acknowledge a gap in memory, the brain supplies a plausible narrative that preserves a continuous sense of self and story.
As artificial intelligence (AI) technologies have evolved, particularly large language models (LLMs) like GPT, BERT, and similar systems, their ability to generate human-like text has impressed researchers and users alike. However, these models are not immune to errors. One prominent category of error is the generation of information that is entirely false or fabricated but presented as if it were true. To describe this phenomenon, the AI research community adopted the term “hallucination.”
The term “hallucination” first gained traction in AI research during the development of image recognition and generation models. These systems occasionally produced images or patterns that did not correspond to any real-world object, leading researchers to liken these outputs to hallucinations. This analogy was later extended to text-based AI models when they began producing fabricated information.
The adoption of “hallucination” to describe LLM errors is an attempt to convey that these outputs are detached from reality. However, the term is fundamentally misaligned with how LLMs operate.
Despite the term’s limitations, “hallucination” has been used broadly in AI research and popular discourse to describe several types of LLM errors, including fabricated citations and references, invented “facts,” and confidently stated answers with no basis in the model’s training data.
The term "hallucination" persists in AI discussions for several reasons, despite its clinical roots being largely unrelated to how LLMs function: it is vivid and memorable, it was adopted early from work on image models, and it loosely conveys that an output is detached from reality.
While the term may be convenient and catchy, it introduces several challenges: it implies a perceptual process that LLMs do not possess, it encourages anthropomorphizing these systems, and it obscures the statistical mechanisms that actually produce the errors.
As AI technologies become more integrated into daily life, precision in terminology is critical for fostering trust and understanding. The use of “hallucination” to describe LLM errors is a metaphor that confuses more than it clarifies. A more accurate term, such as confabulation, would better align with how these systems generate false outputs and promote clearer communication across disciplines.
In the following sections, we’ll explore why confabulation is not only a better fit but also a more scientifically accurate description of how large language models produce erroneous or fabricated information. By understanding this distinction, we can develop a deeper appreciation for the limitations and potential of AI systems.
The term hallucination has been widely adopted to describe the errors of large language models (LLMs), such as generating false or fabricated information. However, this term is a poor fit when examined through the lens of clinical and scientific accuracy. The errors produced by LLMs are not perceptual distortions, as the term “hallucination” implies, but rather logical fabrications arising from gaps in knowledge or ambiguity in input—making confabulation a far more suitable analogy.
Confabulation, in clinical psychology and neurology, refers to the creation of false or distorted memories, often in response to memory deficits. These fabricated memories are typically plausible, are sincerely believed by the person recalling them, and are produced without any intent to deceive.
This concept closely mirrors how LLMs process and generate text when faced with incomplete data or ambiguous prompts.
Large language models operate by analyzing vast amounts of training data and generating responses based on probabilistic patterns. When presented with queries that exceed the scope of their training or involve ambiguous or conflicting data, LLMs construct plausible but incorrect outputs. These responses share several key characteristics with confabulation: they are fluent and plausible, they are produced without any awareness of error or intent to deceive, and they fill gaps in the model’s knowledge with statistically likely content rather than verified fact.
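To make “probabilistic patterns” concrete, here is a minimal, purely illustrative Python sketch. The prompt, vocabulary, and probabilities are invented for the demonstration and do not come from any real model; the point is only to show that sampling the next token from a learned probability distribution produces fluent continuations without any step that checks whether the chosen continuation is true.

# Hypothetical sketch of next-token sampling (not any real model's code).
# The vocabulary and probabilities below are made up for illustration.

import random

def sample_next_token(probabilities, temperature=1.0):
    """Sample one token from a {token: probability} mapping.

    Low temperatures concentrate on the most likely token; higher
    temperatures give less likely (and possibly false) continuations
    more of a chance. Raising probabilities to the power 1/temperature
    and renormalizing is equivalent to the usual logit-scaling trick.
    """
    tokens = list(probabilities)
    weights = [p ** (1.0 / temperature) for p in probabilities.values()]
    total = sum(weights)
    return random.choices(tokens, weights=[w / total for w in weights], k=1)[0]

# A hypothetical distribution a model might assign after the prompt
# "The study was published in" -- every option reads plausibly; none is verified.
next_token_probs = {
    "2019": 0.45,
    "2021": 0.30,
    "Nature": 0.15,
    "a follow-up report": 0.10,
}

for temperature in (0.2, 1.0, 1.5):
    print(temperature, sample_next_token(next_token_probs, temperature))

The sampling step in this sketch optimizes for statistical plausibility, not truth, which is why the resulting errors look far more like confabulated gap-filling than like perception gone wrong.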
While the term “hallucination” is catchy, its clinical and perceptual roots make it ill-suited for describing LLM errors. The key distinction is that hallucinations are sensory experiences generated in the absence of external stimuli, whereas LLM errors are textual outputs assembled from statistical associations; no perception is involved at any stage.
Adopting the term confabulation for LLM errors is not only more accurate but also enhances clarity in discussions about AI behavior. Here’s why:
The process by which LLMs generate text—drawing from stored data to produce coherent but occasionally flawed responses—closely resembles how human memory systems compensate for deficits. Referring to these errors as confabulations bridges the gap between cognitive science and computer science.
Using “confabulation” underscores the fact that LLM errors are a result of their design, not a failure of perception. This distinction can help avoid anthropomorphizing AI systems or overstating their capabilities.
Switching to a term rooted in cognitive science can foster better collaboration between AI researchers, psychologists, and neuroscientists. This interdisciplinary alignment may lead to new insights into both human cognition and AI design.
Describing AI errors as confabulations rather than hallucinations reduces sensationalism and fosters realistic expectations. It communicates to the public that these errors are systematic and predictable rather than mysterious or out of control.
The widespread use of the term “hallucination” to describe the errors of large language models (LLMs) may seem harmless at first glance, but it introduces significant challenges for both researchers and the public. Misusing this term leads to confusion about how AI operates, creates ethical and technical issues, and hinders progress in addressing AI's limitations.
The term “hallucination” implies that LLMs experience something akin to perceptual errors, a notion that can mislead both researchers and the general public.
Mislabeling LLM errors as “hallucinations” has ethical and technical consequences that go beyond simple terminology.
Using the term “hallucination” to describe LLM errors is more than a linguistic shortcut—it fundamentally distorts how these errors are understood by researchers, developers, and the public. By treating LLM errors as perceptual phenomena, we risk oversimplifying the problem, delaying effective solutions, and fostering misconceptions about AI systems. Recognizing these errors as confabulations instead can lead to clearer communication, better error-handling strategies, and more informed public discourse around AI technology.
The term "hallucination," while popular in discussions about AI-generated errors, is a misrepresentation of the phenomenon. By shifting to “confabulation”, we can foster a clearer, more scientifically accurate understanding of how large language models (LLMs) work. This reframing benefits interdisciplinary communication, aligns with established concepts in cognitive science, and enhances the development of solutions tailored to the actual mechanisms underlying AI errors.
Adopting the term "confabulation" offers significant advantages in both scientific and practical contexts: it aligns AI terminology with an established concept in cognitive science, it avoids implying perception where none exists, and it keeps attention on the generative mechanisms that actually produce the errors.
To establish consistency in how AI-generated errors are described, a simple guideline follows from this distinction: reserve “hallucination” for its clinical, perceptual meaning, and use “confabulation” for plausible but fabricated model outputs.
Reframing AI errors as confabulations can clarify their nature and origins. A model that cites a nonexistent paper, for example, is not “hallucinating” a source; it is confabulating a citation, assembling a plausible-looking reference from patterns in its training data.
Reframing AI errors as confabulations instead of “hallucinations” provides clarity, accuracy, and a shared vocabulary for interdisciplinary research and development. By aligning terminology with established concepts in cognitive science, this shift fosters better understanding of AI behavior and paves the way for more effective strategies to address its limitations. Using "confabulation" also avoids anthropomorphizing AI systems, helping to set realistic expectations and build trust in these transformative technologies.
Broader Implications for AI and Cognitive Science
The reframing of AI errors from “hallucination” to “confabulation” has far-reaching implications beyond mere terminology. By aligning AI behavior with concepts rooted in psychology and neuroscience, we open doors for interdisciplinary collaboration, advancements in AI design, and ethical communication that builds trust in these transformative technologies.
Bridging the gap between psychology, neuroscience, and AI research is critical to advancing all of these fields.
Understanding AI errors as confabulations can drive innovation in how LLMs are designed and optimized.
Accurate descriptions of AI behavior are vital for building trust among users, developers, and the general public.
Reframing LLM errors as confabulations has implications that extend beyond AI design to how we approach interdisciplinary research and public engagement. By bridging cognitive science and AI, we can foster collaboration and innovation, improving both fields. Ethical communication that accurately represents AI capabilities builds trust, sets realistic expectations, and paves the way for responsible and effective use of these powerful technologies. This holistic approach ensures that the conversation around AI remains grounded in clarity and shared understanding, benefiting researchers, developers, and users alike.
The terms hallucination and confabulation have rich histories rooted in clinical psychology and neuroscience, each describing distinct phenomena. Hallucination refers to false sensory perceptions occurring without external stimuli, while confabulation involves the unintentional creation of plausible but false memories, often as a response to gaps in knowledge or memory deficits. These concepts, carefully defined and studied in human cognition, have profound implications for how we interpret errors in artificial intelligence systems.
In the context of large language models (LLMs), the term "hallucination" has been widely used to describe the generation of false or fabricated outputs. However, as we have explored, this term is a poor fit. AI systems lack sensory perception, and their errors stem from probabilistic reasoning based on incomplete or ambiguous data. These characteristics align far more closely with confabulation than with hallucination, making confabulation the more accurate and scientifically appropriate term.
This distinction is more than semantic—it shapes how we understand and communicate about AI behavior. By adopting “confabulation” to describe these errors, we can foster a clearer understanding of AI systems’ limitations and capabilities. Accurate language improves interdisciplinary collaboration, enabling computer scientists, cognitive scientists, and ethicists to work together more effectively. It also ensures that the public receives transparent, trustworthy information about AI technologies.
We urge computer scientists, AI researchers, and industry leaders to reconsider the terminology used to describe LLM errors. By embracing more precise language, we can demystify AI systems, set realistic expectations, and accelerate progress in building reliable, ethical, and effective AI tools. The shift from “hallucination” to “confabulation” is not just a linguistic adjustment—it’s a step toward deeper understanding and meaningful collaboration across disciplines.