Why models sometimes make things up

A pharma analyst asks ChatGPT for a recent FDA approval. The model gives a name, a date, a drug class. Confident. Detailed. Wrong.

A finance team asks for the CEO of a mid-cap public company. The model returns a plausible name and a plausible bio. Smooth. Specific. Wrong.

Both failures come from the same machine producing the same kind of output it always produces. The mechanism that writes accurate sentences is the mechanism that writes inaccurate ones. This chapter explains why that’s true, and where the problem is sharpest.

The mechanic

A model produces the next word given everything before it. There is no separate fact-checker in the loop. The question being computed is “what word probably comes next?”, not “is the sentence I’m producing true?”

If the training data contained thousands of paragraphs in the shape “the CEO of [company] is [name]…”, and the prompt asks for a name the model has never seen, the model still completes the pattern. It writes the most plausible-sounding name — fluently, in the same voice as every accurate paragraph it has ever produced.

A true answer and a fabricated one are produced by the same process. They look identical from the outside because, on the inside, they are the same kind of thing.
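The pattern-completion behaviour described above can be illustrated with a deliberately tiny toy. This is not how a real language model works internally: it is a frequency counter over made-up (company, name) pairs, and every company and person in it is invented for illustration. What it shares with the real thing is the point that matters here: it has one code path, and that path returns a fluent sentence whether or not it has ever seen the company.

```python
from collections import Counter

# Toy "training data": sentences of the shape "the CEO of <company> is <name>".
# All companies and names are invented for this sketch.
TRAINING = [
    ("Acme", "Dana Reyes"),
    ("Acme", "Dana Reyes"),
    ("Globex", "Sam Okafor"),
    ("Initech", "Dana Reyes"),
]


def build_model(pairs):
    """Count which names follow which companies, and which names appear overall."""
    per_company = {}
    overall = Counter()
    for company, name in pairs:
        per_company.setdefault(company, Counter())[name] += 1
        overall[name] += 1
    return per_company, overall


def complete(model, company):
    """Complete the pattern. There is no 'I don't know' branch and no truth check."""
    per_company, overall = model
    # Seen company: use its own counts. Unseen company: fall back to whichever
    # name was most common across all the training sentences.
    counts = per_company.get(company, overall)
    name, _ = counts.most_common(1)[0]
    return f"The CEO of {company} is {name}."


MODEL = build_model(TRAINING)

print(complete(MODEL, "Globex"))    # The CEO of Globex is Sam Okafor.
print(complete(MODEL, "Umbrella"))  # The CEO of Umbrella is Dana Reyes.
```

"Umbrella" never appears in the toy data, yet the second sentence has the same form and the same confidence as the first. Nothing in the output marks which one was backed by data.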

Where the failure mode concentrates

Hallucination isn’t uniform. It clusters in predictable territory.

Specific numbers and dates. The model is reliable on the shape of a sentence about a number. The digits themselves are far less reliable, especially for figures that weren’t widely repeated across the training data.

Names and citations. “According to a recent paper by Singh and Patel…” reads as authoritative. The paper may not exist. Citations are a particularly sharp instance because the form of a citation is highly regular and easy to fabricate, while the contents (who wrote what, where, when) are individual facts the model often doesn’t have.

Anything recent. Every model has a knowledge cutoff: the date its training data ends. Asked about anything after that date, the model is, in effect, guessing from patterns in the older data. The chapter What models actually “know” goes deeper.

Anything internal to a single company. A company’s own data was not in the public training set. The model has nothing specific to retrieve — so it pattern-matches from how similar companies have been described in public, and the specifics come out wrong.

Long, confident, surprisingly detailed answers. In human writing, length and specificity are signals of effort and expertise. In a model’s output, they aren’t — the model is uniformly confident because there is no mechanism for it to sound otherwise. A two-paragraph answer with seven specific details is as easy for the model to produce as a one-line answer; it is not seven times more likely to be true.
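A rough piece of arithmetic shows why more detail tends to mean more exposure, not more reliability. The sketch below assumes, purely for illustration, that each specific detail is right with the same probability and that the details are independent of each other. Neither assumption is measured from any real model, and the 0.9 figure is a made-up example value.

```python
def chance_all_correct(per_detail_accuracy: float, n_details: int) -> float:
    """Probability that every detail is right, assuming the details are
    independent and equally reliable (a simplifying assumption)."""
    return per_detail_accuracy ** n_details


# One detail at a hypothetical 90% accuracy versus seven of them.
print(round(chance_all_correct(0.9, 1), 2))  # 0.9
print(round(chance_all_correct(0.9, 7), 2))  # 0.48
```

Under these assumptions, an answer with seven specifics is fully correct less than half the time, even though each individual detail looks fairly safe.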

Why grounding changes the picture

Hand the model the actual FDA approval letter, then ask the same question, and the mechanism shifts. The model is no longer reaching for patterns from training; it is summarising text that’s right there in its context window. The next-word prediction is now conditioned on the document, and the most likely next words become the words that match the document.

This single shift, putting the answer into the model’s view before asking the question, is what every serious factual-AI product is built on. Pasting a document into ChatGPT, uploading a PDF, connecting a knowledge base, the whole machinery of enterprise “AI search”: all are variants of the same move. The industry words for it are retrieval and grounding, and the chapter Tools and memory walks through how it’s wired up in real systems.
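In its simplest form, the move amounts to nothing more than assembling text. The sketch below is a minimal illustration, not any product's actual implementation: the function name, the delimiter lines, and the instruction wording are all assumptions, and it calls no model. It only builds the string that would be sent.

```python
def build_grounded_prompt(document: str, question: str) -> str:
    """Put the source text in front of the question (hypothetical helper).

    The instruction wording and delimiters are illustrative choices.
    """
    return (
        "Answer the question using only the document below. "
        "If the document does not contain the answer, say so.\n\n"
        "--- DOCUMENT START ---\n"
        f"{document.strip()}\n"
        "--- DOCUMENT END ---\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_grounded_prompt(
    document="(full text of the approval letter goes here)",
    question="Which drug was approved, and on what date?",
)
print(prompt)
```

The question is unchanged from the ungrounded case. What differs is that the words most likely to come next are now the ones sitting in the document above it.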

Grounding doesn’t make hallucination impossible. A model can still misread, mis-summarise, or smooth over a contradiction in the source. But the failure mode shifts from “invented a fact from nothing” to “misrepresented something that exists” — a much smaller and more checkable problem.
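One reason the grounded failure is more checkable is that the source is available to compare against. As a hedged illustration, the sketch below flags any passage the answer puts in straight double quotes that does not appear in the source text. The function names are hypothetical, and the check is intentionally narrow: it catches invented quotations, not paraphrases that distort the meaning.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return quoted passages in the answer that are not found in the source."""
    quotes = re.findall(r'"([^"]+)"', answer)
    normalised_source = _normalise(source)
    return [q for q in quotes if _normalise(q) not in normalised_source]


source = "The term of this agreement is 24 months."
answer = (
    'The contract says "the term of this agreement is 24 months" '
    'and "renewal is automatic".'
)
print(unsupported_quotes(answer, source))  # ['renewal is automatic']
```

A check like this has nothing to work with when the model answers from training alone, because there is no source to compare against.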

The shape of the risk

Hallucination risk tracks two variables: how much native ground truth the model has on the question, and how much human checking happens before the output is used.

Where the model has been trained on a topic from a thousand angles — common concepts, well-documented technical material, generic business writing — the patterns it produces are mostly correct, and ordinary review catches the rest. Drafting an internal email about scheduling a meeting is on this end.

Where the model has no native ground truth — your company’s numbers, a recent event, a specific person who isn’t a public figure, an obscure regulation — the patterns it produces are confidently fabricated unless the relevant text is brought into context. Quoting a customer back to themselves, citing a clause from a contract, stating last quarter’s revenue: this end.

The same model handles both. Whether the output is trustworthy is a function of what is being asked and what is in front of the model when it’s asked — not of which model it is or how recent the version.