What AI is bad at

The complement of the previous chapter. Equally important.

The honest list

Anything that needs ground truth. Specific numbers. Specific names. Specific dates. Recent events. Anything that wasn’t in the training data, or was in it inconsistently, or has changed since. The model will guess plausibly — and you may not catch it.

Internal company knowledge. Your CRM, your wiki, last quarter’s numbers, your customer history. None of it was in training. A bare model knows nothing about the specific business it’s being applied to unless that context is put into the prompt.

Action. A model can write the email; sending it requires plumbing. A model can write the SQL; running it requires plumbing. A model can describe how to book a flight; booking it requires connecting to a flight system. By itself, the model produces text. Everything that turns text into action is a separate engineering problem.

Memory across conversations. A new chat starts blank. Anything you taught the model yesterday is gone. Products can simulate memory by re-injecting facts at the start of each chat — but the model itself does not remember.

Math at scale. Models can do simple arithmetic and surprisingly often nail multi-step word problems. But for anything where you’d reach for a spreadsheet or a calculator, a model alone is unreliable. The fix is to let the model use tools (a calculator, a code runner). Without that, it’s predicting plausible numerical answers.

Anything safety-critical. Medical diagnosis without a physician. Legal advice without a lawyer. Financial decisions without a compliance review. Not because the model can never get these right, but because the cost of being subtly wrong is too high to accept on output the model produces.

Strict consistency at high volume. If you need the same input to always produce the same output, you need a rule, not a model. Models have randomness baked in. Two runs of the same prompt can differ.

Long-horizon planning that depends on real-world feedback. A model can write a five-step plan. It can’t watch how step one actually unfolds and adapt before step two. Multi-step plans that depend on the real world require either tight human-in-the-loop or sophisticated agent architecture — and even then, fragility is the norm.

The “almost true” trap

The trickiest failure mode is when a model is almost right. Or sounds like it.

The prompt

"Tell me about Stride Logistics, an Indian delivery startup."

The model's answer

Stride Logistics, founded in Pune in 2018 by IIT-Delhi alums Aryan Kapoor and Meera Iyer, has become India's leading last-mile delivery platform for tier-2 cities. The company raised ₹420 crore in a Series C round led by Sequoia in late 2023, valuing it at ₹3,100 crore. Stride now operates in 73 cities across India and processes over 2.4 million packages per month. Its competitors include Delhivery and Ecom Express, though Stride has differentiated through its proprietary route-optimisation model built in-house.

A summary that’s 95% accurate, except for one sentence that subtly misstates the source. A code suggestion that runs but introduces a subtle bug. A confident answer about your competitor’s pricing that’s plausible and wrong. These cost more than obvious failures, because they slip past review.

The defense: when stakes are high, structure the task so the model produces something verifiable against a source. “Summarize this document and cite the sentence each claim came from.” “Generate code and run the tests.” Pure free-text output is the riskiest output.

How this maps onto real systems

A decision made on the model’s output with no human in between is the riskiest shape. The fit is rare and the failure cost is high — that’s the shape that needs the most structure around it.

Exact-numbers work doesn’t go to the model directly. It goes to a calculator, a database, a spreadsheet. The model can orchestrate which one runs, but it shouldn’t be the thing computing.

Action — sending, booking, paying, updating — is its own layer. Tools, approvals, audit trails. The model produces the text. Something else turns text into the action, with traceability around it.

And anywhere the work depends on specifics of one business — its customers, its numbers, its history — the model needs grounding. That’s the work of tools and memory.