When a model needs help — tools and memory

A bare model is limited by what’s in its training and what fits in the context window. Most production AI systems aren’t bare models. They give the model two crucial additions: tools and memory.

Tools

A tool is anything the model can use to do something it couldn’t do on its own.

The simplest example: a calculator. A model is mediocre at math. Give it a calculator tool, and now when you ask for 23% of 184,592, the model writes “I’ll use the calculator: 23% of 184,592 = 42,456” and is reliably correct.

Other common tools:

Web search. Model can now answer questions about recent events. “What’s the news on the budget?” becomes a search + summarize task instead of a guess.
Database query. Model can answer questions about your customers, your inventory, your tickets. “How many open tickets do we have for the new feature?” becomes a SQL query.
File access. Model can read your documents at scale.
API calls. Model can hit your CRM, your billing system, your scheduling tool.
Code execution. Model writes Python, runs it in a sandbox, uses the result.

The pattern is always the same: the model produces a structured request (“call this tool with this input”), the system runs it, the result comes back to the model, and the model continues its response with the result in hand.

When you hear “ChatGPT can search the web” or “Claude can run code” — they’re describing tool use.

External memory (retrieval, also called RAG)

The model’s training has nothing about a specific company. The context window can hold a small book, but not a full company knowledge base.

The standard fix: when a user asks something, the system finds the relevant documents from a knowledge base, pastes them into the context, and then asks the model to answer using them. The model goes from “guessing about the business” to “summarizing the actual answer from the docs.”

This pattern is called retrieval-augmented generation, or RAG in the trade. The mechanism is unglamorous:

Take all your docs. Chunk them up. Convert each chunk into a numerical “fingerprint” stored in a special database.
When a question comes in, fingerprint the question the same way.
Find the chunks whose fingerprints are closest to the question’s.
Paste those chunks into the model’s context with the question.
The model answers based on what’s now in front of it.

Almost every “ask questions of our internal docs” product is some flavor of this.

Watch RAG run · one stage at a time

0 / 5

Click Step to watch a RAG request from question to answer.

Why the wrapper matters more than the model

AI products that handle company-specific information — internal Q&A bots, support copilots, sales assistants — sit on top of a model anyone else can also rent. The base model is mostly a commodity. The differentiated work is in the wrapper:

How data sources are connected.
How sources are chunked and indexed.
How retrieval ranks the right chunks for the right question.
How updates flow through when documents change.
Whether the system cites sources back so the answer is checkable.

The model is the engine. The retrieval is the steering wheel. Both have to work, and most of the engineering effort in any serious “AI for our company” product is in the second one.

How this maps onto the AI products around you

Anything that “knows things about the business” — its customers, its history, its operations — is almost certainly running retrieval underneath. The model itself wasn’t trained on that data; it can’t be.

Anything that “does things” — books, sends, queries, updates — is running tools underneath. Connectors to specific systems, with specific permissions, and (in good designs) an audit trail of what was called and why.

A model with no tools and no retrieval can only work on whatever the user types in. That covers a small slice of business work — drafting, transformations, the things a bare model handles well. For everything else, the tools-and-memory layer is the product.