The three knobs — prompting, RAG, and fine-tuning

Three distinct things can happen when a base AI model is adapted to a specific use case. You can change what you put in front of it (prompting). You can give it new information at the moment of the question (retrieval). You can retrain the model itself (fine-tuning).

These three are among the most confused terms in business AI conversations. They get bundled into “we’ll customize the AI” or “we’ll train it on the business’s data,” but they are not interchangeable: they differ in cost, in timeline, and in what they can and can’t do. Picking the right combination of the three is much of the real engineering work in any business AI system.

This chapter explains each knob, what it does, what it costs, and when it earns its place.

Knob 1 — Prompting

What it is: Shaping what the model does by changing what you put in front of it. The prompt is the text the model sees before producing its response. Better prompts produce better outputs from the same model.

A prompt can include: instructions, context, examples, constraints, formatting rules, role assignments (“you are a sales analyst”), and the actual question or task.
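The components above can be assembled in code, which is also how a team turns prompts that work into a reusable library. Below is a minimal sketch in Python, assuming a plain-text prompt. The function `build_prompt` and its section labels are illustrative, not any vendor's API.

```python
def build_prompt(role, instructions, examples, format_rule, question):
    """Assemble a plain-text prompt from the usual components (illustrative layout)."""
    parts = [f"You are {role}.", instructions]
    # Worked examples (few-shot) show the model the pattern to follow.
    for example_input, example_output in examples:
        parts.append(f"Example input: {example_input}\nExample output: {example_output}")
    parts.append(f"Format: {format_rule}")
    parts.append(f"Task: {question}")
    return "\n\n".join(parts)


prompt = build_prompt(
    role="a sales analyst",
    instructions="Summarize the customer note in one sentence. Do not speculate.",
    examples=[("Client wants a discount on 500 units.", "Pricing request: 500-unit discount.")],
    format_rule="a single plain-text sentence",
    question="Summarize: Customer asks whether delivery can move to Friday.",
)
print(prompt)
```

Keeping the parts separate lets a team reuse the same instructions and examples while swapping only the task text.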

What it controls:

Tone and style of output
Format (a list, a paragraph, JSON, a table)
Level of detail
Whether the model uses a specific procedure or follows a worked example
What the model should and shouldn’t do
The “persona” the model adopts for the response

What it costs:

Essentially nothing beyond normal usage. Editing a prompt is text editing; the only billing effect is that a longer prompt uses more tokens per call.
Iterating is fast — change wording, see effect immediately.
The only meaningful cost: the time it takes a team to craft good prompts. The more reusable the prompt, the better the return on that time.

Where prompting fits:

The natural first move. For most everyday business needs, a well-crafted prompt gets most of the way there.
Tasks the model already “knows how” to do — drafting, summarizing, classifying, extracting from text.
Cases where inputs change every time but the task shape stays the same.

Limits of prompting:

The prompt has to fit inside the model’s context window (the constraint covered in The context window).
A prompt cannot add new knowledge the model doesn’t have. It can only steer the knowledge that’s already there. If you ask the model “what is our Q3 revenue?”, no rewording of the prompt will make it know the answer unless you include the revenue figure in the prompt itself.
A prompt cannot fundamentally change the model’s behavior. It can only steer it within the range the model is already capable of. If the model can’t reliably produce a certain style or structure, prompting alone won’t fix it.
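The Q3 revenue example points to the only way a prompt can supply a missing fact: by carrying that fact in, within the context budget. This is a sketch under stated assumptions. The 4-characters-per-token ratio is a rough rule of thumb for English text, not a real tokenizer. The 8,000-token window is a hypothetical figure, and the revenue number is a placeholder. The function names are illustrative.

```python
CONTEXT_WINDOW_TOKENS = 8_000  # hypothetical limit; real models differ


def rough_token_count(text):
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return len(text) // 4 + 1


def prompt_with_facts(question, facts, window=CONTEXT_WINDOW_TOKENS):
    """Put known facts in front of the question; refuse if the prompt would not fit."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    prompt = f"Answer using only these facts:\n{fact_block}\n\nQuestion: {question}"
    if rough_token_count(prompt) > window:
        raise ValueError("prompt exceeds the context window; trim or select facts")
    return prompt


# The revenue figure below is a placeholder, not real data.
print(prompt_with_facts("What is our Q3 revenue?", ["Q3 revenue: 4.2 million (placeholder figure)"]))
```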

What “prompt engineering” actually means in practice. Most of it is dull craftsmanship: be specific, give examples, specify format, iterate. The mystique of “prompt engineering” as a high-skill profession was always overblown. It is a useful skill, learnable in a few days, that compounds over months as a team builds a library of prompts that work.

Knob 2 — Retrieval (RAG)

What it is: Giving the model access to specific information by fetching the relevant documents at the moment of the question and inserting them into the prompt. The model then reads them and answers from them, instead of pattern-completing from training memory. The full mechanism is covered in Tools and memory; a summary follows.

The acronym is RAG — retrieval-augmented generation. The mechanism, in five steps:

Take all your documents (manuals, customer records, transcripts, whatever).
Chunk them into pieces and convert each piece into a numerical “fingerprint” (an embedding).
Store the fingerprints in a database designed for similarity search (a vector database).
When a user asks a question, fingerprint the question the same way and find the chunks whose fingerprints are most similar.
Paste those chunks into the model’s prompt, along with the question.
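The five steps above can be sketched end to end in plain Python. This is a toy built on loud assumptions: word counts stand in for a real embedding model, an in-memory list stands in for the vector database, and the class and function names are hypothetical. Only the shape of the pipeline carries over to a production system.

```python
import math
import re
from collections import Counter


def chunk(text, size=40):
    """Step 2a: split a document into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Step 2b: toy fingerprint, word counts scaled to unit length.
    A real system would call an embedding model here instead."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def similarity(a, b):
    # Cosine similarity; both vectors are unit length, so it is just the dot product.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorIndex:
    """Step 3: in-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # (fingerprint, doc_id, chunk_text)

    def add(self, doc_id, text):
        for piece in chunk(text):
            self.rows.append((embed(piece), doc_id, piece))

    def search(self, question, k=2):
        """Step 4: fingerprint the question and return the k most similar chunks."""
        q = embed(question)
        scored = [(similarity(q, vec), doc_id, piece) for vec, doc_id, piece in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


def rag_prompt(index, question, k=2):
    """Step 5: paste the retrieved chunks into the prompt, tagged for citation."""
    hits = index.search(question, k)
    sources = "\n".join(f"[{doc_id}] {piece}" for _, doc_id, piece in hits)
    return f"Answer using these sources and cite them by ID:\n{sources}\n\nQuestion: {question}"


# Step 1: the documents (two made-up examples).
index = ToyVectorIndex()
index.add("returns-policy", "Customers may return unused items within 30 days for a full refund.")
index.add("shipping-faq", "Standard shipping takes five business days; express shipping takes two.")
print(rag_prompt(index, "How many days do customers have to return items?", k=1))
```

Because each chunk keeps its document ID, the prompt can ask for citations that map back to a source the user can check.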

The model now answers based on what’s freshly in its context — not what it remembers.

What it controls:

What facts the model has access to for a specific answer
Whether the model can cite specific documents (and surface those citations to the user)
How current the information is: the document store is under your control and can be updated daily, hourly, or in real time

What it costs:

Engineering work: the embedding pipeline, the vector database, the retrieval logic, the citation handling.
Ongoing data maintenance: as your documents change, the index must be updated. A stale index can be worse than none, because the model answers confidently from outdated documents.
Compute cost: retrieved content goes into the prompt, which consumes tokens. Long context = more tokens = higher cost per request.
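To make the token point concrete, here is a back-of-envelope calculation. The prices and token counts are hypothetical placeholders, not any provider's actual rates.

```python
def cost_per_request(prompt_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given prices per million tokens (hypothetical rates)."""
    return (prompt_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
bare = cost_per_request(500, 300, 3.0, 15.0)                 # question + instructions only
with_rag = cost_per_request(500 + 5 * 800, 300, 3.0, 15.0)   # plus five retrieved 800-token chunks
print(f"bare: ${bare:.4f}  with retrieval: ${with_rag:.4f}")
print(f"per 100,000 requests: ${bare * 100_000:,.0f} vs ${with_rag * 100_000:,.0f}")
```

In this made-up scenario, adding five retrieved chunks triples the cost of each request. At volume, that difference becomes a budget line.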

Where retrieval fits:

The model needs to know things specific to one business — its docs, customer history, product specs, policies.
Information that changes frequently. Retraining the model every time a policy updates is impractical; updating the index by tonight is straightforward.
Contexts that need source citations the user can verify — compliance, support, anywhere the answer has to be defensible.
Most “ask questions of our knowledge base” products are this shape.
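Keeping the index current is mostly bookkeeping: find which documents were added, changed, or deleted since the last run, and re-index only those. The sketch below assumes documents arrive as a dictionary mapping ID to text. The dictionary of content hashes stands in for the index's metadata, and re-chunking and re-embedding are left as a comment. The function name is hypothetical.

```python
import hashlib


def refresh_index(index, documents):
    """Sync `index` (doc_id -> content hash) with `documents` (doc_id -> text).
    Returns (added, updated, removed) doc IDs. In a real system, each added or
    updated document would also be re-chunked and re-embedded, and each removed
    one deleted from the vector store."""
    current = {
        doc_id: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for doc_id, text in documents.items()
    }
    added = [d for d in current if d not in index]
    updated = [d for d in current if d in index and index[d] != current[d]]
    removed = [d for d in index if d not in current]
    index.clear()
    index.update(current)
    return added, updated, removed


index = {}
refresh_index(index, {"returns-policy": "30-day returns.", "shipping-faq": "Ships in 5 days."})
changes = refresh_index(index, {"returns-policy": "14-day returns.", "warranty": "One-year warranty."})
print(changes)
```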

Limits of retrieval:

Quality of output depends on quality of retrieval. If the wrong documents come back, the model writes a confident wrong answer. Most production RAG failures are retrieval failures, not model failures.
The model still has to interpret what it retrieves. Confusing or contradictory documents → confusing or contradictory answers.
Retrieval doesn’t change the model’s underlying capability. It just gives the model better material to work with. A weak model with good retrieval is still a weak model.

A practical observation: in most “AI assistant that knows our company” products, the underlying model is largely a commodity, the same one anyone else can rent. The retrieval design is the actual product. How documents get chunked, how the index is updated, how retrieval ranks results, how the system handles a question that has no good match in the index: that engineering is where the quality lives.
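One of those engineering choices, handling a question that has no good match, can be sketched as a similarity threshold. This rests on several assumptions. The `hits` come from a retriever that returns (score, doc_id, text) tuples with cosine-style scores. The 0.3 threshold is hypothetical and would need tuning against real queries. Declining with a handoff is one possible policy among several, and the function name is illustrative.

```python
MIN_SCORE = 0.3  # hypothetical threshold; tune against real queries


def answer_or_decline(hits, question, min_score=MIN_SCORE):
    """Build a grounded prompt from strong matches, or return None if there are none.
    `hits` is a list of (score, doc_id, text) tuples from some retriever."""
    good = [hit for hit in hits if hit[0] >= min_score]
    if not good:
        # Nothing relevant enough: decline rather than let the model guess.
        return None
    sources = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in good)
    return (
        "Answer only from the sources below and cite them by ID. "
        "If they do not contain the answer, say so.\n"
        f"{sources}\n\nQuestion: {question}"
    )


weak_hits = [(0.12, "shipping-faq", "Standard shipping takes five business days.")]
prompt = answer_or_decline(weak_hits, "What is the CEO's phone number?")
print(prompt or "No good match in the index; routing to a human instead of answering.")
```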

Knob 3 — Fine-tuning

What it is: Actually retraining the model on examples specific to a domain, so the model itself behaves differently going forward. The training is the same loop described in How models are built — show examples, adjust weights — but on top of an already-trained base model, with domain-specific examples.
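Here is a deliberately tiny illustration of “show examples, adjust weights,” starting from existing weights instead of from scratch. The “model” is a single-weight linear function, nothing like a language model; only the shape of the loop carries over. The function name and numbers are illustrative.

```python
def fine_tune(weight, examples, learning_rate=0.01, epochs=200):
    """Continue gradient descent from an existing weight on new (x, y) examples.
    Model: prediction = weight * x. Loss: mean squared error."""
    for _ in range(epochs):
        # Gradient of mean((w*x - y)^2) with respect to w is mean(2*x*(w*x - y)).
        grad = sum(2 * x * (weight * x - y) for x, y in examples) / len(examples)
        weight -= learning_rate * grad
    return weight


base_weight = 2.0                                        # the "pre-trained" starting point
domain_examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # domain data follows y = 3x
tuned_weight = fine_tune(base_weight, domain_examples)
print(f"base: {base_weight}, fine-tuned: {tuned_weight:.3f}")
```

The base weight is the starting point, and the domain examples pull it toward the behavior they demonstrate. This is why a fine-tune is tied to the base it started from.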

What it controls:

Behavior of the model across every call, not per-prompt — its tone, structure, style, adherence to specific formats
Recognition of patterns specific to your domain — domain-specific jargon, industry abbreviations, customer-specific phrasing
Compact, repeatable behaviors that would otherwise require very long prompts

What it costs:

High. Requires curated training data — typically hundreds to thousands of high-quality input/output examples. Generating this data is its own project.
Compute cost: training itself is expensive, though much cheaper than pre-training a base model.
Engineering work: managing the fine-tuning pipeline, evaluating the result, deciding when to retrain.
Lock-in: a fine-tuned model is tied to a specific base model. If a meaningfully better base model ships six months later, the fine-tune sits on an outdated foundation and has to be redone to benefit from it.
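Because curated data is the expensive part, much of the work is mechanical cleaning. Below is a sketch of one cleaning pass over input/output pairs, written out as JSON Lines (one JSON object per line), a format many fine-tuning pipelines accept. The field names `input` and `output` and the example tickets are assumptions; each provider documents its own required schema.

```python
import json


def clean_examples(examples):
    """Drop incomplete and duplicate input/output pairs; return the kept list."""
    seen = set()
    kept = []
    for ex in examples:
        prompt = ex.get("input", "").strip()
        completion = ex.get("output", "").strip()
        if not prompt or not completion:
            continue  # incomplete example: noise, not signal
        key = (prompt.lower(), completion.lower())
        if key in seen:
            continue  # duplicate (ignoring case and edge whitespace) over-weights one pattern
        seen.add(key)
        kept.append({"input": prompt, "output": completion})
    return kept


def to_jsonl(examples):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(ex) for ex in examples)


raw = [
    {"input": "Ticket: printer offline", "output": "Category: hardware"},
    {"input": "Ticket: printer offline ", "output": "Category: hardware"},
    {"input": "Ticket: cannot log in", "output": ""},
    {"input": "Ticket: invoice missing", "output": "Category: billing"},
]
cleaned = clean_examples(raw)
print(f"kept {len(cleaned)} of {len(raw)}")
print(to_jsonl(cleaned))
```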

Where fine-tuning fits:

The same task done thousands or millions of times, where the per-call cost of long prompted examples is too high to absorb.
A specific output format that’s tricky to enforce with prompts alone.
Behavior the base model genuinely can’t produce with prompting and retrieval — after those two have actually been exhausted, not assumed.
High-volume, narrow, repeatable tasks at scale.
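The first item in that list is a break-even question. Below is a back-of-envelope sketch with hypothetical figures throughout. A real comparison would also price evaluation, retraining, and any difference in per-token rates between base and fine-tuned models.

```python
def break_even_calls(setup_cost, prompted_tokens, tuned_tokens, price_per_m):
    """Calls needed before shorter prompts repay fine-tuning's setup cost."""
    saving_per_call = (prompted_tokens - tuned_tokens) * price_per_m / 1_000_000
    if saving_per_call <= 0:
        return None  # no per-call saving: the setup cost is never recovered
    return setup_cost / saving_per_call


# Hypothetical: $20,000 for data + training; a 3,000-token few-shot prompt
# shrinks to 300 tokens once the behavior is trained in; $3 per million input tokens.
calls = break_even_calls(20_000, 3_000, 300, 3.0)
print(f"break-even after about {calls:,.0f} calls")
```

At these made-up numbers, the setup cost is recovered only after roughly 2.5 million calls. That is why one-off or low-volume projects rarely justify fine-tuning.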

Where fine-tuning does not fit:

Knowledge addition. That’s retrieval’s job. Fine-tuning to “teach the model about our company” is a recurring expensive mistake — fine-tuning shapes behavior, not facts.
One-off or low-volume projects. The setup cost never gets recovered.
Situations without curated training data. Fine-tuning on noisy data can produce a model that is worse than the base.
As a first move. Fine-tuning before exhausting prompting and retrieval is usually solving for impressiveness rather than results.

Limits of fine-tuning:

Fine-tuning changes behavior, not knowledge. The model might learn to format your data nicely but won’t reliably know the specific facts you trained it on.
Fine-tuning can degrade general capability. Models tuned narrowly sometimes get worse at general tasks they used to handle well.
Maintenance: the fine-tuned model needs re-tuning as data evolves and as base models improve. That’s a real ongoing cost, not a one-time expense.

How the three combine in real systems

Production systems typically use a mix:

Prompting is always there. Even fine-tuned models with retrieval still need prompts. Prompting steers each request.
Retrieval is in nearly every “AI assistant that knows our company” system. Without it, the model has no business-specific knowledge.
Fine-tuning is the rarest. Most production systems do not fine-tune at all. When they do, it’s usually for high-volume specific tasks where the other two weren’t enough.

A common shape: a base model plus good prompting plus retrieval covers most of what a mid-sized company actually needs from a business AI system. Fine-tuning is an optimization that earns its place at scale, once the optimization target is clear.

The natural sequence

Most well-built systems converge on the same order, for the same reason — each step is much cheaper than the next.

Prompting alone. Spend a week iterating toward a really good prompt. The ceiling here is higher than most people expect.
Add retrieval when there is a knowledge gap. The model needs facts specific to one business that it doesn’t have.
Consider fine-tuning only after the first two. Specifically when the gap is in behavior — format, tone, structure that prompting and retrieval can’t reliably produce.
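The sequence can be written as a simple decision rule. This is only a sketch: the flags are judgment calls a team makes from its own evaluation results, and the function name and labels are illustrative.

```python
def next_knob(prompt_iterated, knowledge_gap, has_retrieval, behavior_gap):
    """Suggest the next knob, in the order prompting -> retrieval -> fine-tuning."""
    if not prompt_iterated:
        return "prompting"  # cheapest step: iterate on the prompt first
    if knowledge_gap:
        # Missing facts are a retrieval problem, not a fine-tuning one.
        return "improve retrieval" if has_retrieval else "add retrieval"
    if behavior_gap:
        return "consider fine-tuning"  # only after the first two are exhausted
    return "stay with the current setup"


print(next_knob(prompt_iterated=True, knowledge_gap=False, has_retrieval=True, behavior_gap=True))
```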

This sequence isn’t a rule, but the order is robust. Skipping straight to fine-tuning before exhausting prompting and retrieval is the single most common expensive mistake in business AI projects.

How these get confused in the wild

The three knobs get bundled, mislabeled, and conflated all the time. The substantive distinctions:

“Custom AI trained on the business’s data.” This phrase covers prompting, retrieval, and fine-tuning indiscriminately. In most real implementations, the “training” is actually retrieval — documents indexed for retrieval at query time, not weights being updated. Knowing which mechanism is underneath changes the cost, the timeline, and what the system can and can’t do.

“Fine-tuning to teach the model our knowledge.” Fine-tuning is poor at adding specific knowledge. Retrieval is the right tool for that. The two get conflated constantly, including by people building these systems.

“It’s not RAG, it’s something better.” Sometimes true — there are newer retrieval architectures (graph-based, hybrid, agentic) that genuinely outperform classic vector RAG on certain workloads. Often, though, it’s the same mechanism with a rebrand. The honest test: if it retrieves relevant context at query time, it is in the RAG family, regardless of the marketing.

“We’ve built our own model.” Almost always actually means “we’ve built a layer of prompting, retrieval, and possibly fine-tuning on top of an existing foundation model” — OpenAI, Anthropic, Google, Meta, or one of the open-source families. Building a frontier model from scratch costs hundreds of millions of dollars and is done by a handful of labs. Everyone else is composing on top.

Comparison at a glance

	Prompting	Retrieval (RAG)	Fine-tuning
What it changes	One request’s output	What the model sees per request	The model itself
Cost	Near zero	Moderate (engineering + ongoing)	High (data + compute + maintenance)
Speed to iterate	Seconds	Hours to days	Days to weeks
Best for	Steering generic behavior	Adding business-specific knowledge	High-volume specific behavior
Worst for	Adding knowledge	Changing model personality	Adding knowledge
First move?	Yes, always	Yes, when there is a knowledge gap	Almost never