How models are built — training

A new model lands. The press release describes it as trained on “10 trillion tokens of internet text, books, and code,” then “fine-tuned with human feedback,” then “aligned for safety.” Versions of these three phrases recur in launch after launch, and each refers to a distinct stage of a specific pipeline.

This chapter is about where the model in your chat window came from — not the underlying math, but the shape of the process. Once you have that shape, every later word in the AI vocabulary (fine-tuning, base model, open-weights, RLHF, knowledge cutoff) slots into a place you can already see.

Start with nothing

A model, before it’s trained, is just a structure — a network of nodes and connections, with the connection strengths (called weights) set to random numbers. It can’t do anything. Ask it to predict the next word and it gives you nonsense.

The whole story of training is taking those random weights and slowly nudging them — billions of times — until the model gets useful.

The basic loop of training

Here is what training actually does, stripped to its essence.

Take a piece of text from your training data. Say: “The cat sat on the mat.”
Cut off the last word. Show the model: “The cat sat on the ___”
Ask it to predict. Initially, it predicts nonsense — say, “purple.”
Compare to the correct answer (“mat”). The model was wrong.
Adjust every weight in the network a tiny bit, in the direction that would have made the model slightly more likely to say “mat.”
Repeat. With a different piece of text. And another. And another.
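
The steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how a real model is built: the "model" here is just a table of scores keyed by the previous two words rather than a neural network, so each step adjusts only the scores for one context instead of every weight. The names (VOCAB, make_examples, train, predict) and the settings (epochs, learning_rate) are made up for illustration.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "purple"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_examples(sentence, context_size=2):
    """Cut the text into (previous words, correct next word) pairs."""
    words = sentence.lower().split()
    return [(tuple(words[i - context_size:i]), words[i])
            for i in range(context_size, len(words))]

def train(examples, epochs=50, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Start with nothing: one random score per vocabulary word, per context.
    weights = {context: [rng.uniform(-1, 1) for _ in VOCAB]
               for context, _ in examples}
    for _ in range(epochs):
        for context, target in examples:
            probs = softmax(weights[context])        # ask the model to predict
            for i, word in enumerate(VOCAB):
                correct = 1.0 if word == target else 0.0
                # Nudge each score so the correct word becomes a bit more likely.
                weights[context][i] -= learning_rate * (probs[i] - correct)
    return weights

def predict(weights, context):
    """Return the word the model currently considers most likely."""
    probs = softmax(weights[context])
    return VOCAB[probs.index(max(probs))]

examples = make_examples("The cat sat on the mat")
weights = train(examples)
print(predict(weights, ("on", "the")))
```

The nudge used here (predicted probability minus the correct answer) is the standard update for this kind of scoring setup. Real training applies the same idea to billions of weights at once.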

Do this billions of times across trillions of words. After all those nudges, the weights have settled into a configuration that captures the patterns of how language works. Those patterns are now stored in the weights themselves, which are also called the model’s parameters.

That configuration — the trained weights — is what you’re talking to when you open ChatGPT or Claude. The model in your chat window is the result of one long, very expensive nudging process.

Three stages, not one

When people say “training,” they usually mean a process that happens in three stages. Each does something different.

Pre-training

The big one. The model sees trillions of words pulled from the internet, books, papers, code repositories, and forums. For each, it plays the next-word-prediction game described above.

After pre-training, you have a base model. It is a very good next-word predictor. It can complete almost any sentence, in almost any style, on almost any topic — but it has no idea how to be helpful. Show it “How do I make pasta?” and it might give you a recipe, or it might just continue with “is a common question…” because both completions are plausible continuations.
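
To see why both continuations are plausible to a pure next-word predictor, here is a minimal sketch that simply counts which word follows which in a tiny invented corpus. A real base model is far more sophisticated, and the three sentences and function names below are assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# An invented mini-corpus: the same question appears in different settings.
corpus = [
    "how do i make pasta ? boil salted water and add the pasta",
    "how do i make pasta ? is a common question among new cooks",
    "how do i make pasta ? boil water first",
]

def next_word_counts(texts):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Convert the follow-up counts for one word into probabilities."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

counts = next_word_counts(corpus)
probs = next_word_probs(counts, "?")
print(probs)
```

After the question mark, this toy predictor gives weight to both “boil” (the start of a recipe) and “is” (the start of a sentence about the question). Nothing in next-word prediction alone tells it that answering is the helpful choice.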

Pre-training is where the model learns language and general world knowledge. It is also the most expensive step: public estimates for a frontier model start in the tens of millions of dollars and go up from there.

Instruction tuning

The base model is now shaped to follow instructions. The training data shifts from raw text to question-answer pairs and prompts with desired responses. Things like:

Prompt: Summarize the following email. Response: [a good summary]

After thousands of these examples, the model learns the pattern: when given a prompt that looks like a task, produce a response that looks like a fulfilled task. This is what makes ChatGPT feel like an assistant rather than an autocomplete engine.
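
A rough sketch of how one such pair might be turned into training text follows. The marker strings, the Example class, and the function names are hypothetical, since every lab uses its own template. The sketch also assumes one common (but not universal) choice: the model is only scored on the response part, not on the prompt it was shown.

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    response: str

# Hypothetical markers that separate the two halves of an example.
PROMPT_TAG = "<|prompt|>"
RESPONSE_TAG = "<|response|>"

def to_training_text(example):
    """Split an example into the part shown to the model and the part it must produce."""
    prefix = f"{PROMPT_TAG}\n{example.prompt}\n{RESPONSE_TAG}\n"
    return prefix, example.response

def loss_mask(prefix, response):
    """0 = shown to the model but not scored; 1 = words the model is trained to produce."""
    return [0] * len(prefix.split()) + [1] * len(response.split())

example = Example("Summarize the following email.", "[a good summary]")
prefix, response = to_training_text(example)
mask = loss_mask(prefix, response)
print(prefix + response)
print(mask)
```

The training loop itself is the same next-word game as before. What changes is the data, and which words count toward the score.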

Alignment (RLHF and friends)

The final shaping step. The model produces multiple possible responses to the same prompt. Human reviewers say which is better. The model is then trained to prefer responses humans liked.

This step is responsible for the model’s tone (polite, hedged, careful) and for the things it refuses to do, such as helping with weapons or producing slurs. It is also why two different labs’ models have noticeably different personalities even when they have similar underlying capability. They are calibrated by different humans against different rubrics.

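The acronym you’ll hear is RLHF: reinforcement learning from human feedback. There are newer variants. DPO (direct preference optimization) learns from the same kind of human comparisons with a simpler training procedure, and RLAIF (reinforcement learning from AI feedback) has another model do much of the judging, following principles that humans wrote. The underlying idea is the same: preferences, set directly or indirectly by humans, guide the model toward the behavior they want.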
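
The “which response is better” comparison can be turned into a number to train on. One widely used formulation, sketched here as an assumption rather than as any particular lab’s method, gives each response a score and penalizes the scorer when the response humans preferred does not come out ahead. The function name and the example scores are made up for illustration.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise preference loss: -log(sigmoid(score_chosen - score_rejected)).

    Small when the human-preferred response scores clearly higher,
    large when the rejected response scores higher.
    """
    margin = score_chosen - score_rejected
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The scorer agrees with the human reviewers: low loss.
print(preference_loss(score_chosen=2.0, score_rejected=0.0))
# The scorer disagrees with the human reviewers: high loss.
print(preference_loss(score_chosen=0.0, score_rejected=2.0))
```

Training then nudges weights to shrink this loss, using the same kind of small adjustments as in pre-training, so that responses humans liked end up scoring higher.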

Training data — what the model has seen

The model is what it eats. Its sense of language, knowledge of the world, biases, and writing style all trace back to what was in the training data.

For most frontier models, the training data includes:

A large share of the public web (scraped, deduplicated, filtered)
Millions of books
Academic papers
Code repositories (especially open-source)
Forum posts (Reddit, Stack Overflow, and others)
Wikipedia, in many languages
Curated reference texts
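
“Deduplicated” and “filtered” in the list above are real processing steps. A minimal sketch of the idea, assuming exact-match deduplication after light normalization and a crude minimum-length filter, looks like this. Production pipelines use far more elaborate rules, and the function names and the five-word threshold are invented for this example.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def clean_corpus(documents, min_words=5):
    """Drop very short documents and exact duplicates, keeping first occurrences."""
    seen = set()
    kept = []
    for doc in documents:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # filter: too short to be useful training text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # deduplicate: an identical document was already kept
        seen.add(digest)
        kept.append(doc)
    return kept

docs = [
    "The cat sat on the mat.",
    "the  cat sat on the MAT.",                    # duplicate after normalization
    "Click here",                                  # too short
    "A different sentence about training data.",
]
print(clean_corpus(docs))
```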

What is not in most training data:

Your company’s internal documents
Private databases
Anything behind a paywall (mostly)
Anything that happened after the knowledge cutoff — the date the training data ends

We come back to what the model actually knows in the next chapter.

Why this costs what it costs

Training a frontier model today costs tens of millions of dollars or more and takes weeks to months on thousands of specialized chips (GPUs or TPUs). The cost is dominated by:

The data: collecting, cleaning, deduplicating, and filtering trillions of words is a project in itself
The compute: running the model on all that data and adjusting weights billions of times takes enormous amounts of chip time and electricity
The humans: instruction tuning and alignment need carefully designed examples and human reviewers

This is why only a small number of labs train frontier models. It is also why training a model and using one come with very different costs. Using a trained model, which is called inference, is cheap by comparison. Each request you send to ChatGPT might cost the lab fractions of a cent in compute.
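
A back-of-the-envelope calculation makes the asymmetry concrete. Every number below is a made-up placeholder (chip count, duration, price per chip-hour, cost per request), not a figure from any lab, and the sketch covers compute only, leaving out the data and human costs.

```python
def training_compute_cost_usd(num_chips, days, usd_per_chip_hour):
    """Compute bill for one training run: chips x hours x price per chip-hour."""
    return num_chips * days * 24 * usd_per_chip_hour

def requests_for_same_spend(training_cost_usd, usd_per_request):
    """How many inference requests add up to the same compute bill."""
    return training_cost_usd / usd_per_request

# Hypothetical inputs: 10,000 chips for six weeks at $2 per chip-hour.
cost = training_compute_cost_usd(num_chips=10_000, days=42, usd_per_chip_hour=2.0)

# Hypothetical inference price: a fifth of a cent per request.
requests = requests_for_same_spend(cost, usd_per_request=0.002)

print(f"Training compute: ${cost:,.0f}")
print(f"Requests for the same spend: {requests:,.0f}")
```

With these placeholder inputs the training run costs about $20 million in compute, and it takes roughly ten billion requests to spend the same amount on inference.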

The asymmetry matters for any business adopting AI: someone else pays to build the model; the business pays to use it. The interesting question for most companies is never “should we train a model?” — it is “how do we adapt an existing one to our work?” The three knobs for that are the subject of The three knobs — prompting, RAG, and fine-tuning.

The model you talk to is a snapshot

A specific model — the one behind whatever ChatGPT tab is open right now, or whichever version a vendor has wired into a product — is the result of one specific training run. Its knowledge cutoff is the date its training data ends. Its tone is the product of its instruction tuning and alignment. Its quirks — which questions it answers well, where it refuses, how it formats lists, how cautious it is around medical or legal phrasing — are baked in by the choices of the lab that built it.

When a lab releases a new version, it has repeated some or all of those stages: sometimes with the same architecture and more data, sometimes with a meaningfully different approach. The result is a different model, even when the marketing name carries continuity. The same prompt can behave differently across versions, and the differences aren’t random: they trace back to specific choices in the instruction tuning and alignment data.

Two practical consequences fall out of this.

The first is that production AI systems care about which exact model version they are calling. Many model APIs expose dated version pins: a string like gpt-4o-2024-11-20 names a specific snapshot of one training run, not the moving “GPT-4o” label. Pinning keeps behavior stable underneath whatever the team has built. The alternative is silent behavior drift whenever the vendor updates the model behind the label.
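
A team can enforce pinning with a small check in its own configuration code. The sketch below assumes the date-suffix convention seen in the example string above. Other vendors format version identifiers differently, so the pattern would need adjusting, and the function names and config layout are hypothetical.

```python
import re

# Assumed convention: a pinned ID ends in -YYYY-MM-DD, as in gpt-4o-2024-11-20.
DATED_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")

def is_pinned(model_id):
    """True if the model ID names a dated snapshot rather than a moving label."""
    return bool(DATED_SUFFIX.search(model_id))

def check_config(config):
    """Refuse to start with a model ID that could change underneath the product."""
    model_id = config["model"]
    if not is_pinned(model_id):
        raise ValueError(f"Model ID {model_id!r} is not pinned to a dated snapshot")
    return model_id

print(is_pinned("gpt-4o-2024-11-20"))
print(is_pinned("gpt-4o"))
print(check_config({"model": "gpt-4o-2024-11-20"}))
```

A check like this turns an upgrade into a deliberate decision: someone has to change the string, and can test the new version before doing so.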

The second is that a model’s idiosyncrasies are inherited, not arbitrary. The model that refuses certain political questions, hedges certain medical claims, or formats every answer with bullet-pointed headings is doing that because the humans who shaped its alignment phase optimized for those behaviors. Understanding which model is in play is part of understanding what it will do.