Building trust — humans, audits, gates

Any model, however good, will be wrong some of the time. Hallucinations, misreads, edge cases: at any meaningful volume, they will happen.

The instinct is to drive the error rate to zero. The more useful move is to design workflows where being wrong does not matter much. Wrong answers become drafts, suggestions, items in a review queue — not actions taken in the world.

This chapter is about the patterns mature operations use to do that, and how each one maps to the kinds of decisions an AI is involved in.

The three patterns

                                       AI        Human       Automation
                                       ────────────────────────────────
   Human-in-the-loop (HITL)            Propose   Approve     -
   Human-on-the-loop (HOTL)            Act       Sample      -
   Human-out-of-the-loop (HOOTL)       Act       -           Catch

These patterns describe the workflow, not the AI. The same shape applies on any of the four surfaces where AI typically lives inside an operation: a consumer chatbot an analyst is using, an internal tool a team built on an API, a self-hosted model running on company GPUs, or a vendor product. The pattern is a property of how the output flows into action, not of where the model came from.

Human-in-the-loop

The model proposes; a person approves before anything happens. Every output goes through a human gate.

Examples: A legal AI drafts a contract clause; an associate reviews it before it reaches the client. A sales AI writes outbound email; a rep approves it before it is sent. A customer-service AI composes a reply; an agent okays it before posting. An analyst pastes a draft into ChatGPT, gets a sharpened version back, and edits it again before it ships.

This is the strongest pattern. An error has to get past a person before it can reach the world. The human is the firewall.

The cost is throughput. A person is in every loop. The AI has not bought full automation; it has bought a faster draft. That is often a substantial win, but it is a different win from “the AI does the work.”

Fits: high-stakes, externally visible, customer-affecting outputs.

Human-on-the-loop

The model acts. A person samples — reviews a percentage of outputs, watches dashboards, gets alerted on anomalies.

Examples: A model categorises support tickets and routes them; a supervisor spot-checks a hundred per day. A pricing model adjusts product prices within a band; a manager reviews changes weekly. An agent runs a research task overnight and a human reviews its log in the morning.

The model has actual autonomy. The human is governance, not gate. They catch patterns rather than individual mistakes.

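The cost is selection: someone has to pick the right samples. Random sampling catches systematic errors, because a mistake the model makes consistently will show up in any fair sample. Anomaly-based sampling catches outliers that a random sample would probably miss. Many production teams combine both.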
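A minimal sketch of combining the two, assuming each output already carries an anomaly score from whatever detector the team uses. The `Output` type, its `anomaly_score` field and the `pick_review_sample` helper are hypothetical names for illustration:

```python
import random
from dataclasses import dataclass


@dataclass
class Output:
    """One model output that may be reviewed (hypothetical shape)."""
    output_id: str
    anomaly_score: float  # assumed: higher means more unusual, from some upstream detector


def pick_review_sample(outputs, random_n, anomaly_threshold, seed=None):
    """Build one review queue from anomaly-based and random sampling.

    Every output at or above anomaly_threshold is reviewed (catches outliers).
    A uniform random sample of the remaining outputs is added on top
    (catches systematic errors that never look anomalous).
    """
    rng = random.Random(seed)
    flagged = [o for o in outputs if o.anomaly_score >= anomaly_threshold]
    rest = [o for o in outputs if o.anomaly_score < anomaly_threshold]
    # Sample without replacement, capped at however many unflagged outputs exist.
    sampled = rng.sample(rest, min(random_n, len(rest)))
    return flagged + sampled


outputs = [Output(f"t{i}", s) for i, s in enumerate([0.1, 0.95, 0.2, 0.3, 0.99, 0.05])]
queue = pick_review_sample(outputs, random_n=2, anomaly_threshold=0.9, seed=7)
print([o.output_id for o in queue])
```

Everything above the threshold is always reviewed; the random slice guards against errors that are systematic but look perfectly normal to the detector.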

Fits: mid-stakes work where the cost of one error is bounded and the cost of slow review is high.

Human-out-of-the-loop

The model acts. Other automation catches its mistakes: validators, second models, rule-based checks, circuit breakers.

Examples: A coding assistant commits code; CI tests catch bugs before deployment. A trading model executes trades; risk limits trip before exposure breaches its thresholds. A chatbot answers customer questions; a second model flags answers that violate policy.

This is the only pattern that delivers full speed and scale. It also requires the most engineering. The “catch” system has to be at least as reliable as the human review it replaces; if it is weaker, the workflow has quietly become more brittle, not more efficient.

Fits: high-volume, low-individual-stakes operations where automated checks can be trusted, or where speed makes human review impossible.
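The “catch” can be sketched as a chain of validators plus a circuit breaker. This is an illustrative sketch, not a production design: the `CatchLayer` class, the two example validators, the window size and the failure threshold are all hypothetical choices.

```python
from collections import deque
from typing import Callable, List, Optional

# A validator returns None if the output passes, or a short reason if it fails.
Validator = Callable[[str], Optional[str]]


def no_refund_language(text: str) -> Optional[str]:
    """Hypothetical policy check: the bot may not talk about refunds."""
    return "mentions refund" if "refund" in text.lower() else None


def length_limit(text: str) -> Optional[str]:
    """Hypothetical format check: replies over 500 characters are blocked."""
    return "too long" if len(text) > 500 else None


class CatchLayer:
    """Validators plus a circuit breaker for a human-out-of-the-loop workflow.

    Each output runs through every validator; a failing output is blocked.
    Once the sliding window is full, if the share of recent failures exceeds
    max_failure_rate, the breaker trips and every later output is halted
    until a person calls reset().
    """

    def __init__(self, validators: List[Validator], window: int = 100,
                 max_failure_rate: float = 0.05):
        self.validators = validators
        self.recent = deque(maxlen=window)  # True means that output failed a check
        self.max_failure_rate = max_failure_rate
        self.tripped = False

    def check(self, output: str) -> str:
        """Return "act", "block" or "halt" for one model output."""
        if self.tripped:
            return "halt"
        reasons = [r for r in (v(output) for v in self.validators) if r is not None]
        self.recent.append(bool(reasons))
        if len(self.recent) == self.recent.maxlen:
            failure_rate = sum(self.recent) / len(self.recent)
            if failure_rate > self.max_failure_rate:
                self.tripped = True
                return "halt"
        return "block" if reasons else "act"

    def reset(self) -> None:
        """Re-arm the breaker after a person has investigated the trip."""
        self.tripped = False
        self.recent.clear()
```

The two mechanisms do different jobs. A validator blocks one bad output; the breaker notices when the model has started failing often and stops the automation instead of letting it keep acting, the same role a risk limit plays for the trading model.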

Picking the pattern

The right axis is not “how smart is the AI” but blast radius.

Blast radius is the cost — in money, reputation, customer harm, time — of one wrong action that nobody caught.

A wrong summary in an internal note: small blast radius. Someone notices in a meeting; nothing breaks. Human-out-of-the-loop is fine.
A wrong number in a board pack: medium blast radius. Embarrassing if missed, recoverable. Human-on-the-loop, with checks.
A wrong action taken on a customer account — refund, lock-out, posted message: large blast radius. Hard to take back. Human-in-the-loop.
A wrong financial transaction: extreme blast radius. Often irreversible. Human-in-the-loop, ideally with multiple humans.

The other axis is reversibility. Reversible mistakes are cheap to fix; irreversible ones are not. With the same model at the same accuracy, irreversibility pushes the right pattern toward tighter human control.

Put plainly, the question worth asking of any AI workflow is: what is the worst case if one output is wrong and nobody catches it? If the answer is small and reversible, lighter controls are reasonable. If it is large or irreversible, the workflow needs the tighter pattern, even at a cost in throughput.
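The two axes can be written down as an explicit rule. A sketch, assuming four blast-radius levels that mirror the examples above and treating irreversibility as a one-step bump toward tighter control. The bump size, like the names `BlastRadius` and `choose_pattern`, is an assumption rather than a fixed rule:

```python
from enum import IntEnum


class BlastRadius(IntEnum):
    SMALL = 0     # e.g. a wrong summary in an internal note
    MEDIUM = 1    # e.g. a wrong number in a board pack
    LARGE = 2     # e.g. a wrong action on a customer account
    EXTREME = 3   # e.g. a wrong financial transaction


# Patterns ordered from lightest to tightest control.
PATTERNS = [
    "human-out-of-the-loop",
    "human-on-the-loop",
    "human-in-the-loop",
    "human-in-the-loop, multiple approvers",
]


def choose_pattern(radius: BlastRadius, reversible: bool) -> str:
    """Map blast radius to a pattern, tightening one step if the action is irreversible."""
    level = int(radius)
    if not reversible:
        level += 1  # assumption: irreversibility moves the workflow one step tighter
    return PATTERNS[min(level, len(PATTERNS) - 1)]


print(choose_pattern(BlastRadius.SMALL, reversible=False))
```

Writing the rule down matters more than the exact thresholds: it forces someone to state the blast radius and reversibility of a workflow before it goes live.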

Audit trails

The single highest-leverage control across all three patterns is logging.

For every AI-assisted action, the useful things to log are:

Input — the prompt, the data the model was given.
Output — what the model produced.
Decision — what the human (or automation) did with it: approved, rejected, edited, escalated.
Edits — if the human modified the output before acting, what they changed.
Timestamps and identities — who, when.
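One possible shape for such a record, as a sketch using only the Python standard library. The `AuditRecord` name and its field names are hypothetical; `edits()` derives the change set from the model's output and what was actually used, rather than asking the reviewer to describe it:

```python
import difflib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    """One AI-assisted action (illustrative schema, not a standard)."""
    workflow: str        # which workflow this belongs to (assumed field)
    model_input: str     # the prompt and data the model was given
    model_output: str    # what the model produced
    decision: str        # "approved", "rejected", "edited" or "escalated"
    actor: str           # who, or which automation, made the decision
    final_output: Optional[str] = None  # what was actually used, if it differs
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def edits(self) -> list:
        """Unified diff between the model's output and what was acted on."""
        if self.final_output is None:
            return []
        return list(difflib.unified_diff(
            self.model_output.splitlines(), self.final_output.splitlines(),
            fromfile="model_output", tofile="final_output", lineterm=""))

    def to_json(self) -> str:
        """Serialise one record, edits included, for an append-only log."""
        data = asdict(self)
        data["edits"] = self.edits()
        return json.dumps(data)


rec = AuditRecord(workflow="support-replies",
                  model_input="Customer asks when the parcel ships",
                  model_output="Your parcel ships Monday.",
                  decision="edited", actor="agent:jsmith",
                  final_output="Your parcel ships Tuesday.")
print(rec.to_json())
```

Because the diff is computed rather than typed in, capturing edits costs the reviewer nothing, which helps keep the logging cheap.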

Two reasons.

First: when a question arrives — from a customer, from regulators, from internal review — a team that can reconstruct what happened is in a different position from a team that cannot. “We don’t know” is the worst answer in those conversations.

Second: the log is a dataset. Across thousands of outputs and decisions, it shows what the model gets right, what it gets wrong, and what humans almost always change. That is the evidence behind any sensible decision to tune a workflow, retrain, or move a use case to a different pattern. Without the log, those decisions are guesses.
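As a minimal sketch of treating the log as a dataset, assuming records are read back as dicts with `workflow` and `decision` keys (a hypothetical shape), the per-workflow decision mix takes a few lines:

```python
from collections import Counter, defaultdict


def decision_rates(records):
    """Per workflow, the share of outputs approved, edited, rejected or escalated.

    `records` is any iterable of dicts with "workflow" and "decision" keys,
    for example rows read back from the audit log (assumed shape).
    """
    counts = defaultdict(Counter)
    for r in records:
        counts[r["workflow"]][r["decision"]] += 1
    rates = {}
    for workflow, c in counts.items():
        total = sum(c.values())
        rates[workflow] = {decision: n / total for decision, n in c.items()}
    return rates


log = ([{"workflow": "ticket-routing", "decision": "approved"}] * 8
       + [{"workflow": "ticket-routing", "decision": "edited"}] * 2
       + [{"workflow": "contract-clauses", "decision": "edited"}] * 3
       + [{"workflow": "contract-clauses", "decision": "approved"}])
print(decision_rates(log))
```

A workflow where humans edit most outputs is a candidate for tighter controls or a rework; one where nearly everything is approved unchanged may be ready for a lighter pattern. The rates alone cannot tell a reliable model from reviewers who are rubber-stamping, so the edits themselves are worth reading too.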

Audit trails are cheap. They are mostly a metadata layer on top of whatever is already being stored. The engineering cost is small. The optionality is large.

There is a corner case worth naming: consumer-chatbot use inside the workforce typically produces no audit trail at all. An analyst pasting drafts into ChatGPT leaves no log inside the company. That gap is an unspoken cost of consumer-tool adoption, and one reason many operations eventually move that activity onto enterprise plans or internal tools, not because the consumer plan is unsafe, but because there is no record of what was done with it.

Internal governance — who can do what

The three patterns describe how an individual workflow handles errors. Sitting one level above them is the question of who in the organisation is allowed to use which AI surface for which kinds of work.

Operations that handle this well tend to make a small number of decisions explicit, rather than leaving them implicit:

Which AI tools are sanctioned for which categories of data. A common pattern: enterprise plan of a specific tool for anything customer-touching; consumer tools fine for general research and personal productivity; nothing with regulated data without going through the sanctioned route.
Who is allowed to deploy AI into a production workflow without further review. Often a named function — engineering, ops, a small AI council — rather than “whoever wants to.”
Which outputs need a second human reviewer before they leave the company. This is usually a function of audience (customer, regulator, board) rather than of the AI involved.
What gets logged where, and who has access to the logs.

This is not a governance framework in the policy-document sense. It is a small set of explicit defaults that close the gap between “policy says be careful” and “what is the team supposed to do at 10am on Tuesday.” Most failures in this area are not from people breaking rules — they are from rules nobody ever spelled out.
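To make “explicit defaults” concrete, here is how a few of them might look written down as data rather than prose. Every category, tool name and audience below is hypothetical; the one deliberate design choice is that an unlisted data category is denied rather than allowed:

```python
# Illustrative defaults only: all categories, tool names and audiences are hypothetical.
SANCTIONED_TOOLS = {
    "customer": {"enterprise-assistant"},
    "general": {"enterprise-assistant", "consumer-chatbot"},
    "regulated": {"sanctioned-regulated-route"},  # nothing else touches regulated data
}

# Second-reviewer rule keyed on audience, not on which AI was involved.
SECOND_REVIEWER_AUDIENCES = {"customer", "regulator", "board"}


def tool_allowed(tool: str, data_category: str) -> bool:
    """True if `tool` is sanctioned for `data_category`; unknown categories are denied."""
    return tool in SANCTIONED_TOOLS.get(data_category, set())


def needs_second_reviewer(audience: str) -> bool:
    """True if output going to this audience needs a second human reviewer."""
    return audience in SECOND_REVIEWER_AUDIENCES


print(tool_allowed("consumer-chatbot", "customer"))
```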

Trust is built in production

Most governance frameworks try to certify trust upfront. Risk-assess the model, audit the vendor, write a policy, then deploy. The certification ends. The use begins.

The more honest posture is: trust is earned in production. Narrow start, careful watching, expansion when the data supports it.

A practical rollout looks like:

Pilot in one team, one workflow, one shape of decision. Human-in-the-loop. Heavy logging.
Run for a month. Review the logs. What did the model get right? Where did humans intervene? What patterns of error emerged?
Adjust the workflow. Tighten controls where the model was wrong in a way that mattered. Loosen them where humans were rubber-stamping.
Consider relaxing the pattern. If the model is reliable at one task, can that task move from human-in-the-loop to human-on-the-loop? If yes, document why, then watch what happens.
Expand to the next workflow. Repeat.
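The “relax the pattern” step can be turned into an explicit, evidence-based rule. A minimal sketch, assuming the pilot log records whether each output was approved unedited and whether any error mattered. The thresholds (95% target, 200 minimum decisions) are placeholders, and judging the Wilson score lower bound rather than the raw approval rate is a swapped-in statistical choice, so that a short lucky streak does not trigger promotion. `ready_to_relax` is a hypothetical name:

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def ready_to_relax(approved_unedited: int, total: int, harmful_errors: int,
                   target: float = 0.95, min_samples: int = 200) -> bool:
    """Hypothetical rule for moving a task from human-in-the-loop to human-on-the-loop.

    Requires enough logged decisions, no error that mattered, and a
    clean-approval rate whose lower confidence bound clears the target.
    """
    if total < min_samples or harmful_errors > 0:
        return False
    return wilson_lower_bound(approved_unedited, total) >= target


print(ready_to_relax(495, 500, harmful_errors=0))
print(ready_to_relax(290, 300, harmful_errors=0))
```

With these placeholders, 495 unedited approvals out of 500 clears the bar, while 290 out of 300 (about 96.7% raw) does not: 300 decisions are not yet enough evidence that the true rate is above 95%.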

Each step is one decision, made with evidence. Compounded over a year, a portfolio of AI workflows emerges, each at the right pattern for its blast radius.

Operations that rush this, deploying broadly before they have data, tend to end up with mistakes they cannot reconstruct, or to pull everything back when one thing goes wrong. Operations that move stepwise tend to end up with AI working in twenty places, each tuned to the right pattern.

When AI does not belong in the workflow

Not every workflow should have AI in it. Some shapes that argue against it:

If a single wrong action causes irreversible serious harm — to a customer, to the business, to a third party — and no control pattern fully prevents it, AI may not belong in that decision. At least not yet.
If the cost of running the right control is higher than the value the AI adds, the workflow is a net loss. AI for theatre is its own kind of cost.
If the failure mode is statistically rare but catastrophic, and cannot be tested for in the lab, the conservative call usually ages well. The thing that has not gone wrong has often not gone wrong because it has not been run long enough.
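A standard statistical result, often called the rule of three, puts a number on that last point. Assuming runs are independent and share a single per-run failure probability p, a clean record of n runs only rules out values of p that would have made such a record less than 5% likely:

```latex
P(\text{no failures in } n \text{ runs}) = (1 - p)^n

(1 - p_{\max})^n = 0.05
\;\Rightarrow\;
p_{\max} = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n}
```

So 1,000 clean runs only show that the failure rate is probably below about 0.3%. Bounding a 1-in-100,000 failure mode the same way would take roughly 300,000 clean runs.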

Declining a use case is a respectable answer. It is also rare — most workflows, with the right pattern, can be made to work.

What this means for your business

The operations that get trusted with more AI over time are the ones that build the muscle around these three patterns and the logging that supports them.

That muscle is mostly process, not technology. It is the habit of asking what the blast radius is before a workflow goes live. The discipline of logging enough to reconstruct what happened. The willingness to start narrow and expand based on evidence rather than enthusiasm.

None of this is glamorous. It is the boring work that lets the not-boring work — actually using AI to change how an operation runs — proceed without becoming the kind of story that ends up in a board meeting for the wrong reason.