The four worries — what can go wrong

When AI fails inside a business, the failure is rarely the kind that makes the news.

It is a customer record that ended up in someone else’s training data. A piece of marketing copy that turned out to be paraphrased from a competitor’s site. A hiring screen that quietly nudges decisions in one direction. A finance bot that confidently reports a number wrong by a factor of ten.

Four shapes. Almost every AI risk inside an operation fits one of them, whether it arises from an analyst typing into a consumer chatbot, an internal tool a team built on an API, a self-hosted model on the company’s own GPUs, or a third-party product the business adopted.

The shapes don’t depend on where the AI came from. The controls do.

1. Data exposure

What it is: information your business should not be giving away ends up somewhere it was never meant to go.

The most common version is an employee pasting customer data, contract terms, or unreleased plans into a consumer chatbot. On free or low-tier plans, that prompt is typically logged, may be used for training, and definitely sits on a server outside the business’s control. The path is invisible: there is no procurement step, no security review, and no log on the company side.

The less obvious versions live deeper in the stack. An internal AI assistant built by one team quietly leaves every conversation in an upstream API provider’s logs. A document summariser sends the full PDF, including financials nobody meant to share, to an endpoint with a thirty-day retention window. A self-hosted model runs inside the company VPC but writes prompts and outputs to a log file that nobody is rotating, encrypting, or monitoring. A voice agent sends customer calls for transcription to a third-party speech service, which adds yet another vendor to the chain.

The shape of the risk is the same in every case: data ended up somewhere unintended, under retention and access terms that nobody on the business’s side set.

The controls vary by surface. For consumer chatbots used by the workforce, they are policy and routing: making the approved tools the obvious tools and the unapproved ones harder to reach. For internal builds, they are endpoint choice and architecture: which APIs, which retention settings, and which logs the team keeps where. For self-hosted models, they are what any sensitive internal system needs: access control, log hygiene, and monitoring. For vendor products, they are contract terms and sub-processor lists.
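For a self-hosted model, “log hygiene” can be made concrete. The sketch below is a minimal illustration under stated assumptions, not a recommended configuration: a hypothetical `log_inference` helper records who called the model plus a SHA-256 digest and character count of each prompt and output instead of the raw text, and writes to a size-capped file using the standard library’s `logging.handlers.RotatingFileHandler`. The logger name, record format, size cap, and backup count are all illustrative assumptions.

```python
import hashlib
import json
import logging
from logging.handlers import RotatingFileHandler


def make_inference_logger(path: str) -> logging.Logger:
    """Return a logger whose file rolls over instead of growing without bound."""
    logger = logging.getLogger("inference_audit")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        # Illustrative limits: roll over at about 10 MB, keep five old files.
        handler = RotatingFileHandler(path, maxBytes=10_000_000, backupCount=5)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def fingerprint(text: str) -> dict:
    """Describe a piece of text without storing the text itself."""
    return {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "chars": len(text),
    }


def inference_record(user_id: str, prompt: str, output: str) -> str:
    """Hypothetical record format: who called the model, plus fingerprints only."""
    return json.dumps({
        "user": user_id,
        "prompt": fingerprint(prompt),
        "output": fingerprint(output),
    })


def log_inference(logger: logging.Logger, user_id: str, prompt: str, output: str) -> None:
    """Write one metadata-only line per model call."""
    logger.info(inference_record(user_id, prompt, output))
```

Calling `make_inference_logger("inference_audit.log")` once at start-up and `log_inference(...)` after each model call leaves a trail of who called the model, when, and with how much text, without keeping the content. A digest still lets someone confirm a guessed prompt, so this narrows exposure rather than removing it; encrypting the file and restricting who can read it remain separate steps.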

We unpack the data-flow side in Where your data goes.

2. IP and ownership

What it is: questions about who owns what, who is allowed to use what, and what you are quietly inheriting from a model’s training.

Three sub-questions hide here.

Can AI output be owned? In most jurisdictions, work generated purely by an AI cannot be copyrighted by anyone; protection generally depends on a meaningful human contribution. When AI is used to draft contracts, marketing assets, code, or images, who actually owns the result is unsettled in ways that surprise people.

What was the model trained on? Many large models were trained on enormous quantities of text and images whose creators did not opt in. Several lawsuits are pending, and answers will land in time. In the meantime, an image model can occasionally reproduce a copyrighted character almost exactly, and the moment that output is published under the company’s name, it is the company’s problem.

What is the indemnity? Some vendors offer indemnity against IP claims arising from their model outputs — but only on certain plans, with certain features enabled, for certain kinds of use. Outputs from a self-hosted model, or from a consumer tool an analyst happened to use, generally come with no indemnity at all.

We go deeper in IP, copyright, and provenance.

3. Bias and fairness

What it is: the model reflects patterns from its training data, including patterns you would not endorse.

A resume-screening AI rates applicants. Its training data overweighted historical hires from a few schools, so it now rates applicants from those schools higher. The company has automated a pattern it spent years trying to move past.

A lending model. A pricing model. A customer-routing model. Anywhere a model is making or shaping a decision about a person, this risk is live — regardless of whether the model is a frontier closed model, a fine-tuned open model, or a vendor product. A self-hosted model is not automatically fairer because it sits on company hardware; it inherits the biases of whatever data it was trained on, including any internal data it was fine-tuned with.

Two things make this hard. First, the skew is usually invisible from the outside: the model gives confident answers, and only an audit of outputs across groups reveals the pattern. Second, “fair” is contested. Common definitions of fairness can be mathematically incompatible, so satisfying one can mean violating another, and which definition applies depends on the domain.
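A toy calculation shows how two common definitions of fairness pull apart. The sketch assumes two hypothetical groups with different base rates of the true outcome (half of group A qualifies, one in five of group B) and a model that decides every case correctly. It then checks demographic parity (equal selection rates) and equal opportunity (equal true-positive rates among qualified people). All numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    group: str
    qualified: bool  # the true outcome
    selected: bool   # the model's decision


def selection_rate(cases, group):
    """Share of the group the model selected (demographic parity compares this)."""
    members = [c for c in cases if c.group == group]
    return sum(c.selected for c in members) / len(members)


def true_positive_rate(cases, group):
    """Share of qualified members the model selected (equal opportunity compares this)."""
    qualified = [c for c in cases if c.group == group and c.qualified]
    return sum(c.selected for c in qualified) / len(qualified)


# Hypothetical population of 10 per group; the model is perfect, so selected == qualified.
cases = (
    [Case("A", q, q) for q in [True] * 5 + [False] * 5]    # 50% qualified
    + [Case("B", q, q) for q in [True] * 2 + [False] * 8]  # 20% qualified
)

for g in ("A", "B"):
    print(g, "selection rate:", selection_rate(cases, g),
          "true-positive rate:", true_positive_rate(cases, g))
```

The perfect model satisfies equal opportunity (both true-positive rates are 1.0) yet fails demographic parity (0.5 against 0.2). Equalising selection rates would mean either turning away qualified people in A, which breaks equal opportunity, or selecting unqualified people in B, which makes the model’s mistakes fall unevenly across groups. The arithmetic exposes the trade-off; it does not decide which side of it the business should take.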

The controls here are not technical. They are review processes. Periodic audits of outputs across the groups you care about. A human in the loop on consequential decisions. Documentation of what the model was trained for and where it should not be used. None of these change based on whether the model is yours or someone else’s.
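The arithmetic behind a periodic output audit is small; the control is the habit of running it and someone owning the result. The sketch below assumes decisions have been exported as `(group, favourable)` pairs, a hypothetical format, computes each group’s rate of favourable outcomes, and flags any group whose rate falls below a chosen fraction of the best-performing group’s. The group labels and the 0.8 cut-off are illustrative assumptions, not a legal test.

```python
from collections import defaultdict


def favourable_rates(decisions):
    """decisions: iterable of (group, favourable: bool) pairs."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        favourable[group] += int(ok)
    return {g: favourable[g] / totals[g] for g in totals}


def flag_gaps(rates, min_ratio=0.8):
    """Return groups whose rate is below min_ratio times the best group's rate."""
    best = max(rates.values())
    if best == 0:
        return []
    return sorted(g for g, r in rates.items() if r / best < min_ratio)


# Hypothetical month of screening outcomes, echoing the resume-screening example.
decisions = (
    [("listed_schools", True)] * 30 + [("listed_schools", False)] * 20
    + [("other_schools", True)] * 18 + [("other_schools", False)] * 32
)
rates = favourable_rates(decisions)
print(rates)
print("needs review:", flag_gaps(rates))
```

Here the listed schools pass at 60% and the rest at 36%, a ratio of 0.6, so `other_schools` is flagged. A flag is a prompt for human review of why the gap exists, not a verdict on its own.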

4. Confident wrong answers, acted on

What it is: the model hallucinates or misreads, and the wrong output flows into an action no one stopped.

This is one of the most common failures in production. A model summarises a meeting and gets a name wrong; the summary goes out to the client. A chatbot invents a return policy for a customer; the company honours it. A coding assistant writes a database migration with a subtle off-by-one error; it runs at 3am. A self-hosted financial model produces a forecast with unwarranted confidence, because it never says “I don’t know”; a report goes to the board with that number.

This is not a model-quality problem. Even a model that is wrong 1% of the time will be wrong tens of times a day once it handles a few thousand requests a day. The risk is not the error rate itself; it is the path from an error to an action.
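The volume arithmetic is worth running against your own numbers. The sketch below assumes errors are independent across requests, which real systems rarely guarantee, and uses illustrative daily volumes; it computes the expected number of wrong outputs per day and the chance that a day contains at least one.

```python
def expected_errors(requests_per_day: int, error_rate: float) -> float:
    """Average number of wrong outputs per day."""
    return requests_per_day * error_rate


def chance_of_at_least_one_error(requests: int, error_rate: float) -> float:
    """Probability that at least one of `requests` independent outputs is wrong."""
    return 1 - (1 - error_rate) ** requests


for volume in (100, 1_000, 5_000, 20_000):  # illustrative daily volumes
    print(f"{volume:>6} requests/day -> "
          f"{expected_errors(volume, 0.01):.0f} expected errors, "
          f"P(at least one) = {chance_of_at_least_one_error(volume, 0.01):.3f}")
```

At 1% and 5,000 requests a day, about 50 wrong outputs a day is the expected case, and a day with none is vanishingly unlikely. That is why the control has to sit on the path to action rather than on hoping the error rate drops.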

The control is workflow. Where is the human review? What is the blast radius if no one catches it? Which decisions need a second pair of eyes; which ones can run on autopilot? The workflow question is the same whether the model is OpenAI’s, Anthropic’s, or the company’s own. We cover this in Building trust.
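The workflow questions can be written down as an explicit gate. The sketch below is a hypothetical routing rule, not a standard: each proposed action carries a few assumed attributes (whether it reaches someone outside the company, whether it can be undone, a rough cost if wrong), and anything with a large blast radius goes to a human before it runs while the rest proceeds automatically. The field names and the 1,000 threshold are illustrative assumptions each business would set for itself.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOPILOT = "autopilot"
    HUMAN_REVIEW = "human_review"


@dataclass
class ProposedAction:
    description: str
    customer_facing: bool   # does the output reach someone outside the company?
    reversible: bool        # can it be cleanly undone if it turns out wrong?
    cost_if_wrong: float    # rough estimate, in the business's own currency


def route(action: ProposedAction, review_threshold: float = 1_000.0) -> Route:
    """Send anything with a large blast radius to a person before it runs."""
    if action.customer_facing or not action.reversible:
        return Route.HUMAN_REVIEW
    if action.cost_if_wrong >= review_threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTOPILOT


examples = [
    ProposedAction("Send meeting summary to client", True, False, 500.0),
    ProposedAction("Run database migration", False, False, 50_000.0),
    ProposedAction("Tag internal support tickets", False, True, 10.0),
]
for a in examples:
    print(f"{a.description}: {route(a).value}")
```

The rule is deliberately blunt: the client summary and the migration both stop for review, while ticket tagging runs on autopilot. The point is that the gate is written down and applies the same way whichever model produced the output.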

The four worries across the four surfaces

The four shapes are constant. The surface they show up on changes which control you reach for.

Inside most operations, there are four common AI surfaces:

Consumer tools used by employees — ChatGPT, Claude.ai, Gemini, Copilot, whatever else. Lowest visibility, highest informal adoption.
Internal tools your team builds — usually a thin app over a vendor API, sometimes over a self-hosted model.
Self-hosted models — open-weights models running on your own infrastructure.
Vendor products — third-party SaaS where AI is a major part of the value.

Every worry applies to every surface, but the weight shifts. Data exposure is loudest on consumer tools (no contract, no settings the business controls) and quietest on self-hosted models (the data never leaves the environment, though log hygiene still matters). IP is loudest on customer-facing output, regardless of where the model lives. Bias is loudest wherever the model shapes a decision about a person, and there the surface barely matters. Wrong answers acted on are loudest wherever the model’s output flows directly into an action without a gate.

A useful frame for any AI initiative: pick the surface, then walk the four worries one at a time. The combination tells you which controls are urgent. This pairs naturally with the fit anatomy used in Module 5 — fit names where AI belongs; the four worries name what to watch for once it does.
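The walk can be kept as a simple per-project checklist. The sketch below encodes the four surfaces and four worries described here, with each check paraphrased from the controls named in this section; the helper name, the surface labels, and the wording are hypothetical, and a real checklist would be extended to fit the business.

```python
SURFACES = ("consumer tool", "internal build", "self-hosted model", "vendor product")

# Data-exposure controls differ by surface; the other three worries ask the same question everywhere.
DATA_EXPOSURE_CONTROLS = {
    "consumer tool": "policy and routing toward approved tools",
    "internal build": "endpoint choice, retention settings, log placement",
    "self-hosted model": "access control, log hygiene, monitoring",
    "vendor product": "contract terms and sub-processor list",
}


def walk_four_worries(surface: str) -> list[tuple[str, str]]:
    """Return (worry, check) pairs for one AI initiative on one surface."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    return [
        ("data exposure", DATA_EXPOSURE_CONTROLS[surface]),
        ("IP and ownership", "who owns the output, and is there indemnity on this plan?"),
        ("bias and fairness", "does it shape a decision about a person? audit outputs across groups"),
        ("confident wrong answers", "where is the human gate before output becomes action?"),
    ]


for worry, check in walk_four_worries("internal build"):
    print(f"- {worry}: {check}")
```

Keeping the data-exposure row surface-specific and the other three constant mirrors the argument above: the shapes stay fixed, and the surface mainly changes which data controls apply.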

Two failure modes to avoid

Most AI risk thinking gets stuck in one of two places.

The first is dismissal — “the model is good enough, the team is careful, it’ll be fine.” Usually this holds for a while, then breaks once, badly, in public.

The second is paralysis — “we can’t move until we have a full governance framework.” Usually this means nothing ships for a year, and the team that was meant to be served by the framework has been using consumer chatbots the whole time anyway.

Both miss the same thing. The four worries are well-defined. The controls for each are known. The work is to walk each project through the four shapes and the four surfaces, decide which controls fit, and put them in place. That work, done repeatedly, is more useful than any governance document.

The teams that go furthest with AI are the ones that face the four worries plainly and design for them, again and again, project by project.