Where your data goes

A prompt is not just a question. It is a piece of data. Sometimes it contains a customer name, sometimes a contract clause, sometimes a number that hasn’t been reported externally yet. The moment that data leaves the boundary of the business environment, the normal data-governance questions apply — where does it travel, who stores it, how long, who else gets to see it.

This chapter walks through what those answers look like across the surfaces AI typically lives on inside an operation. The illustration below shows one common path — a user sending a prompt through a vendor’s UI to the model and on to whatever destinations the vendor uses. That is a single instance of a broader pattern.

The same prompt could take a very different path depending on the surface.

Four surfaces, four default paths

Path 1 — Consumer chatbot

An employee types into ChatGPT, Claude.ai, Gemini, Copilot, or similar.

The prompt leaves the laptop, travels to the vendor’s servers, the model runs, the answer returns. So far this is the round trip in the illustration. What the illustration doesn’t show is what happens to the prompt afterwards.

On a free or basic consumer plan, the prompt may be retained, may be used to improve future models, and may be visible to vendor staff if a safety system flags it. On a paid consumer tier, a training opt-out is usually available, but whether training is off by default varies by vendor and changes over time, and logs typically still sit on vendor servers for a period (often 30 to 90 days) for abuse monitoring. On an enterprise plan (ChatGPT Enterprise, Claude for Work, Gemini Business), training is off, retention can usually be tightened, access logs are available, and the contract changes the whole posture.

Three things the user controls almost nothing about: what the vendor’s sub-processors are, what jurisdiction the data is in, what abuse-monitoring access looks like internally. Three things the user controls partially: plan tier, training opt-out toggles, conversation history settings. The default state of most accounts is “prompt is being used and retained more than you think.”

Path 2 — Internal tool built on an API

A team built a thin app over an API — most commonly OpenAI’s, Anthropic’s, Google’s, or one of the inference providers (Together, Groq, Bedrock, Vertex).

The prompt path looks similar — laptop, network, vendor — but the defaults are different. API traffic, on most major providers, is not used for training by default, even on basic plans. Retention is typically shorter. The contract signed when an API key was issued usually has stronger data terms than a consumer-app sign-up.

Three things change the path significantly.

What the wrapper logs. The application sits in front of the model. It may be writing every prompt and response to its own database, its own observability stack, or a third-party analytics product. Those logs are inside the perimeter, but they are still data that needs the same hygiene as any sensitive production data: access control, retention, encryption, deletion on request.

Which region. Most major API providers let a customer pin inference to a specific region (US, EU, regional clouds). Whether that was actually done when the integration was set up is a question with a real answer.

What the model can reach. If the app has tools (retrieval over your data, web access, integrations to a CRM), each of those is its own data path. More on that below.
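The point about wrapper logs can be made concrete with a small sketch. It assumes a Python wrapper that builds a log record for each model call and masks obvious identifiers before anything is written. The function names (redact, log_record) and the two patterns are hypothetical illustrations, not a complete PII filter; a real deployment would need a reviewed pattern set and a salted or keyed hash.

```python
import hashlib
import re
from datetime import datetime, timezone

# Hypothetical patterns for illustration only; real coverage needs review.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d{6,}\b")


def redact(text: str) -> str:
    """Mask email addresses and long digit runs before text reaches a log."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)


def log_record(user_id: str, prompt: str, response: str) -> dict:
    """Build the record the wrapper would write to its own log store."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        # An unsalted hash is only a light pseudonym, used here for brevity.
        "user": hashlib.sha256(user_id.encode()).hexdigest()[:12],
        "prompt": redact(prompt),
        "response": redact(response),
    }
```

Redacting at the point where the record is built means every downstream sink (database, observability stack, analytics product) receives the masked version rather than relying on each sink to clean up separately.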

Path 3 — Self-hosted model

The team is running an open-weights model — Llama, Mistral, Qwen, DeepSeek — on infrastructure the business owns. Its own GPUs, its own cloud account, its own VPC.

The good news: the prompt and response never leave the environment. The data-exposure-to-an-external-vendor question is mostly answered.

The harder news: data-exposure-inside-the-environment is now a question the business owns end-to-end. Which servers handle the inference? Who has SSH access to them? Are prompts being written to a log file somewhere? Is that log rotated and encrypted, and how long is it retained? Who can read it? Does the inference engine emit telemetry to a third party (the answer for several popular engines is “yes, by default, until it is turned off”)? If the model is wrapped in a UI, does that UI log conversations to a database, and what is the access pattern on that database?
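One of these questions, how long prompt logs are retained, is easy to turn into something enforceable. The sketch below assumes prompt logs are plain *.log files in a single directory and that age is judged by file modification time; the function name purge_old_logs and the 30-day figure in the usage note are hypothetical choices, not a recommendation.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_logs(
    log_dir: Path, max_age_days: int, now: Optional[float] = None
) -> List[Path]:
    """Delete *.log files in log_dir older than max_age_days.

    Age is based on modification time. Returns the paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(log_dir.glob("*.log")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run on a schedule, a call such as purge_old_logs(Path("/var/log/inference"), 30) would enforce a 30-day policy; the directory shown is an assumed location. Returning the removed paths gives the job something to record for audit purposes.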

These are not novel questions — they are the questions any sensitive internal application gets asked. But they tend to fall through the cracks because the conversation around self-hosting is dominated by “the data doesn’t leave” and people stop thinking past that point.

Path 4 — Vendor product where AI is the value

A third-party SaaS that has AI baked in. A note-taker on every meeting. A customer-support assistant. A sales-research tool. A coding tool that reads the company repo.

This path has the most hops. The prompt and the underlying data go to the vendor’s servers, where the vendor processes them. Most vendors of this shape do not run their own models — they call an upstream API (OpenAI, Anthropic, Bedrock). The data flow then becomes: customer data → vendor servers → upstream model provider → vendor servers → back to the customer. Both companies in the chain have their own terms, their own retention, their own staff with access.

Layered on top: vendor-specific features that touch other systems. A meeting note-taker that reads your calendar and writes to your CRM. A customer-support assistant that reads your ticket history and your knowledge base. A coding tool that reads your private repository. Each integration is a separate data path, and the vendor has to be trusted with each.

The three knobs that show up on every path

Across all four surfaces, the same three variables decide how exposed a given prompt actually is.

Plan or tier. On a consumer product, the difference between free, paid consumer, and enterprise is the biggest single lever. On an API, the equivalent is whether the account is on the standard endpoint, a region-pinned endpoint, or a zero-retention endpoint. On a vendor product, it is the difference between the SMB plan and the enterprise plan with DPA, sub-processor list, and data-residency commitments. A self-hosted model has no plan tier; its closest analog is network egress, meaning the difference between a model that can call the internet and one that can’t.

Endpoint. Web UI versus API is the cleanest version of this. The UI carries features — conversation history, memory, plugins, integrations — that each add a data path. The API is more bare-bones, and usually has stricter default privacy. For a self-hosted model, the equivalent is whether inference is exposed only inside the VPC or also through a public endpoint.

Retention and training settings. Even on a single plan and endpoint, the toggles still matter. Training opt-out. Conversation history off. Memory off. Zero retention on certain enterprise tiers. For a self-hosted model, the equivalent is the company’s own log policy: how long prompt-and-response logs are kept, who can read them, when they are deleted.

For any AI surface, getting these three variables right is most of the data-exposure work.
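The three knobs can be written down as a small record per AI surface, which makes gaps visible. The sketch below is one possible shape, not a standard: the Surface fields, the tier and endpoint labels, and the 30-day retention threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Surface:
    name: str
    tier: str                      # "free", "paid_consumer", "enterprise", "self_hosted"
    endpoint: str                  # "web_ui", "api", "vpc_only", "public"
    training_opt_out: bool         # True if training on the data is confirmed off
    retention_days: Optional[int]  # None means nobody has checked


def open_questions(s: Surface, max_retention_days: int = 30) -> List[str]:
    """List which of the three knobs still need attention for one surface."""
    questions = []
    if s.tier in {"free", "paid_consumer"}:
        questions.append("tier: consumer plan, no negotiated data terms")
    if s.endpoint in {"web_ui", "public"}:
        questions.append("endpoint: extra features or exposure add data paths")
    if not s.training_opt_out:
        questions.append("settings: training opt-out not confirmed")
    if s.retention_days is None or s.retention_days > max_retention_days:
        questions.append("settings: retention unknown or longer than policy")
    return questions
```

For example, open_questions(Surface("analyst chatbot", "free", "web_ui", False, None)) flags all three knobs, while an enterprise API integration with training off and zero retention returns an empty list.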

Where it gets murky — the seams between systems

Whichever surface the AI lives on, the riskiest data-exposure incidents usually happen at the seams, not in the central model call.

Wrappers. A company that builds “AI for X” on top of OpenAI or Anthropic is a separate entity with its own data practices. Their UI is the user’s front door; their database holds the data; they forward to the underlying model. Two vendors in the chain, each with their own logs, retention, staff access, sub-processors. Wrappers can be excellent products. They are also harder to audit than a direct provider, because the audit is now of two companies in series.

Plugins, tools, and integrations. When an AI tool reads your Google Drive, queries your CRM, or makes web requests on behalf of the user, each of those touches creates a new data path. The model’s privacy policy may be clean; the plugin maker’s may not. A code AI that reads your private repo is, for purposes of data exposure, also a code-reading vendor — separate question from “does the model train on the code?”

Voice and screen. Voice agents transcribe speech to text using a transcription service before the model sees it. Screen-sharing agents capture and process screenshots. Each medium adds a hop, with its own retention and processing terms. The transcription provider is often not the same company as the model provider, even when the product is sold as one thing.

Embeddings and vector stores. Retrieval-augmented systems (the pattern from tools and memory) turn the company’s documents into embeddings and store them somewhere. The embeddings are a derivative of the original content. The store is itself a data system that needs the same hygiene as any database holding sensitive content. The vector store is easy to forget — it sits behind the model in the architecture diagram, but it is a separate system with its own access pattern and retention.

A useful rule: every external system involved in an AI interaction is a separate data path. Audit each.
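The rule lends itself to a simple inventory. The sketch below lists each system a single AI interaction touches, records who operates it, and returns the external ones nobody has reviewed. The Hop structure, the company names, and the note-taker flow are hypothetical examples rather than a description of any real product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    system: str     # what the data touches
    operator: str   # which organisation runs it
    reviewed: bool  # has someone read its terms and retention policy?


def unreviewed_external(hops: List[Hop], own_org: str) -> List[str]:
    """Return external systems whose data terms nobody has reviewed yet."""
    return [h.system for h in hops if h.operator != own_org and not h.reviewed]


# Hypothetical flow for a meeting note-taker product.
NOTE_TAKER = [
    Hop("transcription service", "SpeechCo", False),
    Hop("note-taker app", "NotesVendor", True),
    Hop("upstream model API", "ModelProvider", True),
    Hop("vector store", "NotesVendor", False),
    Hop("CRM write-back", "OurCompany", True),
]
```

Calling unreviewed_external(NOTE_TAKER, "OurCompany") returns the transcription service and the vector store, the two seams in this example that nobody has looked at.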

Self-hosting — when it is actually the answer

If the data must not leave the environment under any circumstances, the model has to come to the data.

Open-weights models — Llama, Mistral, Qwen, DeepSeek — run on infrastructure the business controls. Prompts never leave the boundary. For sectors where data residency is legally non-negotiable (defence, certain regulated parts of finance and healthcare, certain government work), this is the only viable path.

The tradeoffs are real and worth naming plainly. Open-weights models lag the best closed models by six to twelve months on most benchmarks. Running them well takes real operational investment — GPUs, inference engineering, monitoring, the ML platform team to keep it running. The gap between “an open model from last year, running fine” and “what a team can actually get into reliable production today” is non-trivial.

For most businesses, an enterprise plan with the right settings is a far cheaper, almost-as-private alternative. Self-hosting is a serious commitment, not a default — and it does not remove the data-flow questions, it just moves them inside the perimeter.

What this means for your business

The same data — a customer email, a contract draft, a forecast number — can take four very different paths depending on which AI surface it touches. Each path has its own questions worth answering. Most teams underweight this early, partly because the team typing into the free tier is not the team that owns the data policy.

That gap is usually fine while the work is experimentation with anonymous, low-stakes inputs. The moment real work moves through AI tools — customer data, internal numbers, anything that would matter in a leak — the data-flow questions catch up regardless of whether anyone has answered them.

What’s actually happening inside most operations is some combination of all four paths at once. An analyst is in ChatGPT. An engineer is shipping a tool on the API. A team has spun up an open-weights model on an internal cluster, or on a hosted service such as Hugging Face’s inference endpoints, which is closer to Path 2 than to true self-hosting. A vendor product was bought last quarter and nobody has re-read the DPA. The question is not which path is in use; it is which paths are in use, what each one’s defaults look like, and which of the three knobs are set the way they were actually meant to be.