The shape of a good AI project

Most failed AI projects fail the same way. Not because the AI was wrong. Because the project shape was wrong — what got built, in what order, at what scope, against what definition of done.

The work comes first, the tech comes second

Projects that work start from a workflow. Projects that don’t work start from the tech. The difference shows up in the first sentence of the kickoff.

“We have this workflow that’s painful — let’s see if AI helps” is a project. It has a baseline. It has a person whose week visibly changes if it succeeds. It has a built-in success criterion. (Finding the right workflow in the first place is the job of scanning the operation and the fit anatomy.)

“Let’s build something with AI” is not a project. It is a hunt for a problem to apply a solution to, and the problems found that way tend to be aspirational. The success criterion gets invented after the fact, which means it gets invented to fit whatever was built.

This isn’t about the language used in the room — it’s about what the work is anchored to. A project anchored to a workflow can answer “did this work?” by looking at the workflow. A project anchored to the tech can only answer it by pointing at the tech.

Before anything gets scoped, the work has to be observed. Not described in a meeting — watched, twice, by whoever is going to design the system.

Where does the person doing the work slow down? Where do they switch tools? Where do they retype something that already exists somewhere else? Where do they make a judgment call that wouldn’t survive being automated?

This step gets skipped a lot, because it feels like delay. It isn’t. AI projects routinely scope the wrong thing because the people scoping have never watched the actual work — they’ve only heard about it. The pain that gets described in meetings is usually not the pain that exists. Twenty minutes of observation regularly prevents two months of building the wrong thing.

The temptation in AI work is to build the full vision before launching anything. The shape that works runs the other way: ship the smallest version that does something useful, then add based on what that version teaches.

The smallest useful version might be:

  • One prompt the team uses manually, before any automation or integration.
  • One workflow automated end-to-end, before adding others.
  • One department, before rolling out company-wide.

Each small version teaches what the full version should look like. AI is unusually punishing of plans made in the abstract — the model behaves in ways that aren’t fully predictable, the workflow has edges the team didn’t articulate, and users do things with the tool that nobody at kickoff would have guessed. A full version designed without those lessons usually misses.

Most of the real value lands after the first version ships

A working-but-imperfect tool the team uses is worth more than a polished tool nobody has touched. The early version surfaces the actual hard problems, and those are almost never the ones predicted in the kickoff meeting. (Measuring AI covers how to read whether the version that shipped is actually working.)

A reasonable rule of thumb: 30-50% of total effort comes after the first version is in users’ hands. That isn’t “we ran over budget” — that is where the real value gets built. The project plan that allocates 100% of effort to pre-launch and treats post-launch as bug-fixing tends to deliver something that technically shipped and operationally didn’t.
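
To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The 12-week total is an assumption chosen for illustration, and the function name is hypothetical; only the 30-50% range comes from the rule above.

```python
def post_launch_weeks(total_weeks, low=0.30, high=0.50):
    """Return the (min, max) weeks to reserve for work after the first version ships.

    The 30-50% share is the rule of thumb from the text; total_weeks is
    whatever the whole project budget is (12 below is only an example).
    """
    return total_weeks * low, total_weeks * high


# Hypothetical 12-week project budget.
low_weeks, high_weeks = post_launch_weeks(12)
print(f"Reserve {low_weeks:.1f} to {high_weeks:.1f} weeks for post-launch work")
# prints "Reserve 3.6 to 6.0 weeks for post-launch work"
```

On those assumptions, the first version would need to be in users’ hands somewhere between week six and week eight or so, not in week twelve.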

A common detour: a company starts out wanting to automate one workflow, and somewhere in scoping, the conversation widens. Why build for one workflow when a configurable platform could handle this one plus ninety-nine more?

The platform takes six months. Nobody knows what the other ninety-nine workflows will look like, because they don’t yet exist. The first workflow — the one with a real owner and a real baseline — gets badly served by a system designed to serve a hundred. The other ninety-nine never materialise.

The shape that works runs the other direction. Build the one thing. Build it cleanly. If a second workflow eventually arrives that genuinely overlaps with the first, there’s something concrete to extend. If it doesn’t, what got built is still a working thing — not a half-built platform with no committed second tenant.

A timeline of roughly 12 weeks or less keeps appearing in the AI work that lands well inside real businesses. Long enough to do something substantial. Short enough that the model, the team, and the business context haven’t all shifted by ship date.

Plans much longer than 12 weeks tend to be hiding something — usually unstated scope, an undefined success criterion, or a decision that hasn’t actually been made. A “nine-month AI project” often turns out to be a six-week pilot plus an open-ended phase two that nobody has scoped and nobody owns.

Sometimes the answer is genuinely “this needs nine months” — heavy data integration, regulated workflows, deep model customisation. More often, the nine-month version can be re-shaped as a six-week pilot plus a three-month refinement, and the team learns ten times as fast.

Shipped means: in production. In the real stack. Used by the team whose work it serves. Documented well enough that someone other than the original builder can keep it running. Handed over with a working session, not an email.

Not a demo. Not a slide deck. Not “the prototype works on my machine.” Not “we’ve built it; now we just need someone to roll it out.”

The line being crossed is from “we built something” to “the work runs differently now because of what we built.” That is the only definition of shipped that matters. Everything else is in-progress, regardless of what the status report says.