AI Implementation and Adoption: Why Most Enterprise Projects Stall Between Pilot and Production

Ijan Kruizinga

The pilot graveyard is real

The numbers back this up. RAND researchers found that over 80% of AI projects fail, roughly twice the rate of non-AI tech projects. IBM's Institute for Business Value reports that only 25% of AI initiatives deliver the expected ROI, and just 16% have scaled enterprise-wide. BCG's research on AI value creation puts it more bluntly: 74% of companies struggle to achieve and scale value from their AI investments.

You can argue with the exact percentages. You cannot argue with the pattern. Most enterprises have a pile of pilots and very little production.

The interesting question is why. Because it isn't, mostly, a technology problem. The models work. The cloud platforms are mature. The vendor ecosystem is rich. If you give a competent engineer a clear problem and access to a frontier model, they can build something useful in a week.

The failure mode is somewhere else.

What actually kills enterprise AI initiatives

After working with dozens of large organisations on AI rollouts, we see the same patterns show up again and again. Here's the honest list.

The pilot was never designed to scale. Someone picked a problem because it was tractable for a hackathon, not because it mattered to the business. The demo worked. Then the team tried to put it into production and discovered that the data wasn't governed, the workflow wasn't documented, the users weren't consulted, and the ROI case was vibes.

Nobody owned the change. A pilot is a technical artefact. A production system is an operational change. Different muscle. The team that built the pilot didn't have authority to redesign the workflow, retrain the staff, or rewrite the policies. The project sat in limbo waiting for someone to claim it.

The workforce wasn't ready. This is the one I see most often. The tool got deployed. The training was a 30-minute video. Six weeks later, licence utilisation was at 12%, the people who did use it were prompting it like Google, and the productivity case evaporated. The technology worked. The humans never caught up.

Governance was an afterthought. Risk and legal got involved at the eleventh hour, found things they didn't like, and either killed the project or dragged it into a six-month review. Avoidable. Predictable. And it happens constantly.

The business case was theatre. "AI will save us 30% on customer service costs." Says who? Based on what baseline? Measured how? Most enterprise AI business cases would not survive five minutes of scrutiny from a competent CFO. And eventually they get that scrutiny.

If you read that list and recognised your own organisation in three or four of them, you're in good company. The fix is not another pilot. The fix is a different way of thinking about AI implementation from the start.

A three-phase model: assess, pilot, scale

The most successful AI rollouts we've supported have a shape in common. They run in three deliberate phases, with clear gates between them, and they don't move to the next phase until the current one has earned it.

The shape looks like this:

  1. Assess. Understand readiness, pick the right use case, line up the people and the governance.

  2. Pilot. Build the smallest version that proves the value, with production in mind from day one.

  3. Scale. Roll out properly, with the workforce, the governance, and the measurement to make it stick.

Each phase answers a different question. Skip the question and you'll pay for it later.

Phase 1: Assess

The assess phase is the one most projects skip, and skipping it is usually the fatal mistake.

A proper AI readiness assessment is not a vendor questionnaire. It's an honest look at four things:

Data. Is the data the AI needs to work with actually accessible, accurate, and governed? In one recent engagement with a large insurer, the proposed use case died in the assessment because the underlying claims data lived in three systems with conflicting field definitions. Better to know that in week two than month six.

Workflow. What does the work actually look like today? Who does it, in what order, with what handoffs? You'd be amazed how often the answer to this question reveals that the AI use case as scoped doesn't match the work as performed.

People. Who is going to use this thing, and what's their current capability? Are they comfortable working with probabilistic tools, or have they spent 20 years in deterministic systems where the answer is either right or wrong? This determines whether you need a workshop, a deep program, or a structural change to roles.

Governance. What are the risk, legal, security, and ethical constraints, and have the relevant people been involved from the start? If your AI use case touches customer data, regulated decisions, or anything safety-critical, governance isn't optional and it isn't last.

The output of the assess phase is not a slide deck. It's a decision: which use case to pilot, what success looks like, who owns it, and what the path to production is if the pilot works. If you can't answer those four questions clearly at the end of the assess phase, do not move to pilot. You will regret it.

For organisations with significant regulatory exposure, this is also the right moment to read up on AI risk management and bring those frameworks into the assessment from the start, not bolt them on later.

Phase 2: Pilot

A good pilot is not a science experiment. It's a small, real version of the production system, run with real users on real work.

The mistakes here are predictable.

Pilots that aren't real. Synthetic data, hand-picked examples, friendly users. The pilot succeeds and tells you nothing about whether the production version will work.

Pilots that have no exit. No one decided in advance what success looks like, so the pilot just keeps running. Six months in, it's neither succeeded nor failed. It's just there.

Pilots that are too big. "Let's pilot it across the whole claims team." That's not a pilot, that's a production rollout with a different name. You'll be too committed to kill it if it doesn't work.

A useful pilot has four properties. It uses real data and real users. It has a clear hypothesis ("this will reduce average handle time by 15%, measured this way, over this period"). It has a defined exit ("at the end of eight weeks, we either commit to scale, kill it, or extend with these specific changes"). And it's designed with production constraints already in mind, including security, integration, and governance.

This is also where the workforce question shows up for the first time. Even at pilot scale, the users need to know how to actually use the thing. A pilot where the users haven't been trained doesn't tell you whether the technology works. It tells you whether untrained users can figure out the technology, which is a different and less useful question. A short, targeted AI workshop for business at pilot stage often pays for itself many times over by separating signal from noise in the results.

The output of the pilot phase is also a decision: scale, kill, or iterate. If your pilot doesn't end with one of those three, you ran it wrong.

Phase 3: Scale

This is where most enterprises discover that the hard part wasn't building the pilot. It was everything else.

Scaling an AI capability across an enterprise involves at least five workstreams running in parallel:

Technical. Production engineering, integration, monitoring, model governance, cost control. The pilot probably ran on someone's laptop or a sandbox. Production needs to run reliably, securely, and affordably at scale.

Workforce. This is the workstream most often underfunded. People need to know what the tool does, what it doesn't do, when to trust it, when not to, and how to use it well. Generic vendor training rarely cuts it. The difference between off-the-shelf and custom AI training becomes very real here, because the workflow you're embedding the tool into is yours, not the vendor's.

Governance. The pilot might have run under an exception. Production needs the policy, the controls, the audit trail, and the ongoing review to be real. This is also where regulators, internal audit, and the board start paying attention.

Change. Roles will shift. Some work will disappear. Some new work will appear. People will be anxious. Communication, leadership engagement, and structural change need to be planned and resourced. AI rollouts that ignore the change management dimension consistently underperform.

Measurement. The business case made at the start of the project needs to be tracked against actual outcomes, not just inputs. License count, completion rate, and user satisfaction are inputs. The actual business metric, whatever it is, is the output. If you don't measure the output, you cannot tell if the rollout worked.

Done well, the scale phase is where the value finally shows up. Done badly, it's where the pilot success quietly evaporates and the next executive asks why we spent the money.

The workforce dimension nobody plans for

I want to spend a moment on the workforce piece, because it's the thing organisations most consistently underestimate and the failure we see most often.

When a large organisation rolls out Microsoft Copilot or Google Gemini, the typical training plan is something like: a launch email, a 30-minute video, an FAQ on the intranet, maybe a lunch-and-learn. Six weeks later, the leadership team looks at utilisation data and sees that 80% of licences are barely being used, and the people who are using them are getting modest productivity gains at best.

This is not a tool problem. The tool is fine. It's a capability problem. Knowledge workers who have spent decades with deterministic, search-based tools that return a single right-or-wrong answer don't automatically know how to work with probabilistic, generative, conversational tools. The mental model is different. The skills are different. The workflow is different.

Enterprise AI training in Australia is the difference between a pilot that proves the concept and a rollout that delivers the result. Without it, you've bought capacity you can't use. With it, the technology investment actually shows up in the business metric.

When organisations are choosing partners for this work, the criteria matter. Generic vendor curriculum rarely fits a specific enterprise context. Choosing the right AI training provider is a decision worth taking seriously in its own right.

What good looks like

I'll close with what a well-run enterprise AI implementation actually looks like in practice.

The assess phase takes four to six weeks. It's owned by a single accountable executive, usually from the business unit that will operate the production system, not from IT. It produces a clear use case, a defined success metric tied to a business outcome, a workforce capability plan, and a governance framework that the relevant risk and legal teams have signed off in principle.

The pilot phase takes six to twelve weeks. It runs with real users on real work, with a defined hypothesis and a defined exit. The users are trained for the pilot, not just the production rollout. The technical build is done with production architecture in mind from day one, not refactored later. At the end, there's a clear go, no-go, or iterate decision.

The scale phase runs over six to twelve months, depending on the size of the rollout. It has five workstreams (technical, workforce, governance, change, measurement) that are resourced and led by named owners. The business case is tracked monthly. The workforce capability is built deliberately, through programs designed for the specific workflows people actually do, not generic vendor decks.

At the end, the organisation has a capability that works, used by people who know how to use it, governed properly, measured against a real metric, and improving over time. That's what success looks like. It is not glamorous. It is not fast. It is also not common.

The good news is that the organisations that get this right are pulling away from the ones that don't. The gap between the AI-fluent enterprise and the AI-curious enterprise is widening, and it widens fastest in the scale phase, where the capability gets locked in or quietly lost.

If your organisation has a pile of pilots and a thin track record of production, the question isn't which model to try next. It's whether your assess, pilot, and scale phases are designed for the work you actually need to do. Most aren't. That's the place to start.

If you're working through what your own implementation path looks like, get in touch. We'll tell you honestly what we'd do in your situation, including when the answer is that training isn't the lever you need.

Ijan Kruizinga

Co-founder of Better People. 20+ years across technology and marketing leadership. Previously CEO of Crucial, CEO/COO of OMG and Jaywing.
