Building an AI literacy baseline across your organisation: what to assess and how

Better People

Why a baseline matters more than people think

Without an AI literacy assessment, you end up making three expensive mistakes.

First, you train the wrong people on the wrong things. The senior analyst who already uses Copilot daily sits through "intro to prompting" while the operations manager who actually needs it gets nothing. Second, you can't measure progress, because you never measured the starting point. Third, you can't defend the budget. When the CFO asks what the training delivered, "people seemed to enjoy it" is not an answer.

A proper AI capability baseline does three things. It tells you who needs what. It gives you a before-picture so you can prove the after. And it surfaces the political reality of where confidence and competence don't match, which is almost always the real bottleneck.

What "AI literacy" actually means in an enterprise

The term has been stretched beyond usefulness. For our purposes, AI literacy in an enterprise context breaks into four layers, and you need to assess all four.

Conceptual understanding. Does the person understand what generative AI is, what it isn't, and where it fails? Can they explain hallucination, context windows, and the difference between a model and an application? This is the layer most "AI 101" courses stop at, which is why they're a waste of time on their own.

Practical fluency. Can they actually use the tools their organisation has licensed? Not "have they tried Copilot once," but can they get a useful output for a real work task in under five minutes? This is where most baselines fall apart, because self-reporting is wildly unreliable. People who've watched two YouTube videos rate themselves a 7 out of 10.

Judgement and risk awareness. Can they spot when an AI output is wrong, biased, or risky to use? Do they know what data they shouldn't paste into a public model? Do they understand their organisation's AI policy? This layer is where the risk management work lives, and it's the one most L&D teams skip.

Workflow integration. Can they redesign a piece of their own work around AI, not just use it as a faster typewriter? This is the layer that produces actual productivity gains, and it's the rarest. Most workforces are nowhere near it.

A useful workforce AI readiness assessment scores all four. A weak one scores only the first.

How to actually run the assessment

Here's the structure we use with enterprise clients. Adapt it, don't copy it.

1. Segment before you assess

Don't assess "the workforce." Assess defined cohorts. A reasonable starting cut is: executive leaders, people managers, technical roles (engineering, data, IT), knowledge workers in customer-facing functions, knowledge workers in back-office functions, and frontline operational roles. Each cohort needs different questions because each has different jobs to do with AI.

If you assess everyone the same way, your results will be statistically interesting and operationally useless.

2. Combine three instruments

Self-assessment alone is unreliable. Skills tests alone miss context. Manager input alone reflects bias. Use all three.

  • A short self-assessment survey (10–15 minutes). Confidence, frequency of use, perceived blockers, tools currently used. This tells you what people think.

  • A scenario-based skills check (15–25 minutes). Give them a real work artefact (a messy email thread, a draft report, a dataset) and ask them to use AI to do something specific with it. Score the output. This tells you what people can actually do.

  • Manager-rated capability and need (5 minutes per direct report). Where does the manager see the gap, and what would unlock the team's productivity? This tells you where the business value sits.

The gap between self-assessment and skills check is the most useful data point you'll get. The people who score themselves high but fail the scenario are your overconfident risk. The people who score themselves low but pass are your hidden champions. Both groups need different interventions.
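For teams that want to tabulate that gap systematically, here is a minimal sketch, assuming self-ratings and scenario scores are both normalised to a 0–10 scale; the field names and the three-point threshold are illustrative choices, not part of any particular assessment platform.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    cohort: str
    self_rating: float     # self-assessed fluency, 0-10 (from the survey)
    scenario_score: float  # scored scenario output, 0-10 (from the skills check)

def classify(p: Participant, threshold: float = 3.0) -> str:
    """Bucket a participant by the gap between confidence and demonstrated skill."""
    gap = p.self_rating - p.scenario_score
    if gap >= threshold:
        return "overconfident risk"   # rates themselves high, fails the scenario
    if gap <= -threshold:
        return "hidden champion"      # rates themselves low, passes the scenario
    return "aligned"

# Illustrative records only.
participants = [
    Participant("A", "finance", self_rating=8, scenario_score=3),
    Participant("B", "operations", self_rating=3, scenario_score=7),
]
for p in participants:
    print(p.cohort, classify(p))
```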

3. Tie it to real work

Generic assessments produce generic results. The scenario-based portion should use artefacts and tasks from the actual cohort's work. For finance, give them a budget variance to analyse. For HR, a candidate shortlist exercise. For operations, an incident report to summarise. The closer the assessment is to real work, the more honest the results.

This is also where most off-the-shelf assessment platforms fall down. They test generic prompting against generic tasks, which tells you very little about whether your people can do their jobs better. We've written more about this trade-off in off-the-shelf vs custom training.

4. Build a heatmap, not a leaderboard

The output of the assessment should be a heatmap by cohort and by capability layer. Not individual scores ranked against peers. The point is to design training, not to performance-manage people on a skill they were never taught.

A good heatmap shows you, at a glance: where conceptual understanding is solid but practical fluency is missing (most common), where practical fluency is high but judgement is weak (the dangerous quadrant), and where workflow integration is starting to appear organically (your internal champions, who you should resource).
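As a sketch of how that rolls up, assuming each participant record carries a cohort label and a 0–10 score per capability layer (the layer names and scores below are illustrative), the heatmap is simply a cohort-by-layer average:

```python
from collections import defaultdict
from statistics import mean

LAYERS = ["conceptual", "fluency", "judgement", "integration"]

# Illustrative records: one row per participant, one 0-10 score per layer.
records = [
    {"cohort": "finance", "conceptual": 7, "fluency": 3, "judgement": 5, "integration": 1},
    {"cohort": "finance", "conceptual": 6, "fluency": 4, "judgement": 4, "integration": 2},
    {"cohort": "operations", "conceptual": 5, "fluency": 6, "judgement": 2, "integration": 3},
]

def heatmap(rows):
    """Average each capability layer within each cohort: cohort -> layer -> mean score."""
    by_cohort = defaultdict(list)
    for row in rows:
        by_cohort[row["cohort"]].append(row)
    return {
        cohort: {layer: round(mean(r[layer] for r in members), 1) for layer in LAYERS}
        for cohort, members in by_cohort.items()
    }

for cohort, scores in heatmap(records).items():
    print(cohort, scores)
```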

What to do with the results

This is where most AI skill gap assessments die. The data gets collected, a deck gets presented, and nothing changes.

The baseline should drive three decisions. Curriculum design: what each cohort actually needs, not what the vendor's catalogue happens to contain. Sequencing: who goes first, based on where the business value is highest and the gap is most addressable. Success metrics: the same instruments, run again at six and twelve months, with the delta as your proof of impact.
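To make the delta concrete, a small sketch, assuming the follow-up rounds produce a heatmap in the same cohort-by-layer shape as the baseline (cohort names and values illustrative):

```python
# Baseline and six-month heatmaps in the same cohort-by-layer shape (illustrative values).
baseline  = {"finance": {"conceptual": 6.5, "fluency": 3.5, "judgement": 4.5, "integration": 1.5}}
month_six = {"finance": {"conceptual": 7.0, "fluency": 6.0, "judgement": 5.5, "integration": 3.0}}

def delta(before, after):
    """Per-cohort, per-layer change between two assessment rounds."""
    return {
        cohort: {layer: round(after[cohort][layer] - score, 1) for layer, score in layers.items()}
        for cohort, layers in before.items()
    }

print(delta(baseline, month_six))
# {'finance': {'conceptual': 0.5, 'fluency': 2.5, 'judgement': 1.0, 'integration': 1.5}}
```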

If you're commissioning a vendor to design the program off the back of the assessment, ask them how they'll use the data. If they can't answer beyond "we'll tailor the content," they're not actually going to. The questions to ask before you sign are covered in how to choose an AI training provider.

A note on speed

A full baseline across a workforce of 5,000 people sounds like a six-month project. It isn't. Done well, it takes five to six weeks: two weeks to design the instruments around your cohorts, two weeks of fieldwork, one to two weeks of analysis. The mistake is treating it as a discovery exercise rather than a design input. You are not trying to understand AI. You are trying to understand your people, so the training you commission actually moves them.

The organisations getting real results from AI training in 2025 didn't start with a vendor. They started with a baseline. Then they built backwards from the gaps that mattered.

Ready to talk?

30-minute discovery call.