How we measure impact — and what we deliberately don't claim.
BuiltAI's pitch is "audit-first, governance-led." That extends to what we say about our own impact. This page documents how we baseline, measure and report the numbers that land in proposals, board reports and case studies, including the categories of claim we refuse to make until we have the engagements to back them.
Five principles
How a number gets onto a BuiltAI page.
Principle 01
Baselines first, claims second
Every engagement starts with a Baseline KPI Sheet captured before any BuiltAI workflow lands. We name the metric, name the source, name who owns it, and freeze it. No baseline, no claim — full stop.
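As a sketch of what a frozen baseline record might capture (field names here are illustrative, not BuiltAI's actual schema):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a baseline is captured once and never mutated
class BaselineKPI:
    metric: str        # e.g. "Tender response cycle time"
    definition: str    # what counts, and what is excluded
    source: str        # the system of record the value came from
    owner: str         # the client-side person accountable for the number
    value: float       # the pre-pack figure, in the stated unit
    unit: str          # e.g. "calendar hours"
    captured_on: date  # the freeze date; every later claim is a delta against this
```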
Principle 02
Ranges, not single numbers
When we report impact across engagements, we publish the range and the sample size, not just the median. "-42% (n=6, range -18% to -71%)" beats "-42%" every time, because it shows you the variability you should expect.
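A minimal sketch of how that summary line can be derived from per-engagement deltas (the function name and sample figures are hypothetical, apart from the quoted example):

```python
import statistics

def impact_summary(deltas: list[float]) -> str:
    """Format per-engagement % deltas as median (n, range)."""
    median = statistics.median(deltas)
    # For reduction metrics, the range runs from the smallest drop to the largest.
    return (f"{median:+.0f}% (n={len(deltas)}, "
            f"range {max(deltas):+.0f}% to {min(deltas):+.0f}%)")

# Six engagements, each a % change against its own frozen baseline:
print(impact_summary([-18, -35, -42, -42, -55, -71]))
# -> "-42% (n=6, range -18% to -71%)"
```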
Principle 03
Per-pack, not blended
We don't blend tender-time savings with margin recovery and call it "AI productivity." Each workflow pack reports against the metrics that pack actually moves. Mixing them is how vendors hide weak performance behind one strong line.
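One way to make that rule mechanical is a per-pack registry with nowhere to put a blended figure. A hypothetical sketch:

```python
# Each pack reports only against the metrics it actually moves; there is
# deliberately no "overall AI productivity" key to roll everything into.
PACK_METRICS: dict[str, set[str]] = {
    "Bidroom-in-a-Box": {"tender_response_cycle_time", "clarification_close_rate"},
    "Commercial Control Kit": {"recovery_cadence"},
    "RAMS Factory": {"rams_rework_rate"},
}

def report(pack: str, results: dict[str, float]) -> dict[str, float]:
    """Keep only the metrics this pack owns; never blend across packs."""
    return {m: v for m, v in results.items() if m in PACK_METRICS[pack]}
```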
Principle 04
Owner-confirmed
Every published number is signed off by the operational owner on the client side — not by BuiltAI alone. If the client's commercial lead won't put their name to it, it doesn't go on the website.
Principle 05
Methodology disclosed
If we say "tender response time fell 42%," we tell you what counted as a tender, what counted as response time, what was excluded, and over what period. The methodology travels with the number.
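Taken together, principles 04 and 05 mean a published number is a bundle, not a bare figure. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class PublishedClaim:
    headline: str         # e.g. "Tender response time fell 42%"
    # Principle 05: the methodology travels with the number.
    what_counted: str     # what qualified as a tender
    how_measured: str     # what counted as response time
    exclusions: str       # what was left out, and why
    period: str           # the measurement window
    # Principle 04: owner-confirmed.
    client_signatory: str | None = None

    def publishable(self) -> bool:
        # No client-side name on the number, no page on the website.
        return self.client_signatory is not None
```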
What we actually measure
Per-pack metrics, not blended productivity claims.
Each workflow pack lands against a small set of operational metrics that pack actually moves. We baseline them at engagement start, re-measure on a published cadence, and report the delta — with the methodology in the audit's Findings Pack.
| Pack | Metric | Definition |
|---|---|---|
| Bidroom-in-a-Box | Tender response cycle time | Calendar hours from ITT receipt to submission, measured per tender against an anonymised 6-tender pre-pack baseline. |
| Bidroom-in-a-Box | Clarification close-rate | Share of ITT requirements with a recorded, evidenced clarification before submission. 100% target; baseline typically <55%. |
| Commercial Control Kit | Recovery cadence | Days from instruction issued to variation logged + evidence linked, monthly cohort. |
| RAMS Factory | RAMS rework rate | Share of submitted RAMS that came back from the H&S approver for rework, against a 90-day pre-pack baseline. |
| Service Desk AI Pack | SLA recovery | P1/P2 ticket SLA attainment, measured weekly, segmented by site cohort against a 30-day pre-pack baseline. |
| Operational Margin Cockpit | Margin movement traceability | Share of monthly margin deltas with a recorded narrative + source pointer. We measure traceability, not the margin number itself — the latter is the client's to disclose or not. |
| Contract Obligations Register | Owner-coverage | Share of contract obligations with a named owner, an evidence requirement, and a defined notice trigger. Target 100%; baseline typically <60%. |
| AI Governance Policy Pack | RED-block enforcement | Share of attempted AI calls against Red-classified data that were blocked at the gate. Target 100%; monitored continuously via the platform's AI usage log. |
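To make the share-of-total definitions concrete, here is a minimal sketch of two of them, assuming hypothetical record fields rather than the platform's actual schema:

```python
def clarification_close_rate(requirements: list[dict]) -> float:
    """Bidroom-in-a-Box: share of ITT requirements with a recorded,
    evidenced clarification before submission (target 100%)."""
    closed = sum(1 for r in requirements
                 if r.get("clarification_recorded") and r.get("evidence_link"))
    return closed / len(requirements) if requirements else 0.0

def red_block_enforcement(ai_usage_log: list[dict]) -> float:
    """AI Governance Policy Pack: share of attempted AI calls against
    Red-classified data that were blocked at the gate (target 100%)."""
    red_calls = [c for c in ai_usage_log if c.get("data_class") == "RED"]
    blocked = sum(1 for c in red_calls if c.get("blocked_at_gate"))
    return blocked / len(red_calls) if red_calls else 1.0
```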
What we don't claim
Things you'll see on competitor sites that you won't see here.
Some claims are common in AI vendor marketing because they're easy to print and hard to disprove. We deliberately don't make them. Here's the list, with reasons.
Hours saved per worker. We don't measure individual productivity, and any number we quoted would be a guess.
ROI multiples. ROI depends on how the client uses the recovered time or margin; that's their business decision, not ours to claim on their behalf.
Industry-wide benchmarks. We don't have enough engagements yet to claim the median for FM, M&E, or fabric. We'll publish that when we do, with the n.
Anything from a pilot under 60 days. Pilots in their first 60 days are mobilising — measuring impact then captures the mobilisation curve, not the steady-state benefit.
AI productivity gains in isolation. AI is part of a workflow pack. We measure pack outcomes, not "AI vs human" comparisons that would be misleading.
Want the audit version?
The Operational Intelligence Audit lands a baseline you can defend.
Two-week diagnostic. Six SOP deliverables — including the Baseline KPI Sheet — sized for your business.