Methodology

How we measure impact — and what we deliberately don't claim.

BuiltAI's pitch is “audit-first, governance-led.” That extends to what we say about our own impact. This page documents how we baseline, measure and report the numbers that land in proposals, board reports and case studies — including the categories of claim we refuse to make until we have the engagements to back them.

Five principles

How a number gets onto a BuiltAI page.

  1. Principle 01

    Baselines first, claims second

    Every engagement starts with a Baseline KPI Sheet captured before any BuiltAI workflow lands. We name the metric, name the source, name who owns it, and freeze it. No baseline, no claim — full stop.

  2. Principle 02

    Ranges, not single numbers

    When we report impact across engagements, we publish the range and the sample size, not just the median. "-42% (n=6, range -18% to -71%)" beats "-42%" every time, because it shows you the variability you should expect.

  3. Principle 03

    Per-pack, not blended

    We don't blend tender-time savings with margin-recovery and call it "AI productivity." Each workflow pack reports against the metrics that pack actually moves. Mixing them is how vendors hide weak performance behind one strong line.

  4. Principle 04

    Owner-confirmed

    Every published number is signed off by the operational owner on the client side — not by BuiltAI alone. If the client's commercial lead won't put their name to it, it doesn't go on the website.

  5. Principle 05

    Methodology disclosed

    If we say "tender response time fell 42%," we tell you what counted as a tender, what counted as response time, what was excluded, and over what period. The methodology travels with the number.

What we actually measure

Per-pack metrics, not blended productivity claims.

Each workflow pack lands against a small set of operational metrics that pack actually moves. We baseline them at engagement start, re-measure on a published cadence, and report the delta — with the methodology in the audit's Findings Pack.

  • Bidroom-in-a-Box: Tender response cycle time. Calendar hours from ITT receipt to submission, measured per tender, anonymised against a 6-tender pre-pack baseline.

  • Bidroom-in-a-Box: Clarification close-rate. Share of ITT requirements with a recorded, evidenced clarification before submission. 100% target; baseline typically <55%.

  • Commercial Control Kit: Recovery cadence. Days from instruction issued to variation logged with evidence linked, measured as a monthly cohort.

  • RAMS Factory: RAMS rework rate. Share of submitted RAMS that came back from the H&S approver for rework, against a 90-day pre-pack baseline.

  • Service Desk AI Pack: SLA recovery. P1/P2 ticket SLA attainment, measured weekly, segmented by site cohort against a 30-day pre-pack baseline.

  • Operational Margin Cockpit: Margin movement traceability. Share of monthly margin deltas with a recorded narrative and source pointer. We measure traceability, not the margin number itself; the latter is the client's to disclose or not.

  • Contract Obligations Register: Owner-coverage. Share of contract obligations with a named owner, an evidence requirement, and a defined notice trigger. Target 100%; baseline typically <60%.

  • AI Governance Policy Pack: RED-block enforcement. Share of attempted AI calls against Red-classified data that were blocked at the gate. Target 100%, monitored continuously via the platform's AI usage log.
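Coverage-style metrics such as Owner-coverage reduce to a simple ratio. A minimal sketch, assuming one flat record per obligation with hypothetical field names (`owner`, `evidence_req`, `notice_trigger`), none of which are BuiltAI's actual schema:

```python
def owner_coverage(obligations):
    """Share of obligations with a named owner, an evidence requirement,
    and a defined notice trigger — all three must be present.

    Field names are illustrative, not BuiltAI's actual data model.
    """
    if not obligations:
        return 0.0
    covered = sum(
        1 for o in obligations
        if o.get("owner") and o.get("evidence_req") and o.get("notice_trigger")
    )
    return covered / len(obligations)

register = [
    {"owner": "QS lead", "evidence_req": "signed instruction", "notice_trigger": "14 days"},
    {"owner": "QS lead", "evidence_req": None, "notice_trigger": "14 days"},
]
print(owner_coverage(register))  # → 0.5
```

The same shape covers clarification close-rate and RED-block enforcement: count the records that satisfy every required condition, then divide by the total.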

What we don't claim

Things you'll see on competitor sites that you won't see here.

Some claims are common in AI vendor marketing because they're easy to print and hard to disprove. We deliberately don't make them. Here's the list, with reasons.

  • Hours saved per worker. We don't measure individual productivity, and any number we quoted would be a guess.

  • ROI multiples. ROI depends on how the client uses the recovered time or margin; that's their business decision, not ours to claim on their behalf.

  • Industry-wide benchmarks. We don't have enough engagements yet to claim the median for FM, M&E, or fabric. We'll publish that when we do, with the n.

  • Anything from a pilot under 60 days. Pilots in their first 60 days are mobilising — measuring impact then captures the mobilisation curve, not the steady-state benefit.

  • AI productivity gains in isolation. AI is part of a workflow pack. We measure pack outcomes, not "AI vs human" comparisons that would be misleading.

Want the audit version?

The Operational Intelligence Audit lands a baseline you can defend.

Two-week diagnostic. Six SOP deliverables — including the Baseline KPI Sheet — sized for your business.

Book a Discovery Audit