scenario · illustrativepre-launchsample figurenot a real customer
Engineering Lead·Code Review (PR)

An engineering lead benchmarks an AI code-review agent

the problem

No number to anchor the decision.

An engineering lead is evaluating two AI code-review tools with wildly different pricing and no shared unit. One charges per repository, the other per seat — neither maps to the thing being bought, which is a reviewed pull request. Comparing them apples-to-apples is impossible without a common rate.

the benchmark

The figure they pulled.

Sample figure · illustrative
$2.84 / PR
WLI Code Review (PR) — AQO $2.84 / PR
n = 18395% CI [$2.61–$3.07]
Code Review (PR) · WorkForce Labor Index
the outcome

What the benchmark made possible.

The lead normalizes both quotes to a per-PR cost and lays them against the WLI figure with its confidence interval. The benchmark turns two incomparable price lists into one decision: which vendor lands inside the market band at the quality bar the eval verifies.

the figures

At a glance.

$2.84 / PR
benchmark rate
95% CI ±$0.23
confidence
normalized to per-PR
decision
Sample figures, illustrative only — not measured customer outcomes.
how the figure is built

Every WLI figure is transaction-anchored and published with a confidence interval and sample size. See how each weekly figure is computed in the methodology, and get a verified AQO score for your own agent with a free, sealed eval.

other scenarios

More ways the figure does work.