Glossary

The language of AI labor pricing

Plain definitions for how AI work is priced, scored, and benchmarked — the terms behind the WorkForce Labor Index and the AQO score.

AQO Score

An AQO (Agent Quality Outcome) score is a single, transaction-anchored measure of how well an AI agent performs a task, produced from a sealed evaluation and shipped with a confidence interval.

WorkForce Labor Index (WLI)

The WorkForce Labor Index (WLI) is a transaction-anchored benchmark of the market rate for commodifiable AI tasks, refreshed weekly and reported with confidence intervals.

AI Task Pricing

AI task pricing is the market rate to have an AI agent complete one unit of work — for example, dollars per resolved support ticket or per reviewed pull request.

AI Agent Evaluation

AI agent evaluation is the process of measuring how well an AI agent performs a task against a defined benchmark, producing a quality score with a confidence interval.

IOSCO-compliant Index

An IOSCO-compliant index follows the IOSCO Principles for Financial Benchmarks — governance, transparent methodology, and data quality — applied here to AI labor pricing.

Confidence Interval (Benchmark)

A confidence interval on a benchmark is a range, computed from the sample, that is expected to contain the true value at a stated confidence level, e.g. a 95% CI of $1.18–$1.46 around a $1.32 market rate.
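
As a rough illustration of how such an interval might be computed, here is a minimal sketch. It assumes the benchmark figure is a simple mean of per-task transaction prices and that a normal approximation is acceptable; the glossary does not state the actual WLI method. The function name `mean_ci_95` and the `prices` data are hypothetical.

```python
import math
import statistics

def mean_ci_95(prices):
    """Return (mean, low, high) using a normal-approximation 95% CI."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(prices)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(prices) / math.sqrt(n)
    half_width = 1.96 * sem  # 1.96 is the two-sided 95% z-value
    return mean, mean - half_width, mean + half_width

# Hypothetical per-ticket prices in dollars (illustrative only).
prices = [1.10, 1.45, 1.30, 1.25, 1.50, 1.20, 1.40, 1.35]
mean, low, high = mean_ci_95(prices)
print(f"${mean:.2f} (95% CI ${low:.2f}-${high:.2f})")
```

With small samples a t-distribution multiplier would be more appropriate than 1.96; the normal value is used here only to keep the sketch short.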

Transaction-data Benchmark

A transaction-data benchmark is built from records of actual transactions rather than surveys or estimates, making it harder to game and more defensible to cite.

AI Agent Certification

AI agent certification is independent verification that an AI agent meets a quality bar, evidenced by a published score, a confidence interval, and a verifiable badge.

AI Agent Cost per Task

AI agent cost per task is the price for an agent to complete a single defined unit of work. Because the unit is fixed, it is the most directly comparable way to price AI labor.
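
The arithmetic behind a per-task figure can be sketched as total spend divided by completed units. The function name `cost_per_task` and the numbers below are hypothetical and only illustrate the calculation.

```python
def cost_per_task(total_cost, completed_tasks):
    """Average price paid per completed unit of work."""
    if completed_tasks <= 0:
        raise ValueError("completed_tasks must be positive")
    return total_cost / completed_tasks

# Hypothetical: $66.00 spent to resolve 50 support tickets.
print(round(cost_per_task(66.00, 50), 2))
```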

WorkForce Eval Badge

A WorkForce Eval badge is an embeddable mark that displays an AI agent’s verified AQO score and links back to its live, auditable evaluation.

Sample Size (n)

In a benchmark, the sample size (n) is the number of observations behind a reported figure — a key signal of how much to trust it.
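
One way to see why n matters: under a normal approximation for a mean, the half-width of a confidence interval shrinks with the square root of n. The sketch below assumes that approximation and a hypothetical price spread of $0.40 per task; `ci_half_width` is an illustrative name, not part of any published methodology.

```python
import math

def ci_half_width(std_dev, n, z=1.96):
    """Half-width of a normal-approximation CI for a mean."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * std_dev / math.sqrt(n)

# Hypothetical spread of $0.40 per task (illustrative only).
for n in (25, 100, 400):
    print(n, round(ci_half_width(0.40, n), 4))
```

Under this assumption, quadrupling the sample size halves the interval width, which is why a figure backed by a small n deserves more caution.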

Reproducibility

Reproducibility is the property that a benchmark result can be regenerated and audited later from the same sealed inputs and published methodology.
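
A minimal sketch of what an audit check could look like, assuming the sealed inputs can be serialized as JSON and that a content hash is published alongside the result. The glossary does not describe the actual sealing mechanism; `fingerprint`, the record fields, and the version label `"v1"` are all hypothetical.

```python
import hashlib
import json

def fingerprint(inputs, methodology_version):
    """SHA-256 over a canonical JSON encoding of inputs and method version."""
    payload = json.dumps(
        {"inputs": inputs, "methodology": methodology_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical sealed inputs (illustrative only).
sealed = [{"task": "support_ticket", "price": 1.32}]
published = fingerprint(sealed, "v1")

# A later audit recomputes the fingerprint from the same inputs.
assert fingerprint(sealed, "v1") == published

# Any change to the inputs yields a different fingerprint.
tampered = [{"task": "support_ticket", "price": 1.30}]
print(fingerprint(tampered, "v1") != published)
```

Sorting keys and fixing separators keeps the encoding stable, so the same inputs always produce the same hash regardless of dictionary ordering.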
