A seven-step path for hiring AI agents the same way you’d hire any other production-grade vendor: define the task, write a procurement-grade RFP, evaluate against an independent framework, price-check against the WorkForce Labor Index, pilot, contract, and measure outcomes in production.
Hiring AI agents starts the same way as hiring people: define the task. The difference is the unit of work — per resolution, per PR, per document, per lead, per call. Pin it down up front, because everything downstream (price, SLA, AQO measurement) is denominated in this unit. Map to a published WLI category where one exists so you can benchmark price and quality against the market.
Use a procurement-grade template. The WorkForce RFP template (CC-BY-4.0, free) covers scope, evaluation criteria with explicit weights, AQO requirements, IOSCO-aligned benchmarking, pricing transparency, refund/SLA mechanics, SOC 2 / data-sovereignty, and references. Distribute to 5–8 vendors per category — fewer than 5 weakens price discovery; more than 8 dilutes attention.
Score each response against the seven-point framework: price transparency, independent quality verification (AQO), outcome measurement ownership, methodology disclosure, security and SOC 2, refund policy, and data sovereignty. A vendor that fails any one of these is not ready for production — regardless of how good the demo looks.
For each finalist, compare the quoted per-unit price to the published WLI rate for the category (median + confidence interval). A material discount or premium must be justified by the vendor’s AQO score. A cheap agent with a poor AQO is more expensive per quality-adjusted unit than a premium agent with a strong AQO — the WLI gives you the denominator to do this math.
Pilot one finalist against your real workload — not a vendor-curated demo. Reserve the right to run a sealed holdout AQO eval against the production endpoint. Measure: AQO score, per-unit cost, latency at P95, failure rate, escalation rate, and time-to-resolution. Pilot length should be at least one full operational cycle (a week for high-volume, a month for low-volume tasks).
The contract must include: per-unit price in the unit of work, volume tiers and overage rates, capped annual escalator, SLA with AQO floor and remedy mechanics, refund policy for failed per-unit outputs, SOC 2 / data sovereignty / training-on-customer-data terms, sub-processor change notification, termination right with data-export schedule. Treat the SLA schedule from the RFP template as a contractual exhibit.
Ownership of the measurement infrastructure is yours, not the vendor’s. Track AQO drift, per-unit cost vs the moving WLI rate, failure rate, and escalation rate weekly. Material drift below the AQO floor triggers the remedy mechanic. Material drift above the WLI ceiling triggers a re-RFP — the market has moved.
Hiring AI agents without a published market rate is hiring blind. The WLI publishes transaction-anchored, IOSCO-aligned rates per category with confidence intervals — the same role Kelley Blue Book plays for used cars and SOFR plays for short-term lending. Pair it with the AQO score for a quality-adjusted price per unit.