A procurement-grade RFP template for buying AI agents. Built around the unit of work, AQO score, and the WorkForce Labor Index rate — not seat licenses and vendor-published claims. CC-BY-4.0.
Define the task in the buyer’s terms. Pin down the unit of work (per resolution, per PR, per document) up front — this drives every other section. Map to a WorkForce Labor Index category where one exists so responses can be benchmarked against the published market rate.
Publish the weights up front. Lead with AQO score (independent, sealed-eval, verifiable), then WLI delta vs the category rate, then methodology disclosure, then outcome measurement, then refund/SLA. Vendor reputation is a tiebreaker, not a criterion.
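As a rough illustration of publishing weights and treating reputation only as a tiebreaker, the sketch below scores responses in Python. The weight values, criterion keys, and function names are assumptions made for this example; the template fixes the order of the criteria, not the numbers.

```python
# Hypothetical weights, in the order the criteria are listed above.
# The actual numbers are the buyer's decision; these are placeholders.
WEIGHTS = {
    "aqo_score": 0.35,
    "wli_delta": 0.25,
    "methodology_disclosure": 0.15,
    "outcome_measurement": 0.15,
    "refund_sla": 0.10,
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (each normalised to 0..1) into one number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank(responses: dict, reputation: dict) -> list:
    """Order vendors by weighted score; reputation only breaks ties."""
    return sorted(
        responses,
        key=lambda vendor: (
            round(weighted_score(responses[vendor]), 6),  # primary: published criteria
            reputation.get(vendor, 0.0),                  # secondary: tiebreaker only
        ),
        reverse=True,
    )
```

Because reputation sits in the second position of the sort key, it can never lift a vendor above one with a higher weighted score.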
Require a verifiable AQO score from the current eval bank version. Self-reported quality is not acceptable. Reserve the right to run a sealed holdout eval during the pilot.
Where the response references a published benchmark (the WLI included), require the methodology version and per-category source URL. Vendor-internal "market rates" must disclose methodology, sample, cadence, and governance — or be excluded from the response.
Require a complete, published price schedule in the unit of work defined in Section 1: per-unit price, volume tiers, minimums, overage rates, one-time fees, itemised pass-through costs, annual escalator, termination cost. "Contact sales" disqualifies the response.
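A complete schedule is one a buyer can turn into a number without calling the vendor. The sketch below shows one way to model that in Python. The class names, the graduated-tier interpretation, and the choice to apply the escalator to unit prices but not to the minimum are all assumptions for illustration; a real response may define these differently, and pass-through and termination costs are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    upper_bound: Optional[int]  # last unit covered by this tier; None = unbounded
    unit_price: float           # price per unit of work inside this tier


@dataclass(frozen=True)
class PriceSchedule:
    tiers: Tuple[Tier, ...]     # ordered by upper_bound; last tier usually unbounded
    monthly_minimum: float
    one_time_fees: float
    annual_escalator: float     # e.g. 0.03 for 3% per year

    def monthly_cost(self, units: int, year: int = 0) -> float:
        """Graduated pricing: each unit is billed at the rate of the tier it falls in."""
        remaining, lower, cost = units, 0, 0.0
        for tier in self.tiers:
            if remaining <= 0:
                break
            if tier.upper_bound is None:
                span = remaining
            else:
                span = min(remaining, tier.upper_bound - lower)
                lower = tier.upper_bound
            cost += span * tier.unit_price
            remaining -= span
        if remaining > 0:
            raise ValueError("schedule does not cover this volume")
        cost *= (1 + self.annual_escalator) ** year  # assumption: escalator on unit prices only
        return max(cost, self.monthly_minimum)

    def first_year_total(self, units_per_month: int) -> float:
        return self.one_time_fees + 12 * self.monthly_cost(units_per_month)


schedule = PriceSchedule(
    tiers=(Tier(1000, 2.0), Tier(None, 1.5)),
    monthly_minimum=500.0,
    one_time_fees=1000.0,
    annual_escalator=0.10,
)
print(schedule.monthly_cost(1500))  # 1000 units at 2.0 plus 500 units at 1.5
```

If a vendor's response cannot be expressed in a structure like this, some part of the schedule is probably missing.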
Require uptime, latency, and AQO floors as percentile targets — not averages. Specify remedy mechanics: service credits, refund, cure period, termination right. A vendor unwilling to put quality into the SLA is selling an experiment.
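To see why percentile targets matter, the sketch below compares an average with a P95 on the same set of latency samples. The sample values and the 500 ms target are made up for illustration, and the nearest-rank method is one common percentile definition, not one the template prescribes.

```python
import math
from statistics import mean


def percentile(samples, p: int):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 90 fast requests, 10 very slow ones.
latencies_ms = [200] * 90 + [3000] * 10
target_ms = 500

print(mean(latencies_ms) <= target_ms)            # True: the average (480) hides the slow tail
print(percentile(latencies_ms, 95) <= target_ms)  # False: P95 is 3000
```

The same vendor passes an average-based SLA and fails a P95-based one, which is the gap a percentile target is meant to close.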
Require a current SOC 2 Type II report, data sovereignty (region pinning), a sub-processor list with a change-notification window, and the training-on-customer-data default with its contractual override. Also cover retention, deletion verification, RBAC/SSO, audit logs, incident response window, and model cards.
Three customers in a comparable use case willing to take a 30-minute call. Each reference includes deployment date, current volume, and one quantitative outcome.
A blank table the vendor fills in: uptime, P95 latency, AQO floor, first response time, refund cycle for failed units. Targets, measurement windows, and remedies — all in one place.
Section 2 of the template asks vendors to quote their per-unit price against the published WLI rate for the relevant category. Section 3 requires a verifiable AQO score from the WorkForce eval bank — the independent quality measurement that makes the price comparable.
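One way to make a quote comparable against the category rate is to express it as a relative difference. The sketch below assumes that "WLI delta" means the vendor's per-unit price relative to the published rate; the template summary does not define the formula, so both the function name and the definition are assumptions.

```python
def wli_delta(vendor_unit_price: float, wli_rate: float) -> float:
    """Relative difference between a quote and the category rate (assumed definition).

    Negative means the vendor is below the published rate, positive means above.
    """
    if wli_rate <= 0:
        raise ValueError("published rate must be positive")
    return (vendor_unit_price - wli_rate) / wli_rate


# Hypothetical numbers: a 4.00 quote against a 5.00 category rate is 20% below market.
print(f"{wli_delta(4.00, 5.00):+.0%}")
```

A price delta on its own says little; it only becomes a like-for-like comparison once it is read alongside the AQO score for the same unit of work.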