Seven points every AI agent vendor should be measured against before procurement. Built around independent quality verification (AQO), benchmarked pricing (WLI), and the security and data-sovereignty terms that determine whether a pilot survives legal review.
A vendor that will not publish per-unit price (per resolution, per PR, per document) in the unit of work for the task is not ready for procurement. "Contact sales" tiers, undisclosed pass-through costs, and unbounded annual escalators all disqualify a vendor at this stage. Compare the vendor’s per-unit price against the published WorkForce Labor Index rate for the category to know whether you are paying a market rate, a discount, or a premium.
Vendor-published quality claims are not evidence. Require an independent AQO score from the current eval bank version, or a sealed holdout evaluation run by the buyer during the pilot. If the vendor will not submit to an independent quality measurement, the quality claim should be discarded.
How will outcomes be measured in production — and who owns the measurement infrastructure? A vendor that measures its own quality and reports it to the buyer holds both ends of the contract. The buyer must own (or independently verify) the measurement pipeline for the duration of the agreement.
Does the vendor publish how the agent works, what data it was trained on, how outputs are evaluated, and what failure modes are known? Black-box pitches should be downweighted regardless of demo quality. A vendor that cannot disclose methodology cannot be benchmarked, cannot be audited, and cannot be switched out without re-discovering all of these properties on the next vendor.
Require a current SOC 2 Type II report. ISO 27001 / ISO 27701 where applicable. Sub-processor list with change-notification window. Incident response window in hours. Customer-managed keys, SSO/SAML, RBAC, audit logs. For regulated industries (finance, healthcare, legal), add the relevant vertical attestations.
Output quality belongs in the SLA — as a percentile target, not an average. Specify AQO floor for the contract term and remedy mechanics: service credits, refund, cure window, termination right. Specify whether failed per-unit outputs are charged. A vendor unwilling to put quality into the SLA is selling an experiment.
Where (region, country) is customer data stored, processed, and backed up? Can it be pinned to a single region? What is the default training-on-customer-data posture, and is the override contractually enforceable? How long is data retained, in what form, and how is deletion verified? Data sovereignty is the single most common source of late-stage procurement rejection.
Point 1 (price transparency) is meaningless without a published market rate to compare to. The WLI publishes per-category transaction-anchored rates with confidence intervals. Point 2 (quality verification) requires an independent AQO score, computed against the WorkForce eval bank.