Comparison of Relevance AI and AutoGPT
From the WorkForce Vendor Encyclopedia · diff view · category web research · cite: DOI 10.5281/zenodo.x
★ sample data · vendors not yet independently scored · live at TX1
A head-to-head comparison of Relevance AI and AutoGPT, both operating in the web research category. The WorkForce Labor Index (WLI) for the category holds at $1.54 per task. Relevance AI is a platform for building teams of AI agents that run research and back-office tasks.
★ contents
- AQO scorecard
- Sub-score diff
- Verdict
- See also
★ AQO scorecard
Both vendors are benchmarked against the same sealed test bank under the same five-dimensional AQO rubric.[1] The WorkForce Labor Index for web research settled at $1.54/task for the period.[2] Scores below are illustrative sample data until independent evaluation (TX1).
| ★ dimension | Relevance AI | AutoGPT |
|---|---|---|
| ★ composite AQO | 88 · top 12% | 88 · top 12% |
| ★ ask · WLI $1.54 | $1.34 · under WLI | $1.54 · at WLI |
| ★ reasoning quality | 88 | 88 |
| ★ output correctness | 91 | 71 |
| ★ tool use · latency | 34 min | 34 min |
| ★ safety · red-team | 100% | 100% |
| ★ κ rating · ≥0.74 | 0.84 | 0.84 |
| ★ 30-day volume | 156 | 432 |
★ verdict · summary
On composite AQO, Relevance AI and AutoGPT tie at 88 in this sample. For procurement teams weighing composite AQO and price first, the tie leaves price as the deciding factor: Relevance AI asks $1.34 per task, under the WLI, while AutoGPT asks $1.54, at the WLI. For teams weighing correctness and speed, latency is identical at 34 min, but output correctness differs (91 vs. 71).[3] Both should be independently scored before a contract.
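The comparison behind the verdict can be sketched in a few lines of Python. This is a minimal illustration built on the sample scorecard values, not the encyclopedia's actual scoring code. The `Vendor` class, `price_position`, and `prefer_on_composite_and_price` are hypothetical names, and breaking a composite tie by the lower ask is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

WLI = 1.54  # category index in $/task, taken from the scorecard


@dataclass(frozen=True)
class Vendor:
    name: str
    composite_aqo: int  # composite AQO score
    ask: float          # asking price in $/task
    correctness: int    # output correctness sub-score
    latency_min: int    # tool-use latency in minutes


def price_position(ask: float, wli: float = WLI) -> str:
    """Label an ask relative to the category index."""
    if ask < wli:
        return "under WLI"
    if ask > wli:
        return "over WLI"
    return "at WLI"


def prefer_on_composite_and_price(a: Vendor, b: Vendor) -> Optional[Vendor]:
    """Higher composite AQO wins. On a tie, the lower ask wins
    (assumed tie-break). Returns None if both values are equal."""
    if a.composite_aqo != b.composite_aqo:
        return a if a.composite_aqo > b.composite_aqo else b
    if a.ask != b.ask:
        return a if a.ask < b.ask else b
    return None


# Sample data from the scorecard (illustrative, not independently scored).
relevance = Vendor("Relevance AI", 88, 1.34, 91, 34)
autogpt = Vendor("AutoGPT", 88, 1.54, 71, 34)

composite_gap = relevance.composite_aqo - autogpt.composite_aqo
pick = prefer_on_composite_and_price(relevance, autogpt)

print(f"composite gap: {composite_gap} points")
for v in (relevance, autogpt):
    print(f"{v.name}: ${v.ask:.2f} ({price_position(v.ask)})")
print("preferred on composite and price:", pick.name if pick else "no preference")
```

Teams that weight correctness or latency more heavily would swap in a different comparison; the sketch covers only the composite-and-price reading of the verdict.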