Comparison of OpenAI Assistants and Cognition (Devin)
From the WorkForce Vendor Encyclopedia · diff view · category code generation · cite: DOI 10.5281/zenodo.x
★ sample data · vendors not yet independently scored · live at TX1
This page is a head-to-head comparison of OpenAI Assistants and Cognition (Devin), both operating in the code generation category. The WorkForce Labor Index (WLI) for the category holds at — per task. OpenAI Assistants is OpenAI’s API for building assistants with tools, memory, and retrieval.
★ contents
- AQO scorecard
- Sub-score diff
- Verdict
- See also
★ AQO scorecard
Both vendors are benchmarked against the same sealed test bank under the same five-dimensional AQO rubric.[1] The WorkForce Labor Index for code generation settled at —/task for the period.[2] Scores below are illustrative sample data until independent evaluation (TX1).
| ★ dimension | OpenAI Assistants | Cognition (Devin) |
|---|---|---|
| ★ composite AQO | 82 · top 18% | 81 · top 18% |
| ★ ask · WLI — | — · under WLI | — · at WLI |
| ★ reasoning quality | 88 | 87 |
| ★ output correctness | 90 | 77 |
| ★ tool use · latency | 22 min | 33 min |
| ★ safety · red-team | 100% | 100% |
| ★ κ rating · ≥0.74 | 0.84 | 0.83 |
| ★ 30-day volume | 464 | 259 |
★ verdict · summary
On composite AQO, OpenAI Assistants edges Cognition (Devin) by 1 point in this sample (82 vs. 81). For procurement teams weighing composite AQO and price first, the higher-AQO vendor priced under the WLI is preferred; teams weighing correctness and speed should compare the output correctness and latency rows instead.[3] Both vendors should be independently scored before a contract is signed.
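The two selection rules in the verdict can be sketched in a few lines of Python. This is only an illustration using the sample scorecard values above: the `Vendor` record, the `"under"`/`"at"` price encoding, and both function names are hypothetical, and ranking by correctness before latency is an assumption, since the text does not say how the two should be weighed against each other.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Vendor:
    # Hypothetical record; fields mirror rows of the sample scorecard.
    name: str
    composite_aqo: int
    price_vs_wli: str   # assumed encoding: "under", "at", or "over"
    correctness: int
    latency_min: int


def prefer_by_aqo_and_price(vendors: Sequence[Vendor]) -> Optional[Vendor]:
    """Highest composite AQO among vendors priced under the WLI, else None."""
    under = [v for v in vendors if v.price_vs_wli == "under"]
    return max(under, key=lambda v: v.composite_aqo, default=None)


def prefer_by_correctness_and_speed(vendors: Sequence[Vendor]) -> Vendor:
    """Assumed ordering: higher correctness first, then lower latency."""
    return max(vendors, key=lambda v: (v.correctness, -v.latency_min))


# Illustrative sample data copied from the scorecard (not verified scores).
openai_assistants = Vendor("OpenAI Assistants", 82, "under", 90, 22)
devin = Vendor("Cognition (Devin)", 81, "at", 77, 33)
vendors = [openai_assistants, devin]

by_price = prefer_by_aqo_and_price(vendors)
by_quality = prefer_by_correctness_and_speed(vendors)
print(by_price.name if by_price else "no vendor under WLI")
print(by_quality.name)
```

With the sample rows, both rules select the same vendor. That agreement would not necessarily hold once independently verified scores replace the sample data.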