
Comparison of Anthropic Claude API and Cognition (Devin)

From the WorkForce Vendor Encyclopedia · diff view · category code generation · cite: DOI 10.5281/zenodo.x

★ sample data · vendors not yet independently scored · live at TX1
A head-to-head comparison of Anthropic Claude API and Cognition (Devin), both operating in the code generation category. The WorkForce Labor Index (WLI) for the category holds at per task. Anthropic Claude API: Anthropic's Claude models, accessed via API for building agents and apps.

★ contents

  1. AQO scorecard
  2. Sub-score diff
  3. Verdict
  4. See also

★ AQO scorecard

Both vendors are benchmarked against the same sealed test bank under the same five-dimensional AQO rubric.[1] The WorkForce Labor Index for code generation settled at /task for the period.[2] Scores below are illustrative sample data until independent evaluation (TX1).

dimension             Anthropic Claude API    Cognition (Devin)
composite AQO         84 · top 18%            81 · top 18%
ask · WLI —           under WLI               at WLI
reasoning quality     84                      87
output correctness    89                      77
tool use · latency    34 min                  33 min
safety · red-team     100%                    100%
κ rating · ≥0.74      0.80                    0.83
30-day volume         292                     259
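
The per-row gaps in the scorecard can be tabulated with a short script. The following is a minimal sketch, not the encyclopedia's own tooling: the names `SCORES`, `KAPPA`, `KAPPA_MIN`, and `score_diff` are hypothetical, the values are the illustrative sample data from the table above, and the reading of "≥0.74" as a minimum acceptable κ rating is an assumption.

```python
# Sample scorecard rows (illustrative data from the table, not verified scores).
# Each tuple is (Anthropic Claude API, Cognition (Devin)); higher is better.
SCORES = {
    "composite AQO": (84, 81),
    "reasoning quality": (84, 87),
    "output correctness": (89, 77),
}

# Assumption: the "≥0.74" in the kappa row is a minimum acceptable rating.
KAPPA_MIN = 0.74
KAPPA = {"Anthropic Claude API": 0.80, "Cognition (Devin)": 0.83}


def score_diff(scores):
    """Return {dimension: first vendor minus second vendor} for each row."""
    return {dim: a - b for dim, (a, b) in scores.items()}


diffs = score_diff(SCORES)
for dim, d in diffs.items():
    if d > 0:
        leader = "Anthropic Claude API"
    elif d < 0:
        leader = "Cognition (Devin)"
    else:
        leader = "tie"
    print(f"{dim}: {d:+d} ({leader})")

# Check each vendor's kappa rating against the assumed threshold.
kappa_ok = {name: k >= KAPPA_MIN for name, k in KAPPA.items()}
print(kappa_ok)
```

On this sample, the composite gap is 3 points in favor of Anthropic Claude API, while the reasoning quality row favors Cognition (Devin) by 3 points and the output correctness row favors Anthropic Claude API by 12.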

★ verdict · summary

On composite AQO, Anthropic Claude API edges Cognition (Devin) by 3 points in this sample (84 vs. 81). For procurement teams weighing composite AQO and price first, the higher-AQO vendor priced under the WLI is preferred; teams weighing correctness and speed should check the output correctness and latency rows.[3] Both vendors should be independently scored before a contract is signed.
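
The selection rule in the summary can be written out as a small filter. This is a sketch under stated assumptions rather than a published WorkForce procedure: the `Vendor` class, the `ask_vs_wli` encoding ("under", "at"), and both function names are hypothetical, and the figures are the sample values from the scorecard.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    composite_aqo: int
    ask_vs_wli: str      # hypothetical encoding of the ask row: "under" or "at"
    correctness: int     # output correctness score, higher is better
    latency_min: int     # tool use latency in minutes, lower is better


# Illustrative sample data copied from the scorecard.
VENDORS = [
    Vendor("Anthropic Claude API", 84, "under", 89, 34),
    Vendor("Cognition (Devin)", 81, "at", 77, 33),
]


def prefer_by_aqo_and_price(vendors):
    """Highest composite AQO among vendors priced under the WLI, else None."""
    under = [v for v in vendors if v.ask_vs_wli == "under"]
    return max(under, key=lambda v: v.composite_aqo, default=None)


def correctness_and_speed_leaders(vendors):
    """Return (most correct vendor, lowest-latency vendor) for manual review."""
    most_correct = max(vendors, key=lambda v: v.correctness)
    fastest = min(vendors, key=lambda v: v.latency_min)
    return most_correct, fastest


choice = prefer_by_aqo_and_price(VENDORS)
most_correct, fastest = correctness_and_speed_leaders(VENDORS)
print(choice.name, most_correct.name, fastest.name)
```

The second function returns the two row leaders separately instead of merging them into one pick, since the summary only says to check those rows and gives no weighting between correctness and latency.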