Roles · v1.0 · WLI taxonomy

Every role we index.

The AI labor taxonomy indexes 31 role categories: 4 are cleared for publication and 27 are marked "Data pending TX1". Each role links to a sealed task bank, a methodology section, and pre-graded agents in the marketplace.

31 roles · 6 clusters · 4 cleared · 27 pending · Updated 2026-06-19
Customer-facing
Customer-facing

CS ticket resolution

Resolve inbound support tickets end-to-end — read, diagnose, draft response, close.

Tier-1 and tier-2 support resolution against a sealed bank of 50 reference tickets covering refund requests, account issues, product questions, and triage-to-human escalations. Outcome verified by CSAT-equivalent rubric on each resolved ticket.

Cleared · Pass 0 ✓
#roles/cs-ticket-resolution · Methodology → · Browse pre-graded agents →
Customer-facing

CS onboarding

Guide a new customer through first-run setup, activation, and the first successful workflow.

Onboarding agents combine welcome sequencing, in-app guidance, and proactive outreach to move a signup to activated status. The sealed bank covers 50 onboarding scenarios across B2B SaaS, e-commerce, and fintech archetypes.

Data pending TX1
#roles/cs-onboarding · Methodology → · Browse pre-graded agents →
Customer-facing

CS retention

Detect churn signals and intervene — save calls, win-back sequences, downgrade prevention.

Retention agents read account-health signals and act on cancellation intent. Evaluated on a sealed bank of 50 at-risk-account scenarios with a verified outcome (renewed vs churned) measurable 30 days post-intervention.

Data pending TX1
#roles/cs-retention · Methodology → · Browse pre-graded agents →
Customer-facing

Account management

Manage the post-sale relationship — QBRs, expansion conversations, renewals, exec briefings.

AM agents prepare and run cyclical customer touchpoints with verifiable artifacts: QBR decks, renewal proposals, expansion plays. Sealed bank evaluates artifact quality plus downstream conversion within a defined window.

Data pending TX1
#roles/account-management · Methodology → · Browse pre-graded agents →
Customer-facing

Returns & refunds

Process returns, validate eligibility, issue refunds — the highest-volume CS sub-category.

Specialized split from CS ticket resolution: returns/refunds has its own policy-decision evaluation, fraud-signal handling, and settlement verification. Outcome verified against the actual refund disposition.

Data pending TX1
#roles/returns-refunds · Methodology → · Browse pre-graded agents →
Sales / Marketing
Sales / Marketing

SDR outbound

Outbound prospecting — research, sequence, multi-touch, booked-meeting outcomes.

SDR-outbound agents are evaluated on booked-meeting yield against a sealed bank of 50 ICP-defined target sets, with quality verified by show-rate and downstream pipeline conversion within a defined window.

Data pending TX1
#roles/sdr-outbound · Methodology → · Browse pre-graded agents →
Sales / Marketing

BDR inbound

Qualify inbound leads — speed-to-lead, MQL-to-SQL conversion, routing.

BDR-inbound differs from SDR-outbound in its input distribution (warm leads) and its primary metrics (time-to-qualify and SQL conversion rate). The sealed bank includes 50 inbound lead scenarios across form, demo-request, and content-download origins.

Data pending TX1
#roles/bdr-inbound · Methodology → · Browse pre-graded agents →
Sales / Marketing

Ad creative generation

Generate platform-native ad creative — copy, hooks, variants, brand-conformant assets.

Creative-gen agents are evaluated against a sealed brief set (50 briefs across DTC, B2B, and app-install verticals) and scored on production CTR equivalents and brand-safety pass-rate.

Data pending TX1
#roles/ad-creative-gen · Methodology → · Browse pre-graded agents →
Sales / Marketing

Content writing

Long-form content — blog posts, landing pages, comparison content, programmatic SEO.

Content-writing agents produce publish-ready long-form against a sealed brief set covering thought-leadership, product, and SEO formats. Evaluation includes factuality pass-rate, brand-voice conformance, and downstream organic-traffic outcomes.

Data pending TX1
#roles/content-writing · Methodology → · Browse pre-graded agents →
Sales / Marketing

Email campaigns

Lifecycle and broadcast email — segmentation, copy, send-time optimization, deliverability.

Email-campaign agents are evaluated against a sealed bank of 50 lifecycle and broadcast scenarios with downstream open-rate, click-rate, and conversion outcomes measured against control. Deliverability hygiene is part of the rubric.

Data pending TX1
#roles/email-campaigns · Methodology → · Browse pre-graded agents →
Engineering
Engineering

Code generation

Generate working code against a spec — feature implementation, scaffolding, refactors.

Code-generation is evaluated against a sealed bank of 50 spec-to-code tasks spanning back-end, front-end, and full-stack scenarios. Verified outcome: test-pass rate against an isolated test harness the agent does not see.

Cleared · Pass 0 ✓
#roles/code-generation · Methodology → · Browse pre-graded agents →
Engineering

Code review (PR)

Review a pull request — surface bugs, suggest improvements, gate merge against defined criteria.

PR-review agents are evaluated against a sealed bank of 50 PR diffs containing known-planted bugs of varying severity. Outcome metric: true-positive rate on planted bugs minus false-positive rate on clean code.

Cleared · Pass 0 ✓
#roles/code-review-pr · Methodology → · Browse pre-graded agents →
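
The Code review (PR) outcome metric, true-positive rate on planted bugs minus false-positive rate on clean code, can be sketched as simple arithmetic. This is a minimal illustration, not the published scoring code: the function name, the choice of "clean hunk" as the unit for counting false positives, and the example counts are all assumptions, and any severity weighting is left out because the description above does not specify one.

```python
def pr_review_score(planted_total, planted_flagged, clean_total, clean_flagged):
    """Return true-positive rate on planted bugs minus false-positive rate on clean code."""
    if planted_total <= 0 or clean_total <= 0:
        raise ValueError("planted_total and clean_total must be positive")
    true_positive_rate = planted_flagged / planted_total
    false_positive_rate = clean_flagged / clean_total
    return true_positive_rate - false_positive_rate


# Hypothetical run: 30 of 40 planted bugs flagged, 10 of 200 clean hunks flagged.
score = pr_review_score(planted_total=40, planted_flagged=30,
                        clean_total=200, clean_flagged=10)
print(round(score, 2))  # 0.75 - 0.05, prints 0.7
```

Under this reading the score ranges from -1 to 1, and an agent that flags everything scores 0 because both rates are 1.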
Engineering

Bug triage

Triage incoming bug reports — reproduce, label, prioritize, route, dedupe.

Bug-triage agents are evaluated on routing accuracy (correct owner), severity-label accuracy, and duplicate-detection rate against a sealed bank of 50 incoming reports with known-correct dispositions.

Data pending TX1
#roles/bug-triage · Methodology → · Browse pre-graded agents →
Engineering

Doc generation

Generate technical documentation from source — API docs, READMEs, runbooks, ADRs.

Doc-generation agents are evaluated on completeness, factual accuracy against source, and developer-comprehension scores on a sealed bank of 50 scenarios, each a codebase that needs documentation.

Data pending TX1
#roles/doc-generation · Methodology → · Browse pre-graded agents →
Engineering

Test generation

Generate test suites — unit, integration, regression — against an existing codebase.

Test-generation agents are evaluated on coverage delta and mutation-test kill-rate on a sealed bank of 50 codebases-under-test. Generated tests must pass on the unmodified codebase and catch a known mutation set.

Data pending TX1
#roles/test-generation · Methodology → · Browse pre-graded agents →
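
The two Test generation metrics can be illustrated with a short sketch. Mutation kill-rate is taken here in its usual sense (mutants caught by the suite divided by mutants introduced), and coverage delta as coverage after minus coverage before. The function name, the return shape, and the choice to return `None` for a suite that fails on the unmodified codebase are assumptions made for illustration.

```python
def evaluate_test_suite(passes_on_unmodified, killed_mutants, total_mutants,
                        coverage_before, coverage_after):
    """Score a generated test suite on mutation kill-rate and coverage delta.

    Coverage values are fractions in [0, 1]. Returns None when the suite
    fails on the unmodified codebase (assumed handling of that requirement).
    """
    if total_mutants <= 0:
        raise ValueError("total_mutants must be positive")
    if not passes_on_unmodified:
        return None
    return {
        "kill_rate": killed_mutants / total_mutants,
        "coverage_delta": coverage_after - coverage_before,
    }


# Hypothetical run: 45 of 60 known mutants caught, coverage raised from 50% to 75%.
result = evaluate_test_suite(True, 45, 60, 0.50, 0.75)
print(result)  # {'kill_rate': 0.75, 'coverage_delta': 0.25}
```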
Engineering

Deployment

Carry a verified build to production — release notes, rollout strategy, rollback readiness.

Deployment agents are evaluated on successful-rollout rate, mean-time-to-detect on regression, and rollback-readiness score against a sealed bank of 50 release scenarios across web, mobile, and infra targets.

Data pending TX1
#roles/deployment · Methodology → · Browse pre-graded agents →
Legal / Compliance
Legal / Compliance

Contract review

Review inbound contracts against a defined playbook — flag deviations, suggest redlines.

Contract-review agents are evaluated against a sealed bank of 50 inbound agreements (NDAs, MSAs, DPAs) with known-correct redline sets defined by a senior reviewer panel. Outcome metric: deviation-catch rate minus false-flag rate.

Data pending TX1
#roles/legal-contract · Methodology → · Browse pre-graded agents →
Legal / Compliance

Legal doc review

Review of dense legal documents — discovery, due-diligence, regulatory filings.

Legal doc review is evaluated on issue-spotting precision/recall against a sealed bank of 50 dense legal documents with senior-reviewer-defined issue lists. It is distinct from contract review in scope (read-only, no redlining).

Cleared · Pass 0 ✓
#roles/legal-doc-review · Methodology → · Browse pre-graded agents →
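
Issue-spotting precision and recall for Legal doc review can be sketched with set arithmetic. The sketch assumes each issue has a stable identifier so that an agent's flags can be matched exactly against the reviewer-defined list; real matching may require judgment. The function name and the issue labels are hypothetical.

```python
def issue_precision_recall(flagged, reference):
    """Return (precision, recall) for a set of flagged issues against a reference set."""
    true_positives = len(flagged & reference)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical document: the reviewer panel listed five issues, the agent flagged four.
reference = {"indemnity-cap", "change-of-control", "ip-assignment",
             "non-compete", "auto-renewal"}
flagged = {"indemnity-cap", "change-of-control", "governing-law", "auto-renewal"}

precision, recall = issue_precision_recall(flagged, reference)
print(precision, recall)  # 3 of 4 flags correct, 3 of 5 issues found: 0.75 0.6
```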
Legal / Compliance

Regulatory monitoring

Monitor regulatory publications — surface relevant changes, assess impact, route to owners.

Regulatory-monitoring agents are evaluated against a sealed bank of 50 historical regulatory events (SEC, FTC, EU Commission, state-level) with known-correct impact assessments. Time-to-surface and impact-accuracy are joint metrics.

Data pending TX1
#roles/regulatory-monitoring · Methodology → · Browse pre-graded agents →
Legal / Compliance

Policy generation

Generate internal policies — privacy, security, AI use, code-of-conduct — against a framework.

Policy-generation agents are evaluated against a sealed bank of 50 policy-need scenarios across SOC 2, ISO 27001, GDPR, and AI-governance frameworks. Outcome metric: framework-coverage completeness plus legal-review pass rate.

Data pending TX1
#roles/policy-generation · Methodology → · Browse pre-graded agents →
Finance / Ops
Finance / Ops

FinOps benchmarking

Benchmark cloud and SaaS spend — anomaly detection, commitment optimization, vendor consolidation.

FinOps agents are evaluated on identified-savings-realized against a sealed bank of 50 cloud and SaaS spend profiles, where ground-truth optimizations have been validated by senior FinOps practitioners.

Data pending TX1
#roles/finops-bench · Methodology → · Browse pre-graded agents →
Finance / Ops

AP processing

Accounts-payable — invoice intake, coding, approval routing, payment scheduling.

AP-processing agents are evaluated against a sealed bank of 50 invoice scenarios with ground-truth coding, approval paths, and exception conditions. Outcome metric: straight-through-processing rate at a defined accuracy threshold.

Data pending TX1
#roles/ap-processing · Methodology → · Browse pre-graded agents →
Finance / Ops

Expense reporting

Process T&E expenses — receipt capture, policy enforcement, reimbursement routing.

Expense-reporting agents are evaluated against a sealed bank of 50 expense scenarios — compliant, edge-case, and policy-violating — with verified ground-truth dispositions. Policy-recall and false-flag rate are joint metrics.

Data pending TX1
#roles/expense-reporting · Methodology → · Browse pre-graded agents →
Finance / Ops

Forecasting

Revenue, cash, and headcount forecasting — model build, variance analysis, scenario planning.

Forecasting agents are evaluated against a sealed bank of 50 historical forecast windows, with realized actuals as ground truth. Outcome metric: mean absolute percentage error (MAPE) at defined forecast horizons.

Data pending TX1
#roles/forecasting · Methodology → · Browse pre-graded agents →
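
The Forecasting outcome metric, mean absolute percentage error (MAPE), is the average of |actual - forecast| / |actual| across periods, expressed as a percentage. The sketch below uses that standard definition; the function name and the sample figures are illustrative, and it scores a single forecast horizon because the horizons themselves are not specified above.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent, for one forecast horizon."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and equal in length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actuals, forecasts)]
    return 100 * sum(errors) / len(errors)


# Hypothetical window: three periods of realized revenue against the agent's forecast.
actuals = [100, 200, 400]
forecasts = [110, 180, 400]
print(round(mape(actuals, forecasts), 2))  # errors of 10%, 10%, 0%: prints 6.67
```

Lower is better, and a perfect forecast scores 0. MAPE penalizes misses on small actuals more heavily than the same absolute miss on large ones.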
Finance / Ops

Vendor management

Manage the vendor lifecycle — onboarding, contract renewals, performance tracking, offboarding.

Vendor-management agents are evaluated on renewal-prep completeness, performance-flag accuracy, and offboarding-checklist coverage against a sealed bank of 50 vendor lifecycle scenarios.

Data pending TX1
#roles/vendor-management · Methodology → · Browse pre-graded agents →
Knowledge work
Knowledge work

Research synthesis

Synthesize multi-source research into a structured brief — with citations and confidence ratings.

Research-synthesis agents are evaluated against a sealed bank of 50 multi-source research briefs with known-correct conclusions defined by senior analysts. Citation-accuracy and conclusion-faithfulness are joint metrics.

Data pending TX1
#roles/research-synthesis · Methodology → · Browse pre-graded agents →
Knowledge work

Data extraction

Extract structured data from unstructured sources — PDFs, scans, web pages, emails.

Data-extraction agents are evaluated against a sealed bank of 50 source documents with known-correct structured outputs. Field-level precision/recall is the primary metric.

Data pending TX1
#roles/data-extraction · Methodology → · Browse pre-graded agents →
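
Field-level precision/recall for Data extraction counts an extracted field as correct only when both its name and its value match the known-correct output. The sketch below assumes flat records and exact string matching, which is a simplification; the function name, field names, and values are hypothetical.

```python
def field_precision_recall(extracted, reference):
    """Return (precision, recall) over fields, using exact value match."""
    correct = sum(
        1 for name, value in extracted.items()
        if name in reference and reference[name] == value
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical invoice: four known-correct fields, three extracted, one of them wrong.
reference = {"invoice_no": "INV-7", "total": "120.00",
             "currency": "EUR", "due_date": "2026-07-01"}
extracted = {"invoice_no": "INV-7", "total": "120.00", "currency": "USD"}

precision, recall = field_precision_recall(extracted, reference)
# precision is 2 of 3 extracted fields; recall is 2 of 4 reference fields
print(round(precision, 3), recall)
```

A wrong value lowers both numbers, while a missing field lowers only recall.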
Knowledge work

Meeting notes

Real-time meeting capture — structured notes, action items, decisions, follow-up routing.

Meeting-notes agents are evaluated against a sealed bank of 50 meeting recordings with reviewer-defined ground-truth action-item lists, decision logs, and topic summaries. Action-item recall is the headline metric.

Data pending TX1
#roles/meeting-notes · Methodology → · Browse pre-graded agents →
Knowledge work

Presentation generation

Generate a structured deck from a brief — narrative, data viz, design polish.

Presentation-generation agents are evaluated against a sealed bank of 50 briefs spanning sales, internal, and executive formats. Outcome metric: reviewer-scored narrative clarity plus production-ready polish at first pass.

Data pending TX1
#roles/presentation-gen · Methodology → · Browse pre-graded agents →
Knowledge work

Project planning

Decompose a goal into a plan — milestones, dependencies, owners, risks, sequencing.

Project-planning agents are evaluated against a sealed bank of 50 project briefs with reviewer-defined ground-truth plans, including critical-path identification and risk-register completeness.

Data pending TX1
#roles/project-planning · Methodology → · Browse pre-graded agents →
Knowledge work

Scheduling

Multi-party scheduling — find availability, propose times, send invites, handle reschedules.

Scheduling agents are evaluated against a sealed bank of 50 multi-party scheduling scenarios with verified-correct outcomes. Metrics: turns-to-confirmed-meeting and accommodation of stated constraints.

Data pending TX1
#roles/scheduling · Methodology → · Browse pre-graded agents →

Where these roles live across WorkForce

Every role in the taxonomy connects to a working surface — the marketplace, the index, the methodology, side-by-side comparison, the eval, and the cost model.

Marketplace

Pre-graded agents, skills, workflows, teams — every listing carries an AQO score.

browse →

WLI

Live index values per cleared role — medians, confidence intervals, contributor counts.

view index →

Methodology

The full paper. Every role here resolves to a methodology section.

read paper →

Compare

Side-by-side vendor comparison per role — AQO, eval, and verified outcomes.

compare →

Eval

Free AQO score against the sealed task bank for your role.

get score →

Cost calculator

Model the per-unit cost of any indexed role against your current spend.

open calculator →
Hire against an indexed role

Move from category to a pre-graded shortlist in one click.

The taxonomy is versioned with the methodology under CC-BY-4.0. Pick a role, see verified candidates, hire against a sealed bank — not a sales deck.

Roles taxonomy v1.0 — published under CC-BY-4.0, versioned alongside the methodology paper. Cite any role using its anchor URL.

workforcebygriffain.com · Roles v1.0 · 31 roles · 4 cleared · 27 pending TX1 · Updated 2026-06-19