TOOL · GLOBAL
arXiv: DecisionBench benchmarks emergent delegation in long-horizon agent workflows
DecisionBench is a benchmark substrate for testing how LLM agents learn to delegate tasks to specialized peer models, measuring routing fidelity, cost, latency, and quality trade-offs.
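To make the four trade-off axes concrete, here is a minimal Python sketch of how per-task delegation logs might be rolled up into summary numbers. The record schema, field names, and metric definitions (for example, routing fidelity as the share of tasks sent to a reference specialist) are illustrative assumptions. They are not DecisionBench's actual definitions or API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RoutingRecord:
    """One delegation decision. Hypothetical schema, not DecisionBench's."""
    task_id: str
    chosen_agent: str    # specialist the router actually delegated to
    expected_agent: str  # specialist a reference label says should handle it
    cost_usd: float      # spend attributed to handling the task
    latency_s: float     # wall-clock time to a final answer
    quality: float       # task score in [0, 1] from some grader (assumed)


def summarize(records: list[RoutingRecord]) -> dict[str, float]:
    """Aggregate per-task records into fidelity, cost, latency, and quality."""
    if not records:
        raise ValueError("need at least one record")
    return {
        # assumed definition: fraction of tasks routed to the expected specialist
        "routing_fidelity": sum(r.chosen_agent == r.expected_agent for r in records) / len(records),
        "mean_cost_usd": mean(r.cost_usd for r in records),
        "mean_latency_s": mean(r.latency_s for r in records),
        "mean_quality": mean(r.quality for r in records),
    }


if __name__ == "__main__":
    # Toy log: one task is misrouted to a cheaper, faster, lower-quality generalist.
    runs = [
        RoutingRecord("t1", "underwriting", "underwriting", 0.02, 1.5, 0.9),
        RoutingRecord("t2", "generalist", "underwriting", 0.01, 0.8, 0.4),
        RoutingRecord("t3", "claims", "claims", 0.03, 2.2, 0.8),
        RoutingRecord("t4", "underwriting", "underwriting", 0.02, 1.5, 0.7),
    ]
    print(summarize(runs))
```

Keeping the axes as separate numbers, rather than folding them into one score, reflects the trade-off framing: a router can look cheaper and faster precisely because it is misrouting work away from costlier specialists.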
WHY IT MATTERS
Multi-agent BFSI (banking, financial services, and insurance) workflows, such as one in which an agent routes tasks to an underwriting specialist, need frameworks for evaluating routing quality. DecisionBench provides standardized metrics for doing so.
Source: arXiv · 2026-05-21