NVIDIA Blackwell Tops AgentPerf Benchmark for AI Agent Infrastructure
Published by Venkatraman Chandrasekaran | AI Infrastructure
NVIDIA is using a new agentic AI benchmark to make a broader point about the future of AI infrastructure: running AI agents at scale is not the same as serving chatbot responses.
The company said its GB300 NVL72 Blackwell Ultra platform delivered leading results on AgentPerf, a benchmark from Artificial Analysis designed to measure how infrastructure handles real agentic workloads. According to NVIDIA, the Blackwell system supported up to 20 times more agents per megawatt than its earlier Hopper generation in the benchmark’s first published results.
The update is significant because AI agents behave very differently from traditional AI assistants. A simple chatbot request usually involves a prompt, a model response and a completed interaction. An agent may move through a longer chain of actions: reading files, making multiple model calls, using tools, writing code, testing results and continuing until the task is finished.

That difference changes the infrastructure problem. The question is no longer only how fast a model can return tokens. Enterprises and AI platforms now need to understand how many useful agent sessions can run at the same time, how responsive those agents remain under load and how much power is required to sustain that work.
AgentPerf attempts to answer that question by measuring concurrent agent capacity under defined service-level targets. The benchmark is built around coding-agent style workloads, where agents work through tasks that resemble software development activity. NVIDIA says the benchmark uses trajectories based on public code repositories across more than 12 programming languages, while tool calls are simulated so the test can focus on accelerated computing performance.
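To make the idea of "concurrent agent capacity under a service-level target" concrete, here is a minimal sketch. It is not AgentPerf's actual methodology, which the article does not spell out. The latency model, the function names and the 30-second threshold are all hypothetical, and the sketch assumes latency never decreases as more agents are added.

```python
from typing import Callable


def max_agents_within_slo(
    latency_ms_at: Callable[[int], int],
    slo_ms: int,
    upper_bound: int,
) -> int:
    """Return the largest agent count in [0, upper_bound] whose latency meets the SLO.

    Assumes latency_ms_at is non-decreasing in the number of concurrent agents,
    which lets us binary-search instead of testing every load level.
    """
    lo, hi = 0, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if latency_ms_at(mid) <= slo_ms:
            lo = mid  # mid agents still meet the target; try more
        else:
            hi = mid - 1  # mid agents break the target; try fewer
    return lo


def toy_latency_ms(concurrent_agents: int) -> int:
    """Hypothetical latency model: 2 s base plus 50 ms per concurrent agent."""
    return 2000 + 50 * concurrent_agents


# Hypothetical target: every agent step must complete within 30 seconds.
capacity = max_agents_within_slo(toy_latency_ms, slo_ms=30_000, upper_bound=10_000)
print(capacity)  # 560
```

In a real measurement the toy model would be replaced by running the workload at each load level, but the capacity number reported would play the same role: the highest concurrency at which the service-level target still holds.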
In NVIDIA’s technical results, GB300 NVL72 reached 61.4K concurrent agents per megawatt in the SLO 30 configuration, compared with 2.6K for NVIDIA H200. The same results showed 57.5 concurrent agents per GPU for GB300 NVL72 versus 1.4 for H200.
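The arithmetic behind those figures is easy to check. The short sketch below uses only the rounded SLO 30 numbers quoted above, so its ratios need not equal the "up to 20 times" figure cited earlier, which may refer to a different configuration. The implied power per GPU is a derived value, obtained by dividing one quoted figure by the other. It is an assumption-laden estimate, not a reported measurement.

```python
# Rounded SLO 30 figures as quoted in the article.
gb300_agents_per_mw = 61_400   # 61.4K concurrent agents per megawatt
h200_agents_per_mw = 2_600     # 2.6K concurrent agents per megawatt
gb300_agents_per_gpu = 57.5
h200_agents_per_gpu = 1.4

# Generation-over-generation ratios.
per_mw_ratio = gb300_agents_per_mw / h200_agents_per_mw
per_gpu_ratio = gb300_agents_per_gpu / h200_agents_per_gpu

# Implied power per GPU in watts: (agents/GPU) / (agents/MW) gives MW per GPU.
# Derived from the two quoted figures only; not a measured value.
gb300_implied_watts = gb300_agents_per_gpu / gb300_agents_per_mw * 1_000_000
h200_implied_watts = h200_agents_per_gpu / h200_agents_per_mw * 1_000_000

print(f"{per_mw_ratio:.1f}x per MW, {per_gpu_ratio:.1f}x per GPU")
# 23.6x per MW, 41.1x per GPU
print(f"{gb300_implied_watts:.0f} W vs {h200_implied_watts:.0f} W implied per GPU")
# 936 W vs 538 W implied per GPU
```

The per-GPU gain is larger than the per-megawatt gain because, on these figures, each GB300 GPU is implied to draw more power than each H200 GPU.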
For CTOs, cloud architects and AI infrastructure teams, the most important part of the announcement is not the benchmark win itself. It is the shift in measurement. Agentic AI workloads require a different way to think about capacity planning. A system that performs well on single-turn inference may not perform as well when hundreds or thousands of agents are running long, multi-step tasks with growing context and repeated tool use.
NVIDIA attributes the Blackwell result to rack-scale design and software optimization across the stack. The GB300 NVL72 system connects 72 GPUs into a single high-bandwidth system, allowing large mixture-of-experts models to be distributed more efficiently. NVIDIA also points to optimizations in CUDA and TensorRT LLM that help manage communication, compute and concurrent agent sessions.
The benchmark arrives at a time when software vendors are moving quickly from AI copilots to autonomous and semi-autonomous agents. Coding assistants, enterprise workflow agents, customer support agents and SaaS automation tools all place new demands on inference infrastructure. As these systems move into production, infrastructure buyers will need metrics that reflect end-to-end agent behavior rather than isolated model calls.
AgentPerf is still new, and early benchmark results should be treated as the beginning of a measurement category rather than the final word on agentic AI performance. But the direction is clear. As AI agents become a larger part of enterprise software, data center efficiency will increasingly be judged by how much agentic work can be completed per GPU, per dollar and per megawatt.
For NVIDIA, the first AgentPerf results strengthen its positioning around Blackwell as a platform for large-scale agent deployment. For the wider market, the benchmark signals that agentic AI is becoming mature enough to require its own infrastructure standards.
