Google's fastest cost efficient reasoning model for high volume workloads featuring 1M token context adaptive thinking levels and multimodal capabilities.
Mercury 2 is a diffusion-based reasoning LLM delivering 1,000+ tokens/sec throughput on NVIDIA Blackwell GPUs. Features 128K context, native tool use, tunable reasoning, and OpenAI-compatible API for production AI applications.