Claude Fable 5
Claude Fable 5 is Anthropic's most capable widely released foundation model, built for demanding reasoning, long-horizon agentic work, and 1M token context processing via API.
About Claude Fable 5
Claude Fable 5 is Anthropic's most capable widely released AI foundation model, designed for the most demanding reasoning tasks and long-horizon agentic workflows. Released on June 9, 2026, it is generally available on the Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry. The model sets a new capability tier within Anthropic's Claude family, positioned above the Opus line for teams and enterprises requiring the highest available intelligence at scale.
The model operates with a 1 million token context window by default, enabling processing of entire codebases, lengthy legal documents, research repositories, and complex multi-document workflows in a single pass. It supports up to 128,000 output tokens per request, making it suited for generating long-form content, detailed analysis reports, and complete software modules within a single inference call.
Claude Fable 5 introduces adaptive thinking as its exclusive reasoning mode. Rather than relying on manually configured thinking budgets, the model dynamically determines when and how much to reason based on the complexity of each request. This design reduces inference overhead on simple tasks while allocating deeper computation to problems that require it. Developers control thinking depth through the effort parameter rather than managing token budgets directly.
The model includes safety classifiers that can decline certain requests, returning a structured stop_reason: "refusal" response rather than an HTTP error. Anthropic provides server-side and client-side fallback mechanisms, along with SDK middleware, so declined requests can be automatically retried against another Claude model without developer-managed retry logic. Requests refused before any output is generated are not billed.
Key Features
- Adaptive Thinking: Always-on reasoning mode that dynamically scales computational depth to task complexity, controlled via the effort parameter instead of manual token budgets.
- 1M Token Context Window: Processes up to one million tokens of input at standard pricing, enabling full codebase comprehension, large document analysis, and extended agentic sessions without chunking.
- 128k Output Tokens: Generates up to 128,000 tokens per synchronous API response, supporting long-form code generation, detailed reports, and extensive structured outputs in a single call.
- Structured Refusal Handling: Returns stop_reason: "refusal" with classifier identification on declined requests, enabling programmatic fallback routing to other models without manual error handling.
- Compaction and Context Editing: Supports conversation compaction and tool result clearing through the context editing beta, reducing token costs across long agentic sessions.
- Multi-Platform Availability: Generally available on Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry from day one of release.
Pricing
| Category | Price | Notes |
|---|---|---|
| Standard API (Input) | $10 per million tokens | Base input token rate for all synchronous Claude Fable 5 API calls |
| Standard API (Output) | $50 per million tokens | Output token rate for tokens generated in synchronous responses |
| Batch API (Input) | $5 per million tokens | 50% discount on input tokens for asynchronous batch processing via the Message Batches API |
| Batch API (Output) | $25 per million tokens | 50% discount on output tokens for asynchronous batch processing |
| Prompt Cache Write (5 min) | $12.50 per million tokens | 1.25x multiplier on base input rate; cache valid for 5 minutes |
| Prompt Cache Write (1 hour) | $20 per million tokens | 2x multiplier on base input rate; cache valid for 1 hour |
| Prompt Cache Read | $1 per million tokens | 0.1x multiplier on base input rate when retrieving cached content |
| Data Residency (US-only inference) | 1.1x multiplier on all token categories | Applied when using the inference_geo: "us" parameter on Claude API and Claude Platform on AWS |
All prices are in USD. Claude Fable 5 is a Covered Model with 30-day data retention; zero data retention is not available for this model.
Pricing Source and Comparison - https://platform.claude.com/docs/en/about-claude/pricing
Pricing last updated: June 10, 2026 at 12:00 AM
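To show how these rates combine, here is a minimal cost-estimator sketch for a single request. It rests on stated assumptions: the Usage field names are hypothetical (they are not the API's usage object), input_tokens means uncached input only, and the batch discount and US residency multiplier are assumed to apply to every token category and to stack multiplicatively. Check the pricing page before relying on the numbers.

```python
from dataclasses import dataclass

# USD per million tokens, taken from the Claude Fable 5 pricing list.
INPUT_PER_MTOK = 10.0
OUTPUT_PER_MTOK = 50.0

# Prompt caching multipliers on the base input rate.
CACHE_WRITE_5M_MULT = 1.25
CACHE_WRITE_1H_MULT = 2.0
CACHE_READ_MULT = 0.1

# Assumption: both apply to every token category and stack multiplicatively.
BATCH_MULT = 0.5
US_RESIDENCY_MULT = 1.1


@dataclass
class Usage:
    """Hypothetical token counts for one request (not the API's usage object)."""
    input_tokens: int = 0            # uncached input tokens only
    output_tokens: int = 0
    cache_write_5m_tokens: int = 0
    cache_write_1h_tokens: int = 0
    cache_read_tokens: int = 0


def estimate_cost(usage: Usage, batch: bool = False, us_only: bool = False) -> float:
    """Return the estimated USD cost of one request."""
    # Express every input-side category as an equivalent count of base-rate tokens.
    weighted_input = (
        usage.input_tokens
        + usage.cache_write_5m_tokens * CACHE_WRITE_5M_MULT
        + usage.cache_write_1h_tokens * CACHE_WRITE_1H_MULT
        + usage.cache_read_tokens * CACHE_READ_MULT
    )
    cost = (weighted_input * INPUT_PER_MTOK + usage.output_tokens * OUTPUT_PER_MTOK) / 1_000_000
    if batch:
        cost *= BATCH_MULT
    if us_only:
        cost *= US_RESIDENCY_MULT
    return cost
```

For example, a synchronous call with 200,000 input tokens and 8,000 output tokens comes to about $2.40, or about $1.20 through the Batch API under the stacking assumption above.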
Use Cases
- Long-horizon agentic coding and autonomous software engineering workflows requiring multi-step planning and execution
- Large document analysis including full codebase review, legal contract processing, and multi-document research synthesis
- Complex multi-step reasoning tasks in financial modeling, scientific research, and enterprise decision support
- Production agentic pipelines requiring structured refusal handling and automatic fallback to lower-tier models
Pros & Cons
Pros:
- Highest capability tier among Anthropic's generally available models, suited for the most demanding reasoning workloads
- Adaptive thinking eliminates manual token budget management while delivering depth-appropriate computation
- Full 1M token context window at standard per-token pricing with no surcharge for large context
Cons:
- Higher price point at $10 input and $50 output per million tokens compared to Opus and Sonnet tiers
- 30-day mandatory data retention on all requests; zero data retention is not supported for this model
- Raw chain-of-thought is never returned; only summarized or omitted thinking blocks are accessible to developers
Integrations
Anthropic Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Claude Code, Anthropic TypeScript SDK, Anthropic Python SDK, Anthropic Go SDK, Anthropic Java SDK, Anthropic C# SDK, MCP servers
Last edited June 10, 2026 at 2:38 AM by Venkatraman Chandrasekaran
Claude Fable 5 Product Guide
A practical guide to getting started with Claude Fable 5 — covering how to access the API, control reasoning depth with the effort parameter, handle refusals, and manage costs across different workloads.
- How to Get Started with Claude Fable 5
- Understanding the Effort Parameter
- Working with the 1M Token Context Window
- How Refusals Work and How to Handle Them
- Setting Up Automatic Fallback
- Key Pitfalls to Avoid
- Prompt Caching to Reduce Costs
- Batch Processing for High-Volume Workloads
- Data Residency and Compliance
- Top 6 Use Cases of Claude Fable 5
- Useful Links
How to Get Started with Claude Fable 5
Claude Fable 5 is available via the Anthropic Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. To start, create an account at platform.claude.com, generate an API key from the Console, and make your first request using the model ID claude-fable-5.
```python
import anthropic

# By default the client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize the key risks in this contract."}
    ],
)
print(response.content[0].text)
```
The model is available in the same SDKs used for all Claude models: TypeScript, Python, Go, Java, C#, PHP, and Ruby, as well as raw HTTP.
Understanding the Effort Parameter
Claude Fable 5 uses adaptive thinking as its only reasoning mode — the model decides when and how much to think based on each request. You control that depth using the effort parameter rather than manually setting token budgets.
| Effort Level | When to Use |
|---|---|
| max | Frontier problems requiring the deepest possible reasoning |
| xhigh | Long-running agentic and coding tasks over 30 minutes |
| high | Default: complex reasoning, difficult coding, nuanced analysis |
| medium | Balanced speed and quality for most agentic workflows |
| low | Simple tasks, subagents, high-volume or latency-sensitive calls |
The default is high. Set it explicitly for any workload where cost or latency matters:
```python
response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    # One of "low", "medium", "high" (default), "xhigh", "max".
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "Your prompt here"}],
)
```
For long-running agentic tasks at xhigh or max, set a large max_tokens — starting at 64k is a reasonable default — so the model has room to reason and call tools across multiple steps.
The effort parameter affects all token spend: text responses, tool calls, and thinking. Lower effort causes the model to make fewer tool calls, skip preambles, and produce more concise outputs. Higher effort produces more thorough tool use, detailed planning, and comprehensive code comments.
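One way to apply the table consistently across a codebase is to choose effort and max_tokens in a single place. The sketch below assumes hypothetical workload labels; the 64,000 figure follows the 64k starting point suggested above for xhigh and max, and the 16,000 used elsewhere simply mirrors the earlier example rather than any documented requirement.

```python
# Hypothetical workload labels mapped onto the effort levels in the table above.
EFFORT_BY_WORKLOAD = {
    "frontier_problem": "max",
    "long_agentic_task": "xhigh",
    "complex_reasoning": "high",
    "agentic_workflow": "medium",
    "subagent": "low",
}

DEFAULT_EFFORT = "high"  # the documented default


def build_request(prompt: str, workload: str) -> dict:
    """Build keyword arguments for client.messages.create with effort chosen per workload."""
    effort = EFFORT_BY_WORKLOAD.get(workload, DEFAULT_EFFORT)
    # xhigh/max runs get a 64k max_tokens starting point; 16k elsewhere is illustrative.
    max_tokens = 64_000 if effort in ("xhigh", "max") else 16_000
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Usage: response = client.messages.create(**build_request("Triage this bug report.", "subagent")). Unrecognized labels fall back to the default effort, so a typo never silently selects max.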
Working with the 1M Token Context Window
Claude Fable 5 supports a 1 million token context window at standard per-token pricing — no surcharge for large context. This enables workflows that previously required chunking or retrieval-augmented generation:
- Pass an entire codebase as context for architecture review or refactoring
- Load full legal agreements or regulatory documents for clause-level analysis
- Maintain complete conversation history across long agentic sessions without summarization
- Process multi-document research sets in a single inference call
The 128k output token limit per synchronous request supports generating complete module implementations, detailed reports, and structured data outputs in one call.
For very long sessions, use compaction to reduce token accumulation costs, and context editing (beta, via the context-management-2025-06-27 header) to clear tool results that are no longer needed in the active context.
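Before sending a whole repository as context, a quick local estimate helps you decide whether it fits at all. The sketch below relies on a rough four-characters-per-token heuristic, which is an assumption and not the model's tokenizer. It also assumes you should reserve room for the response inside the 1M window. The default extension list is arbitrary.

```python
import math
import os

CONTEXT_WINDOW_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic only; the real tokenizer will differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, extensions: tuple = (".py", ".ts", ".go", ".java", ".md")) -> int:
    """Sum estimated tokens over files under root whose names end with one of the extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_context(root: str, reserved_output_tokens: int = 128_000) -> bool:
    """Assumption: the response must fit in the same window, so reserve space for it."""
    return estimate_repo_tokens(root) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS
```

Because the heuristic can be off in either direction, treat a result near the limit as inconclusive. Where Anthropic's token counting endpoint is available for this model, it gives an exact figure.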
How Refusals Work and How to Handle Them
Claude Fable 5 includes safety classifiers that can decline certain requests. A refusal is not an HTTP error — it returns a normal 200 response with stop_reason: "refusal":
```json
{
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "This request was declined because it could enable cyber harm."
  },
  "usage": {
    "input_tokens": 412,
    "output_tokens": 0
  }
}
```
The stop_details.category field identifies which classifier fired. Defined categories are cyber (requests that could enable cyberattacks), bio (requests that could enable biological harm), and reasoning_extraction (requests asking the model to reproduce its internal chain of thought). The explanation field is human-readable but not stable — display it rather than parse it.
Requests refused before any output is generated are not billed.
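The sketch below reads these fields. It assumes the response body is available as a plain dict, for example from a raw HTTP call. The function name refusal_info and the shape of its return value are hypothetical. It checks stop_reason first and treats stop_details as optional.

```python
from typing import Optional


def refusal_info(body: dict) -> Optional[dict]:
    """Return refusal details from a Messages API response body, or None if it was not refused."""
    # Decide on stop_reason alone; stop_details can be null even when the request was refused.
    if body.get("stop_reason") != "refusal":
        return None
    details = body.get("stop_details") or {}
    return {
        # "cyber", "bio", "reasoning_extraction", or None when no details were returned
        "category": details.get("category"),
        # Human-readable and not stable: show it to users, never parse it.
        "explanation": details.get("explanation") or "",
        "output_tokens": (body.get("usage") or {}).get("output_tokens", 0),
    }
```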
Setting Up Automatic Fallback
Most requests that Claude Fable 5 declines can be served by another Claude model. The simplest approach is server-side fallback — pass the fallbacks parameter and the API retries automatically:
```python
response = client.beta.messages.create(
    model="claude-fable-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt here"}],
    fallbacks=[{"model": "claude-opus-4-8"}],
    betas=["server-side-fallback-2026-06-01"],
)
```
You receive one response. The model field in the response names whichever model actually served the reply. Server-side fallback is available on the Claude API and Claude Platform on AWS. For Amazon Bedrock, Vertex AI, and Microsoft Foundry, use the SDK middleware instead:
```python
from anthropic import Anthropic
from anthropic.lib.beta import BetaRefusalFallbackMiddleware, BetaFallbackState

# Refused requests are retried against the fallback model(s) listed here.
client = Anthropic(
    middleware=[BetaRefusalFallbackMiddleware([{"model": "claude-opus-4-8"}])],
)

state = BetaFallbackState()
# Make the request inside the fallback state's context.
with state:
    message = client.beta.messages.create(
        max_tokens=1024,
        model="claude-fable-5",
        messages=[{"role": "user", "content": "Your prompt here"}],
    )
```
The middleware handles retries automatically on any platform and pins follow-up turns in a conversation to the model that accepted the first request.
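If neither server-side fallback nor the middleware fits your stack, you can still write the same logic by hand. The sketch below is transport-agnostic: send is a hypothetical callable you supply that takes a model ID and messages and returns the response body as a dict. Each model is tried at most once, so a refused request is never re-sent to the model that refused it.

```python
from typing import Callable, List, Sequence

# Hypothetical transport: takes (model, messages) and returns a response body as a dict.
SendFn = Callable[[str, List[dict]], dict]


def create_with_fallback(send: SendFn, messages: List[dict], models: Sequence[str]) -> dict:
    """Try each model once, in order, and return the first response that is not a refusal."""
    if not models:
        raise ValueError("at least one model is required")
    response: dict = {}
    for model in models:
        response = send(model, messages)
        if response.get("stop_reason") != "refusal":
            return response
        # Refused: re-sending to the same model would refuse again, so move to the next one.
    return response  # every model refused; the caller decides how to surface this
```

Usage: create_with_fallback(send, messages, ["claude-fable-5", "claude-opus-4-8"]). To mirror the middleware's pinning behavior, read the returned response's model field and send later turns of the same conversation directly to that model.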
Key Pitfalls to Avoid
- Always retry on a different model — re-sending a refused request to Claude Fable 5 will produce another refusal
- Branch on stop_reason == "refusal" directly, not on stop_details (which can be null)
- Refusals return HTTP 200, so standard error rate monitoring never sees them; instrument them separately
- Configure fallback on every request path including retry handlers, background workers, and sub-agent calls
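Refusals never show up in HTTP error metrics, so they need their own counter. The minimal sketch below assumes response bodies arrive as plain dicts. The RefusalMonitor class is hypothetical. In production you would forward these counts to your metrics system instead of keeping them in memory.

```python
from collections import Counter


class RefusalMonitor:
    """Counts refusals separately from HTTP errors, since refusals arrive as HTTP 200."""

    def __init__(self) -> None:
        self.total = 0
        self.refusals: Counter = Counter()

    def record(self, response: dict) -> None:
        self.total += 1
        # Branch on stop_reason; stop_details may be null, so fall back to "unspecified".
        if response.get("stop_reason") == "refusal":
            details = response.get("stop_details") or {}
            self.refusals[details.get("category") or "unspecified"] += 1

    def refusal_rate(self) -> float:
        return sum(self.refusals.values()) / self.total if self.total else 0.0
```

Hook record into every request path, including retry handlers, background workers, and sub-agent calls, so the rate reflects all traffic.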
Prompt Caching to Reduce Costs
For workloads with repeated system prompts, large static documents, or shared context across many requests, prompt caching significantly reduces input token costs.
| Cache Operation | Price | Duration |
|---|---|---|
| 5-minute cache write | $12.50 / MTok | 5 minutes |
| 1-hour cache write | $20 / MTok | 1 hour |
| Cache read (hit) | $1 / MTok | Same as preceding write |
A cache read costs 10% of the standard input rate, so each read saves 0.9x the base rate on the cached content. A 5-minute cache write carries a 0.25x premium and pays for itself after one cache read; a 1-hour write carries a 1x premium and pays for itself after two reads. Prompt caching can be combined with batch processing for maximum cost reduction on high-volume asynchronous workloads.
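The break-even arithmetic can be checked directly. The sketch below measures costs in multiples of the base input price for one send of the cached prefix. It compares one cache write plus n reads against paying full price n + 1 times. Fractions keep the comparison exact.

```python
from fractions import Fraction

READ_MULT = Fraction(1, 10)  # cache read = 0.1x the base input rate


def break_even_reads(write_mult: Fraction, max_reads: int = 100) -> int:
    """Smallest number of cache reads after which caching costs less than resending the prefix."""
    for reads in range(max_reads + 1):
        with_cache = write_mult + reads * READ_MULT  # one write, then `reads` cache hits
        without_cache = 1 + reads                    # full base price on every send
        if with_cache < without_cache:
            return reads
    raise ValueError("cache never pays off within max_reads")
```

Usage: break_even_reads(Fraction(5, 4)) returns 1 for the 5-minute cache, and break_even_reads(Fraction(2)) returns 2 for the 1-hour cache, matching the figures above.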
Batch Processing for High-Volume Workloads
The Message Batches API processes large volumes of requests asynchronously at 50% off both input and output token rates:
- Standard: $10 input / $50 output per million tokens
- Batch: $5 input / $25 output per million tokens
Batch mode is well-suited for document processing pipelines, bulk analysis tasks, and offline research workflows where response latency is not a constraint. Note that server-side fallback is not available in batch mode — refused batch items should be collected from results and resubmitted to a fallback model as a new batch.
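The sketch below collects refused batch items for resubmission. It assumes each result looks like {"custom_id": ..., "result": {"type": "succeeded", "message": {...}}} and each request looks like {"custom_id": ..., "params": {...}}. Both shapes are modeled on the Message Batches API, so treat the exact field names as assumptions for this model. It also assumes you kept the original request params keyed by custom_id. Refused items are expected to arrive as "succeeded" results with stop_reason "refusal".

```python
def refused_requests(results: list, original_params: dict, fallback_model: str) -> list:
    """Build new batch requests for items that were refused, retargeted at fallback_model."""
    resubmit = []
    for item in results:
        result = item.get("result") or {}
        if result.get("type") != "succeeded":
            continue  # errored, canceled, or expired items need separate handling
        if (result.get("message") or {}).get("stop_reason") != "refusal":
            continue
        params = dict(original_params[item["custom_id"]])  # shallow copy; original stays intact
        params["model"] = fallback_model  # never resubmit to the model that refused
        resubmit.append({"custom_id": item["custom_id"], "params": params})
    return resubmit
```

The returned list can be submitted as a new batch. Because custom_id values are preserved, the fallback results can be joined back to the original run.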
Data Residency and Compliance
For organizations with data residency requirements, the inference_geo: "us" parameter on the Claude API and Claude Platform on AWS guarantees US-only inference routing at a 1.1x pricing multiplier across all token categories. Global routing is the default and uses standard pricing.
Claude Fable 5 is a Covered Model with 30-day mandatory data retention. Zero data retention is not available for this model. Organizations that require ZDR for all workloads should evaluate whether Claude Fable 5 fits their compliance posture before deployment.
Top 6 Use Cases of Claude Fable 5
- Autonomous code review and software engineering across full repositories using 1M token context
- Legal and compliance document analysis requiring full corpus context in a single session
- Financial research synthesis from earnings reports, filings, and multi-source data into structured outputs
- Multi-agent AI pipeline orchestration with structured refusal handling and automatic model fallback
- Technical due diligence for M&A and vendor assessments involving large codebases or documentation sets
- Scientific literature synthesis and hypothesis generation across large research corpora
