What is Claude Fable 5?

Claude Fable 5 is Anthropic's most capable widely released foundation model, built for demanding reasoning tasks and long-horizon agentic workflows. It became generally available on June 9, 2026 on the Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry.

What is the context window size for Claude Fable 5?

Claude Fable 5 supports a 1 million token context window by default. The full 1M context is available at standard per-token pricing with no additional surcharge for large context usage.

What is adaptive thinking and how does it work on Claude Fable 5?

Adaptive thinking is the only reasoning mode on Claude Fable 5. Instead of manually setting a thinking token budget, the model dynamically decides when and how much to reason based on the complexity of the request. Developers control thinking depth through the effort parameter. The raw chain of thought is never returned; developers can receive summarized thinking by setting thinking.display to 'summarized'.

What happens when Claude Fable 5 refuses a request?

When Claude Fable 5 declines a request due to its safety classifiers, the Messages API returns a successful HTTP 200 response with stop_reason set to 'refusal', not an error. The response also reports which classifier declined the request. Requests refused before generating any output are not billed.

Can I set up automatic fallback when Claude Fable 5 refuses a request?

Yes. Anthropic provides server-side fallback via the fallbacks parameter (in beta on the Claude API and Claude Platform on AWS) and client-side fallback through SDK middleware, available in the TypeScript, Python, Go, Java, and C# SDKs. When a request is retried on another model, fallback credit offsets the prompt cache cost of switching models.

What is the maximum output length for Claude Fable 5?

Claude Fable 5 supports up to 128,000 output tokens per request on the synchronous Messages API. On the Message Batches API, additional output beta headers may extend this further.

How is Claude Fable 5 priced?

Claude Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. Batch API usage is discounted 50% to $5 input and $25 output per million tokens. Prompt caching, data residency, and other modifiers apply on top of these base rates.

Does Claude Fable 5 support zero data retention?

No. Claude Fable 5 is a Covered Model with a mandatory 30-day data retention policy. Zero data retention is not available for this model.

What is the difference between Claude Fable 5 and Claude Mythos 5?

Claude Fable 5 and Claude Mythos 5 share the same underlying capabilities and pricing. The key difference is that Claude Mythos 5 operates without the safety classifiers present in Claude Fable 5 and is available only in limited release to approved organizations through Anthropic's Project Glasswing. Claude Fable 5 is the generally available option for all developers and enterprises.

Claude Fable 5

Claude Fable 5 is Anthropic's most capable widely released foundation model, built for demanding reasoning, long-horizon agentic work, and 1M token context processing via API.

About Claude Fable 5

Claude Fable 5 is Anthropic's most capable widely released AI foundation model, designed for the most demanding reasoning tasks and long-horizon agentic workflows. Released on June 9, 2026, it is generally available on the Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry. The model sets a new capability tier within Anthropic's Claude family, positioned above the Opus line for teams and enterprises that need the highest available intelligence at scale.

The model operates with a 1 million token context window by default, enabling processing of entire codebases, lengthy legal documents, research repositories, and complex multi-document workflows in a single pass. It supports up to 128,000 output tokens per request, making it suited for generating long-form content, detailed analysis reports, and complete software modules within a single inference call.

Claude Fable 5 introduces adaptive thinking as its exclusive reasoning mode. Rather than relying on manually configured thinking budgets, the model dynamically determines when and how much to reason based on the complexity of each request. This design reduces inference overhead on simple tasks while allocating deeper computation to problems that require it. Developers control thinking depth through the effort parameter rather than managing token budgets directly.

The model includes safety classifiers that can decline certain requests, returning a structured stop_reason: "refusal" response rather than an HTTP error. Anthropic provides server-side and client-side fallback mechanisms, along with SDK middleware, so declined requests can be automatically retried against another Claude model without developer-managed retry logic. Requests refused before any output is generated are not billed.

Key Features

Adaptive Thinking: Always-on reasoning mode that dynamically scales computational depth to task complexity, controlled via the effort parameter instead of manual token budgets.
1M Token Context Window: Processes up to one million tokens of input at standard pricing, enabling full codebase comprehension, large document analysis, and extended agentic sessions without chunking.
128k Output Tokens: Generates up to 128,000 tokens per synchronous API response, supporting long-form code generation, detailed reports, and extensive structured outputs in a single call.
Structured Refusal Handling: Returns stop_reason: "refusal" with classifier identification on declined requests, enabling programmatic fallback routing to other models without manual error handling.
Compaction and Context Editing: Supports conversation compaction and tool result clearing through the context editing beta, reducing token costs across long agentic sessions.
Multi-Platform Availability: Generally available on Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry from day one of release.

Pricing

Standard API (Input): $10 per million tokens. Base input token rate for all synchronous Claude Fable 5 API calls.
Standard API (Output): $50 per million tokens. Output token rate per million tokens generated in synchronous responses.
Batch API (Input): $5 per million tokens. 50% discount on input tokens for asynchronous batch processing via the Message Batches API.
Batch API (Output): $25 per million tokens. 50% discount on output tokens for asynchronous batch processing.
Prompt Cache Write (5 min): $12.50 per million tokens. 1.25x multiplier on base input rate; cache valid for 5 minutes.
Prompt Cache Write (1 hour): $20 per million tokens. 2x multiplier on base input rate; cache valid for 1 hour.
Prompt Cache Read: $1 per million tokens. 0.1x multiplier on base input rate when retrieving cached content.
Data Residency (US-only inference): 1.1x multiplier on all token categories. Applied when using the inference_geo: "us" parameter on Claude API and Claude Platform on AWS.

All prices are in USD. Claude Fable 5 is a Covered Model with 30-day data retention; zero data retention is not available for this model.
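
To make the rates above concrete, here is a minimal sketch of a cost estimator. It uses only the base, batch, and US data residency figures listed in this section. The function name estimate_cost_usd is hypothetical, and the assumption that the batch discount and the residency multiplier stack multiplicatively is illustrative rather than confirmed by the pricing page. Prompt caching is not modeled.

```python
# Illustrative cost estimator; rates are taken from the pricing list above.
INPUT_USD_PER_MTOK = 10.00
OUTPUT_USD_PER_MTOK = 50.00
BATCH_DISCOUNT = 0.5            # Batch API: 50% off input and output
US_RESIDENCY_MULTIPLIER = 1.1   # applied with inference_geo: "us"


def estimate_cost_usd(input_tokens, output_tokens, batch=False, us_only=False):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: the batch discount and the residency multiplier combine
    multiplicatively. Prompt caching is not modeled here.
    """
    cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MTOK
    cost += (output_tokens / 1_000_000) * OUTPUT_USD_PER_MTOK
    if batch:
        cost *= BATCH_DISCOUNT
    if us_only:
        cost *= US_RESIDENCY_MULTIPLIER
    return cost


if __name__ == "__main__":
    # 200k input tokens and 8k output tokens on the standard API.
    print(f"${estimate_cost_usd(200_000, 8_000):.2f}")
    # The same request submitted through the Message Batches API.
    print(f"${estimate_cost_usd(200_000, 8_000, batch=True):.2f}")
```

Treat the result as a planning estimate only; actual invoices depend on the modifiers that apply to each request.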

Pricing Source and Comparison - https://platform.claude.com/docs/en/about-claude/pricing

Pricing last updated: June 10, 2026 at 12:00 AM

Use Cases

Long-horizon agentic coding and autonomous software engineering workflows requiring multi-step planning and execution
Large document analysis including full codebase review, legal contract processing, and multi-document research synthesis
Complex multi-step reasoning tasks in financial modeling, scientific research, and enterprise decision support
Production agentic pipelines requiring structured refusal handling and automatic fallback to lower-tier models

Pros & Cons

Pros:

Highest capability tier among Anthropic's generally available models, suited for the most demanding reasoning workloads
Adaptive thinking eliminates manual token budget management while delivering depth-appropriate computation
Full 1M token context window at standard per-token pricing with no surcharge for large context

Cons:

Higher price than the Opus and Sonnet tiers, at $10 input and $50 output per million tokens
30-day mandatory data retention on all requests; zero data retention is not supported for this model
Raw chain-of-thought is never returned; only summarized or omitted thinking blocks are accessible to developers

Integrations

Anthropic Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Claude Code, Anthropic TypeScript SDK, Anthropic Python SDK, Anthropic Go SDK, Anthropic Java SDK, Anthropic C# SDK, MCP servers

FAQ

Categories:

AI Models Foundation

Tags:

enterprise-ai

Last edited

June 10, 2026 at 2:38 AM by Venkatraman Chandrasekaran

Claude Fable 5 Product Guide

A practical guide to getting started with Claude Fable 5 — covering how to access the API, control reasoning depth with the effort parameter, handle refusals, and manage costs across different workloads.

Table of Contents

How to Get Started with Claude Fable 5
Understanding the Effort Parameter
Working with the 1M Token Context Window
How Refusals Work and How to Handle Them
Setting Up Automatic Fallback
Key Pitfalls to Avoid
Prompt Caching to Reduce Costs
Batch Processing for High-Volume Workloads
Data Residency and Compliance
Top 6 Use Cases of Claude Fable 5
Useful Links

How to Get Started with Claude Fable 5

Claude Fable 5 is available via the Anthropic Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. To start, create an account at platform.claude.com, generate an API key from the Console, and make your first request using the model ID claude-fable-5.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize the key risks in this contract."}
    ],
)
print(response.content[0].text)

The model is available in the same SDKs used for all Claude models: TypeScript, Python, Go, Java, C#, PHP, and Ruby, as well as raw HTTP.

Understanding the Effort Parameter

Claude Fable 5 uses adaptive thinking as its only reasoning mode — the model decides when and how much to think based on each request. You control that depth using the effort parameter rather than manually setting token budgets.

Effort Level	When to Use
`max`	Frontier problems requiring the deepest possible reasoning
`xhigh`	Long-running agentic and coding tasks over 30 minutes
`high`	Default — complex reasoning, difficult coding, nuanced analysis
`medium`	Balanced speed and quality for most agentic workflows
`low`	Simple tasks, subagents, high-volume or latency-sensitive calls

The default is high. Set it explicitly for any workload where cost or latency matters:

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "Your prompt here"}],
)

For long-running agentic tasks at xhigh or max, set a large max_tokens — starting at 64k is a reasonable default — so the model has room to reason and call tools across multiple steps.

The effort parameter affects all token spend: text responses, tool calls, and thinking. Lower effort causes the model to make fewer tool calls, skip preambles, and produce more concise outputs. Higher effort produces more thorough tool use, detailed planning, and comprehensive code comments.
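
The guidance in this section can be collected into a small helper. The sketch below only builds the keyword arguments for client.messages.create and makes no network call. The name build_request is hypothetical, and the 16,000-token default for the lower effort levels is an illustrative choice that mirrors the example above, not a documented default. The 64,000-token default for xhigh and max follows the recommendation above.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
LONG_RUNNING_EFFORTS = {"xhigh", "max"}


def build_request(prompt, effort="high", max_tokens=None):
    """Build kwargs for client.messages.create (hypothetical helper)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if max_tokens is None:
        # 64k for long-running effort levels, per the guidance above;
        # 16k otherwise is an illustrative default, not a documented one.
        max_tokens = 64_000 if effort in LONG_RUNNING_EFFORTS else 16_000
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    kwargs = build_request("Refactor the payment module.", effort="xhigh")
    print(kwargs["max_tokens"], kwargs["output_config"])
    # With a configured client: client.messages.create(**kwargs)
```

Keeping the effort choice in one place makes it easier to audit which request paths run at the more expensive levels.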

Working with the 1M Token Context Window

Claude Fable 5 supports a 1 million token context window at standard per-token pricing — no surcharge for large context. This enables workflows that previously required chunking or retrieval-augmented generation:

Pass an entire codebase as context for architecture review or refactoring
Load full legal agreements or regulatory documents for clause-level analysis
Maintain complete conversation history across long agentic sessions without summarization
Process multi-document research sets in a single inference call

The 128k output token limit per synchronous request supports generating complete module implementations, detailed reports, and structured data outputs in one call.

For very long sessions, use compaction to reduce token accumulation costs, and context editing (beta, via the context-management-2025-06-27 header) to clear tool results that are no longer needed in the active context.

How Refusals Work and How to Handle Them

Claude Fable 5 includes safety classifiers that can decline certain requests. A refusal is not an HTTP error — it returns a normal 200 response with stop_reason: "refusal":

{
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "This request was declined because it could enable cyber harm."
  },
  "usage": {
    "input_tokens": 412,
    "output_tokens": 0
  }
}

The stop_details.category field identifies which classifier fired. Defined categories are cyber (requests that could enable cyberattacks), bio (requests that could enable biological harm), and reasoning_extraction (requests asking the model to reproduce its internal chain of thought). The explanation field is human-readable but not stable — display it rather than parse it.

Requests refused before any output is generated are not billed.
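
As a minimal sketch of how a caller might inspect such a response, the function below works on a plain dict shaped like the JSON example above. The name describe_refusal and the "unknown" placeholder category are hypothetical. It branches on stop_reason and tolerates a missing or null stop_details, and it passes the explanation through untouched instead of parsing it.

```python
def describe_refusal(response):
    """Return None for a normal response, or a dict describing the refusal.

    `response` is a plain dict shaped like the JSON example above.
    """
    if response.get("stop_reason") != "refusal":
        return None
    # stop_details may be absent or null, so fall back to an empty dict.
    details = response.get("stop_details") or {}
    return {
        "category": details.get("category") or "unknown",
        # Human-readable text: display it, do not parse it.
        "explanation": details.get("explanation") or "",
    }


if __name__ == "__main__":
    sample = {
        "stop_reason": "refusal",
        "stop_details": {
            "type": "refusal",
            "category": "cyber",
            "explanation": "This request was declined.",
        },
        "usage": {"input_tokens": 412, "output_tokens": 0},
    }
    print(describe_refusal(sample))
```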

Setting Up Automatic Fallback

Most requests that Claude Fable 5 declines can be served by another Claude model. The simplest approach is server-side fallback — pass the fallbacks parameter and the API retries automatically:

response = client.beta.messages.create(
    model="claude-fable-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt here"}],
    fallbacks=[{"model": "claude-opus-4-8"}],
    betas=["server-side-fallback-2026-06-01"],
)

You receive one response. The model field in the response names whichever model actually served the reply. Server-side fallback is available on the Claude API and Claude Platform on AWS. For Amazon Bedrock, Vertex AI, and Microsoft Foundry, use the SDK middleware instead:

from anthropic import Anthropic
from anthropic.lib.beta import BetaRefusalFallbackMiddleware, BetaFallbackState

client = Anthropic(
    middleware=[BetaRefusalFallbackMiddleware([{"model": "claude-opus-4-8"}])],
)

state = BetaFallbackState()

with state:
    message = client.beta.messages.create(
        max_tokens=1024,
        model="claude-fable-5",
        messages=[{"role": "user", "content": "Your prompt here"}],
    )

The middleware handles retries automatically on any platform and pins follow-up turns in a conversation to the model that accepted the first request.

Key Pitfalls to Avoid

Always retry on a different model — re-sending a refused request to Claude Fable 5 will produce another refusal
Branch on stop_reason == "refusal" directly, not on stop_details (which can be null)
Refusals return HTTP 200, so standard error rate monitoring never sees them — instrument them separately
Configure fallback on every request path including retry handlers, background workers, and sub-agent calls
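
The pitfalls above can be illustrated with a small, network-free sketch. It is not the SDK middleware or the server-side fallback described earlier, which remain the documented routes. The names send_with_fallback and refusal_counts are hypothetical, and send is an injected callable standing in for a real API call. The sketch never retries the same model, branches on stop_reason alone, and counts refusals separately because they arrive as HTTP 200.

```python
from collections import Counter

refusal_counts = Counter()  # stand-in for a real metrics client


def send_with_fallback(send, request, models):
    """Try each model in order until one does not refuse.

    `send` is any callable taking (model, request) and returning a dict
    with a "stop_reason" key; it is injected so the sketch has no
    network dependency.
    """
    response = None
    for model in models:
        response = send(model, request)
        if response.get("stop_reason") != "refusal":
            return model, response
        # Refusals arrive as HTTP 200, so count them explicitly.
        details = response.get("stop_details") or {}
        refusal_counts[(model, details.get("category") or "unknown")] += 1
    # Every model refused; hand the last refusal back to the caller.
    return None, response


if __name__ == "__main__":
    def fake_send(model, request):
        # Simulated transport: the first model refuses, the second answers.
        if model == "claude-fable-5":
            return {"stop_reason": "refusal", "stop_details": None}
        return {"stop_reason": "end_turn", "model": model}

    served_by, reply = send_with_fallback(
        fake_send, {"messages": []}, ["claude-fable-5", "claude-opus-4-8"]
    )
    print(served_by, dict(refusal_counts))
```

Routing every request path through one function like this, including background workers and sub-agent calls, is one way to avoid an unguarded path.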

Prompt Caching to Reduce Costs

For workloads with repeated system prompts, large static documents, or shared context across many requests, prompt caching significantly reduces input token costs.

Cache Operation	Price	Duration
5-minute cache write	$12.50 / MTok	5 minutes
1-hour cache write	$20 / MTok	1 hour
Cache read (hit)	$1 / MTok	Same as preceding write

A cache read costs 10% of the standard input rate, meaning a 5-minute cache pays for itself after one cache read, and a 1-hour cache after two reads. Prompt caching can be combined with batch processing for maximum cost reduction on high-volume asynchronous workloads.
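
The break-even claim can be checked with a few lines of arithmetic. The sketch below compares one cache write plus n reads against n + 1 uncached requests over the same prefix, using the rates from the table above. The function name break_even_reads is hypothetical, and the calculation assumes every read lands within the cache lifetime.

```python
BASE_INPUT = 10.00  # USD per million input tokens
CACHE_READ = 1.00   # USD per million tokens read from cache
CACHE_WRITE = {"5m": 12.50, "1h": 20.00}  # USD per million tokens written


def break_even_reads(ttl):
    """Smallest number of cache reads at which caching is cheaper.

    Compares one write plus n reads against n + 1 uncached requests
    over the same prefix.
    """
    n = 1
    while CACHE_WRITE[ttl] + n * CACHE_READ >= (n + 1) * BASE_INPUT:
        n += 1
    return n


if __name__ == "__main__":
    print(break_even_reads("5m"))  # 1: $13.50 cached vs $20 uncached
    print(break_even_reads("1h"))  # 2: $22 cached vs $30 uncached
```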

Batch Processing for High-Volume Workloads

The Message Batches API processes large volumes of requests asynchronously at 50% off both input and output token rates:

Standard: $10 input / $50 output per million tokens
Batch: $5 input / $25 output per million tokens

Batch mode is well-suited for document processing pipelines, bulk analysis tasks, and offline research workflows where response latency is not a constraint. Note that server-side fallback is not available in batch mode — refused batch items should be collected from results and resubmitted to a fallback model as a new batch.
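
The resubmission step can be sketched as a pure function over the original requests and the batch results. The dict shapes used here (custom_id and params on requests, result.message.stop_reason on results) are assumptions for illustration and should be checked against the Message Batches API reference. The name build_fallback_batch is hypothetical.

```python
def build_fallback_batch(original_requests, results, fallback_model):
    """Return batch requests for every item whose result was a refusal.

    Assumed shapes (illustrative only):
      request: {"custom_id": str, "params": {...}}
      result:  {"custom_id": str, "result": {"message": {"stop_reason": str}}}
    """
    refused_ids = set()
    for item in results:
        message = (item.get("result") or {}).get("message") or {}
        if message.get("stop_reason") == "refusal":
            refused_ids.add(item["custom_id"])

    fallback_requests = []
    for request in original_requests:
        if request["custom_id"] in refused_ids:
            # Copy params so the original request list is left unchanged.
            params = dict(request["params"], model=fallback_model)
            fallback_requests.append(
                {"custom_id": request["custom_id"], "params": params}
            )
    return fallback_requests


if __name__ == "__main__":
    requests = [
        {"custom_id": "doc-1", "params": {"model": "claude-fable-5", "max_tokens": 1024}},
        {"custom_id": "doc-2", "params": {"model": "claude-fable-5", "max_tokens": 1024}},
    ]
    results = [
        {"custom_id": "doc-1", "result": {"message": {"stop_reason": "end_turn"}}},
        {"custom_id": "doc-2", "result": {"message": {"stop_reason": "refusal"}}},
    ]
    print(build_fallback_batch(requests, results, "claude-opus-4-8"))
```

The returned list can then be submitted as a new batch against the fallback model.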

Data Residency and Compliance

For organizations with data residency requirements, the inference_geo: "us" parameter on the Claude API and Claude Platform on AWS guarantees US-only inference routing at a 1.1x pricing multiplier across all token categories. Global routing is the default and uses standard pricing.

Claude Fable 5 is a Covered Model with 30-day mandatory data retention. Zero data retention is not available for this model. Organizations that require ZDR for all workloads should evaluate whether Claude Fable 5 fits their compliance posture before deployment.

Top 6 Use Cases of Claude Fable 5

Autonomous code review and software engineering across full repositories using 1M token context
Legal and compliance document analysis requiring full corpus context in a single session
Financial research synthesis from earnings reports, filings, and multi-source data into structured outputs
Multi-agent AI pipeline orchestration with structured refusal handling and automatic model fallback
Technical due diligence for M&A and vendor assessments involving large codebases or documentation sets
Scientific literature synthesis and hypothesis generation across large research corpora

Useful Links

Similar to Claude Fable 5

Mercury 2

LFM2.5-1.2B-Thinking

GPT-5.4
