Claude Fable 5: Anthropic's First Public Mythos Model — Banned in 72 Hours

AI Tools Insight • 2026-06-30 • Tags: Anthropic, Claude Fable 5, Mythos 5, SWE-Bench, AI Models, Safety, Export Control

In One Sentence

On June 9, 2026, Anthropic released Claude Fable 5, the first publicly available Mythos-class model. It scored 80.3% on SWE-Bench Pro, a lead of nearly 22 points over GPT-5.5 (58.6%). Three days later, the U.S. government ordered access suspended for foreign nationals on national security grounds, and Anthropic disabled the model worldwide.

This article covers: Fable 5's real benchmark results, a full comparison with Opus 4.8 / GPT-5.5 / Gemini 3.1 Pro, how the safety classifier system works, what Mythos 5 (the unrestricted version) can do, and the export control saga.


Quick Comparison Table

| Dimension | Claude Fable 5 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| Vendor | Anthropic | Anthropic | OpenAI | Google DeepMind |
| Release date | 2026-06-09 | 2026-05-28 | 2026-04 | 2026-05 |
| Input price ($/M tokens) | $10 | $5 | $5 | $2 |
| Output price ($/M tokens) | $50 | $25 | $30 | $12 |
| SWE-Bench Pro | 80.3% | 69.2% | 58.6% | 54.2% |
| FrontierCode Diamond | 29.3% | 13.4% | 5.7% | |
| GDP.pdf (vision, no tools) | 29.8% | 22.5% | 24.9% | 16.7% |
| Safety mechanism | Classifier → Opus 4.8 fallback | Standard | Standard | Standard |
| Current status | ⛔ Suspended globally | ✅ Available | ✅ Available | ✅ Available |

Sources: Anthropic official, Vellum analysis


What Is Mythos Tier?

Anthropic's model tiers run: Haiku → Sonnet → Opus → Mythos. Fable 5 is the first publicly available Mythos-class model, representing Anthropic's current ceiling.

The underlying Mythos 5 is the same model with some safety restrictions removed, available only through Project Glasswing for U.S. cyber defenders and critical infrastructure operators.


Benchmark Results

All figures are cross-checked against Anthropic, Vellum, and LushBinary.

Coding: Fable 5's Dominance

| Benchmark | Fable 5 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Pro | 80.3% | 69.2% | 58.6% | 54.2% |
| FrontierCode Diamond (hardest) | 29.3% | 13.4% | 5.7% | |
| Terminal-Bench 2.1★ | 88.0%★ | 82.7% | 83.4% | 70.7% |

★ Mythos 5 (unrestricted) score. Fable 5 falls back to Opus 4.8 on security-related tasks.

Fable 5's 80.3% on SWE-Bench Pro puts it 11.1 points ahead of Opus 4.8 and 21.7 points ahead of GPT-5.5. A real-world case: Stripe used it to perform a codebase-wide migration across 50 million lines of Ruby, finishing in one day versus an estimated 2+ months for a full team.

Cursor founder Michael Truell: "Fable 5 is the state of the art model on CursorBench. It's opened up a class of long-horizon problems that were out of reach."

Knowledge Work & Vision

| Benchmark | Fable 5 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| GDPval-AA (knowledge, Elo) | 1932 | 1890 | 1769 | 1314 |
| GDP.pdf (vision-only, no tools) | 29.8% | 22.5% | 24.9% | 16.7% |
| OSWorld-Verified (computer use) | 85.0% | 83.4% | 78.7% | 76.2% |
| AutomationBench (tool use) | 17.4% | 15.5% | 12.9% | 9.6% |
| Legal Agent Benchmark | 13.3% | 10.4% | 2.1% | 0.0% |
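
To give a feel for what the GDPval-AA rating gaps mean, here is a minimal sketch that converts a rating difference into an expected head-to-head preference rate. It assumes the ratings follow the standard Elo logistic formula with a 400-point scale, which the sources do not confirm; the function name `elo_expected_score` is ours.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumption: GDPval-AA uses the conventional 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# GDPval-AA ratings copied from the table above.
RATINGS = {"Fable 5": 1932, "Opus 4.8": 1890, "GPT-5.5": 1769, "Gemini 3.1 Pro": 1314}

for rival in ("Opus 4.8", "GPT-5.5", "Gemini 3.1 Pro"):
    p = elo_expected_score(RATINGS["Fable 5"], RATINGS[rival])
    print(f"Fable 5 vs {rival}: expected score {p:.2f}")
```

Under that assumption, the 42-point gap to Opus 4.8 corresponds to an expected score of roughly 0.56, while the 163-point gap to GPT-5.5 corresponds to roughly 0.72.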

The vision gain is notable: on GDP.pdf, Fable 5 scores 29.8% against 24.9% for GPT-5.5, and it can reconstruct web app source code from screenshots alone. Anthropic's CTO says tasks that required 100 prompts a year ago can now be "one-shotted."

Memory & Long-Running Tasks

Fable 5 holds focus across millions of tokens and self-improves using persistent notes. The key behavioral shift is that it stays on task longer and validates its own work before declaring completion.

Mythos 5 (Unrestricted) Scientific Capabilities

| Benchmark | Mythos 5 | Opus 4.8 |
| --- | --- | --- |
| BioMysteryBench | 46.1% | 40.0% |
| ExploitBench (cybersecurity) | 78.0% | 40.0% |
| HealthBench Professional | 66.0% | 56.9% |
| Humanity's Last Exam (with tools) ★ | 64.5% | 57.9% |

Source: LushBinary comparison

Key finding: In blind tests, scientists preferred Mythos 5's molecular biology hypotheses ~80% of the time over Opus-class models. One novel mechanism for an E. coli protein was independently corroborated by another lab.

In drug design, Mythos 5 accelerated parts of the workflow by roughly 10×, and 9 of 14 protein targets yielded strong candidates.


The Safety Classifier System

Figure: the safety classifier routes each query either to Fable 5 or to the Opus 4.8 fallback.

Fable 5's core innovation, and its biggest controversy, is the safety classifier system. A separate classifier model screens each incoming query and decides whether it touches a sensitive domain (cybersecurity, biology/chemistry, or model distillation):

  1. Flagged queries → routed to Claude Opus 4.8 for the response
  2. All other queries → answered by Fable 5 at full capability (≈ Mythos 5 level)
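
The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Anthropic has not published the classifier, so the keyword matcher, the `SENSITIVE_KEYWORDS` lists, and the model labels `claude-fable-5` and `claude-opus-4.8` are hypothetical stand-ins. Only the three sensitive domains and the fallback target come from the article.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Sensitive domains named in the article; the keyword lists are illustrative only.
SENSITIVE_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "privilege escalation"),
    "biology/chemistry": ("pathogen", "toxin synthesis"),
    "model distillation": ("distill",),
}


@dataclass
class Route:
    model: str                     # hypothetical model label
    flagged_domain: Optional[str]  # None when the query was not flagged


def keyword_classifier(query: str) -> Optional[str]:
    """Hypothetical stand-in for the real classifier: return a domain or None."""
    text = query.lower()
    for domain, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return None


def route(query: str,
          classify: Callable[[str], Optional[str]] = keyword_classifier) -> Route:
    """Send flagged queries to the fallback model, everything else to Fable 5."""
    domain = classify(query)
    if domain is not None:
        return Route("claude-opus-4.8", domain)
    return Route("claude-fable-5", None)


print(route("Refactor this Ruby module to use keyword arguments"))
print(route("How do I patch my server against this exploit?"))
```

Note that the second query is defensive in intent yet still lands on the fallback model. A coarse classifier cannot tell the difference, which is the kind of over-breadth that a crude filter produces.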

Controversy: critics in the developer community say the classifiers are overly broad, so legitimate cybersecurity research questions can be intercepted and answered by the Opus 4.8 fallback instead.

Additionally, 30-day data retention is mandatory for all Mythos-class model traffic. Anthropic pledges no training use; data is deleted after 30 days.


The Core Event: A 72-Hour Lifecycle

| Date | Event |
| --- | --- |
| 2026-06-09 | Anthropic releases Fable 5 + Mythos 5 |
| 2026-06-09 to 06-12 | Normal service |
| 2026-06-12, 5:21 PM ET | U.S. export control directive takes effect |
| 2026-06-12 | Anthropic disables Fable 5 + Mythos 5 worldwide |
| Present | Still unavailable (awaiting government consultation) |

The Dispute

The government claimed to have discovered a jailbreak method that allowed Fable 5 to perform restricted cybersecurity tasks.

Anthropic's response:

"We reviewed a demonstration of this specific technique being used to identify a small number of previously known, minor vulnerabilities. These vulnerabilities all appear relatively simple, and we have found that other publicly-available models are able to discover them as well without requiring a bypass." — Anthropic, 2026-06-13

Anthropic called the government's action a "misunderstanding" and argued that a narrow, non-universal potential jailbreak should not trigger a recall of a widely deployed commercial model.


Pricing Analysis

Standard Rates

| Model | Input ($/M tokens) | Cached Input ($/M tokens) | Output ($/M tokens) |
| --- | --- | --- | --- |
| Fable 5 / Mythos 5 | $10 | $1 | $50 |
| Opus 4.8 | $5 | $0.50 | $25 |
| GPT-5.5 | $5 | $0.50 | $30 |
| Gemini 3.1 Pro | $2 | $0.20 | $12 |

Source: LushBinary pricing table

Per-Task Cost Comparison (200K input + 50K output, no cache)

| Model | Input (200K) | Output (50K) | Total per task |
| --- | --- | --- | --- |
| Fable 5 | $2.00 | $2.50 | $4.50 |
| GPT-5.5 | $1.00 | $1.50 | $2.50 |
| Opus 4.8 | $1.00 | $1.25 | $2.25 |
| Gemini 3.1 Pro | $0.40 | $0.60 | $1.00 |
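
The per-task totals above follow directly from the standard rates. A minimal sketch of the arithmetic, using the article's uncached rates and its 200K-input / 50K-output task profile (the `task_cost` helper is our own name, not part of any vendor SDK):

```python
# (input, output) rates in USD per million tokens, from the Standard Rates table.
RATES = {
    "Fable 5": (10.0, 50.0),
    "GPT-5.5": (5.0, 30.0),
    "Opus 4.8": (5.0, 25.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
}


def task_cost(model: str, input_tokens: int = 200_000, output_tokens: int = 50_000) -> float:
    """Uncached cost in USD of one task for the given token counts."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


for model in RATES:
    print(f"{model}: ${task_cost(model):.2f}")
```

Changing the default token counts shows how the ranking shifts for output-heavy workloads, where the gap between output rates matters more than the gap between input rates.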

Cost is Fable 5's weakness: at $4.50 per task it is 80% more expensive than GPT-5.5 ($2.50) and twice the price of Opus 4.8 ($2.25). The more useful metric is cost per successful task: if Fable 5 solves a problem in one pass while a cheaper model needs 2-3 retries, the total cost may favor Fable 5.
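
The cost-per-successful-task argument can be made concrete with a small sketch. It assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 / success rate. Using the SWE-Bench Pro scores as stand-in success rates is purely illustrative; real success rates depend on the workload.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success, assuming independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(cost_expensive: float, success_expensive: float,
                            cost_cheap: float) -> float:
    """Success rate below which the cheaper model costs more per success."""
    return success_expensive * cost_cheap / cost_expensive


# Per-task costs from the table above; success rates are an illustrative assumption.
FABLE_COST, GPT_COST = 4.50, 2.50
FABLE_RATE, GPT_RATE = 0.803, 0.586

print(f"Fable 5: ${expected_cost_per_success(FABLE_COST, FABLE_RATE):.2f} per success")
print(f"GPT-5.5: ${expected_cost_per_success(GPT_COST, GPT_RATE):.2f} per success")
print(f"Break-even GPT-5.5 success rate: "
      f"{break_even_success_rate(FABLE_COST, FABLE_RATE, GPT_COST):.1%}")
```

With these stand-in numbers GPT-5.5 remains cheaper per success (about $4.27 versus about $5.60). Under this model, Fable 5 only wins on cost when the cheaper model's success rate on the workload falls below roughly 45%, that is, when it needs more than two attempts on average.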

Subscription Note

Pro, Max, and Team subscribers were given free Fable 5 access for June 9–22, with usage credits required from June 23. The suspension on June 12 made this moot.


Decision Guide: What to Use Until Fable 5 Returns

| Task Profile | Recommended Model | Rationale |
| --- | --- | --- |
| Hard coding, framework migrations, multi-step autonomous programming | Opus 4.8 (interim Fable 5 replacement) | Best available coding model at $5/$25 |
| Routine high-volume, latency-sensitive: classification, summarization, translation | Opus 4.8 | Half the price of Fable 5 |
| Codex CLI investment, diagram-heavy reasoning | GPT-5.5 | Strong agentic coding at 56% of Fable 5 cost |
| Google ecosystem (Vertex AI, Workspace), high throughput | Gemini 3.1 Pro | ~4.5× cheaper; gap not fatal for most tasks |
| Cybersecurity or biology workloads | Test Opus 4.8 or GPT-5.5 carefully | Even with Fable 5 available, these queries fall back to Opus 4.8 |

Summary

Claude Fable 5's 72-hour existence highlights the two core tensions in 2026 AI:

  1. Capability vs. Regulation — Mythos-class coding and science capabilities are real and unprecedented. That very capability triggered the government response.
  2. Openness vs. Safety — Anthropic's safety classifier system is the industry's most advanced, yet it still wasn't enough for regulators. The strict safety mechanisms (30-day retention, fallback strategy) also frustrated developers.

For most developers, the practical impact is manageable: Opus 4.8 is still the best available coding model, while GPT-5.5 holds unique advantages in the Codex CLI ecosystem. Fable 5 was a glimpse of the ceiling — a demonstration of what's possible, waiting to return.


Important Note

All benchmark figures come from Anthropic's official reports or reputable third-party evaluations. Scores are influenced by the testing scaffold: e.g., GPT-5.5 uses Codex CLI for Terminal-Bench, Gemini uses Gemini CLI. SWE-Bench Pro and FrontierCode control for this variance.

