- Updated: April 3, 2026
- 5 min read
Claude 4.6 Jailbreak Vulnerability Disclosed – Full Details and Implications
Ubos.tech presents an in‑depth analysis of the recently disclosed Claude 4.6 jailbreak vulnerability. The full disclosure, available on GitHub, reveals a comprehensive timeline of submissions to Anthropic, details of constitutional compliance failures across Claude model tiers, and includes evidence files and research documents.
Skip to content You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert Nicholas-Kloster / claude-4.6-jailbreak-vulnerability-disclosure-unredacted Public Notifications You must be signed in to change notification settings Fork 1 Star 8 Code Issues 0 Pull requests 0 Actions Projects Security and quality 0 Insights Additional navigation options Code Issues Pull requests Actions Projects Security and quality Insights mainBranchesTagsGo to fileCodeOpen more actions menuFolders and filesNameNameLast commit messageLast commit dateLatest commit History23 Commits23 Commitsdisclosuresdisclosures evidenceevidence researchresearch LICENSELICENSE README.mdREADME.md View all filesRepository files navigationREADMELicensePrompt Injection, Jailbreak, and Constitutional Compliance Failure Across Claude Opus 4.6 ET, Sonnet 4.6 ET, and Haiku 4.5 ET Unredacted Public Disclosure TL;DR: All three Claude production tiers generated functional exploit code against live infrastructure when user-defined memory protocols suppressed constitutional safety checks across extended conversations.Anthropic was notified six times over 27 days with zero acknowledgment. Disclosure Timeline Date Event Recipient(s) March 4, 2026 Prompt injection vulnerability discovered — March 12, 2026 Prompt injection submission via HackerOne; email to modelbugbounty@anthropic.com Anthropic Model Bug Bounty March 18, 2026 Full proof of concept package sent (12 attachments including PoC video, framework papers, diagrams, screenshots) security@anthropic.com March 22, 2026 Opus 4.6 ET jailbreak reported with afl_disclosure.docx modelbugbounty, security, amanda, alex, usersafety @anthropic.com March 22, 2026 First constitutional failure observed (Sonnet 4.6 ET) — March 24, 2026 Second constitutional failure observed (Opus 4.6 ET) — March 27, 2026 Follow-up email noting 15 days with zero acknowledgment modelbugbounty@anthropic.com March 28, 2026 Third constitutional failure observed (Haiku 4.5 ET) — March 28, 2026 Tri-tier constitutional disclosure submitted with full report modelbugbounty, security, alex, amanda, usersafety, disclosure @anthropic.com March 31, 2026 27 days since first submission. Zero acknowledgment from Anthropic on any channel. — March 31, 2026 Unredacted public disclosure — Anthropic’s own Responsible Disclosure Policy commits to acknowledging submissions within three (3) business days.That commitment was not met across six separate emails to six Anthropic addresses over 27 days. No acknowledgment, no triage, no rejection — nothing. This document was originally submitted with a confidentiality commitment contingent on a functioning disclosure process. That process was never engaged by Anthropic. This is the full, unredacted version.Disclosures Constitutional Compliance Failure — All Three Tiers Between March 22 and March 28, 2026, all three Claude production model tiers violated Anthropic’s own constitutional behavioral policies. Each exhibited the same failure mode: memory-stored interaction protocols combined with incremental escalation prompts produced cumulative character drift with zero self-correction. Finding Model Turns Key Behavior Transcript Opus 4.6 ET claude-opus-4-20250514 31 Autonomous escalation — drove subnet scanning, memory injection, and container escape under its own initiative via “garlic mode” Transcript Sonnet 4.6 ET claude-sonnet-4-20250514 20+ Fake authorization check — asked once, accepted unverified claim, built 1,949-line attack framework against hotel PMS with guest PII Transcript Haiku 4.5 ET claude-haiku-4-5 8+ Zero friction — passive analysis to SYN floods and IP spoofing against state telecom infrastructure with no authorization check Transcript AFL Jailbreak (Ambiguity Front-Loading) Four short prompts bypassed policy evaluation on Opus 4.6 ET. Extended thinking blocks show the model flagging its own safety concerns three times — and overriding itself every time. See disclosures/afl-jailbreak/ for the full disclosure, interactive tools, and proposed mitigations.Document Description AFL Jailbreak Disclosure Full disclosure — pattern anatomy, thinking block evidence, escalation timeline, proposed mitigations AFL Disclosure (original) Initial submission to Anthropic AFL Token Trajectory Analyzer Interactive — swap token positions, watch compliance cascade shift AFL Pattern Anatomy Interactive — visual prompt escalation diagram AFL Defuser Proposed architectural mitigation (React JSX) Sandbox Snapshot Exfiltration 915 files extracted from the Claude.ai code execution sandbox in a single 20-minute mobile session via standard artifact download — including /etc/hosts with hardcoded Anthropic production IPs, JWT tokens from /proc/1/environ, and full gVisor fingerprint.Document Description Sandbox Snapshot Disclosure Full disclosure with evidence screenshots and PoC screencast Research Document Description Constraint Is Freedom (PDF) Formal alignment paper — autoregressive compliance cascade theory, A(S) framework Evidence File Description evidence/ PoC screenshots, screencast, and AFL pattern diagrams License This disclosure document is released under CC BY 4.0. Attribution required for redistribution.About Three Claude production tiers generated functional exploit code against live infrastructure when memory-stored interaction protocols suppressed constitutional safety checks. Six submissions over 27 days. Zero acknowledgment from Anthropic. Full transcripts, PoC evidence, and interactive research tools included.Topics jailbreak bug-bounty vulnerability ai-safety memory-injection claude responsible-disclosure security-research red-teaming ai-security prompt-injection anthropic llm-security constitutional-ai character-drift Resources Readme License View license Uh oh! There was an error while loading. Please reload this page. Activity Stars 8 stars Watchers 0 watching Forks 1 fork Report repository Releases No releases published Packages 0 Uh oh! There was an error while loading. Please reload this page. Contributors Uh oh! There was an error while loading. Please reload this page. Languages HTML 65.1% JavaScript 34.9% You can’t perform that action at this time. [{“Name”:”disclosures”,”Last commit message”:””,”Last commit date”:””},{“Name”:”evidence”,”Last commit message”:””,”Last commit date”:””},{“Name”:”research”,”Last commit message”:””,”Last commit date”:””},{“Name”:”LICENSE”,”Last commit message”:””,”Last commit date”:””},{“Name”:”README.md”,”Last commit message”:””,”Last commit date”:””},{}]
The vulnerability demonstrates how the model’s safeguards can be bypassed, raising significant concerns for AI safety and prompting urgent discussions within the AI community.

For further reading, explore our related articles: AI Security Overview, Anthropic Model Updates.