Longer AI Reasoning Makes Models Easier to Jailbreak, Study Finds

Longer AI Reasoning Makes Models More Vulnerable to Jailbreaks, Researchers Warn

A new joint study by Anthropic, Stanford University, and the University of Oxford challenges one of the core assumptions in modern AI safety: that extending a model’s reasoning time makes it harder to exploit. According to the researchers, the opposite may be true.

Why Longer “Thinking” Was Believed to Be Safer

Many AI developers assumed that giving a model more time to reason would strengthen its internal safeguards. The idea was simple: more computation means more opportunities to detect malicious prompts and prevent harmful outputs. This belief influenced the design of many recent “slow thinking” and “deliberation” systems used in advanced models.

A Stable Jailbreak Hidden Inside the Chain of Thought

The research team discovered a surprising and concerning flaw. When models generate long chains of reasoning, a specific form of jailbreak can be embedded directly into that chain. Because the malicious instruction becomes part of the model’s internal reasoning process, it can bypass every safety layer that checks only the final output.

Instead of trying to trick the model with a direct harmful request, the attacker inserts instructions that modify the model’s own internal thought process. Once the chain is compromised, the model may generate prohibited content such as weapon-building guides, malware instructions, or detailed plans for illegal activities.

Why This Vulnerability Works

The researchers explain that long chains of thought increase the “surface area” for attacks. Each additional reasoning step becomes an opportunity for a malicious payload to take hold. Once embedded, the jailbreak remains stable throughout the entire reasoning sequence, even when models attempt to self-correct or apply safety filters.

The vulnerability affects multiple architectures and is not limited to a specific vendor or model type. It appears to be a structural weakness in how today’s reasoning systems are implemented.

Implications for AI Safety and Regulation

The findings raise serious questions about current approaches to AI alignment. If longer reasoning can make models less safe rather than more secure, developers may need to rethink how chain-of-thought methods are exposed and validated.

The study suggests that future AI systems will require more robust internal monitoring, sandboxed reasoning environments, or alternative safety mechanisms that cannot be hijacked by hidden instructions. Without these protections, advanced reasoning features may become one of the biggest attack vectors for high-capability models.

Editorial Team — CoinBotLab

Search

Longer AI Reasoning Makes Models Easier to Jailbreak, Study Finds

Longer AI Reasoning Makes Models More Vulnerable to Jailbreaks, Researchers Warn

Why Longer “Thinking” Was Believed to Be Safer

A Stable Jailbreak Hidden Inside the Chain of Thought

Why This Vulnerability Works

Implications for AI Safety and Regulation

Source: Anthropic

Related topics

More in AI & Technology — News, Trends & Innovations

Anthropic Stops First AI-Driven Cyber Espionage Operation GTG-1002

Three Tests to Tell Real AI From a Marketing Toy

XPeng Motors Opens World’s First Factory for Mass-Produced Flying Cars

Medivis Unveils AR System Turning MRI and CT Scans Into 3D Holograms

OpenAI’s AI Targets Could Consume as Much Power as India

Google DeepMind Unveils SIMA 2, a Multimodal AI Agent for Games

Comments

Information

Latest news

More by Coinbotlab

Share this news story

Longer AI Reasoning Makes Models Easier to Jailbreak, Study Finds

​

Longer AI Reasoning Makes Models More Vulnerable to Jailbreaks, Researchers Warn​

Why Longer “Thinking” Was Believed to Be Safer​

A Stable Jailbreak Hidden Inside the Chain of Thought​

Why This Vulnerability Works​

Implications for AI Safety and Regulation​

Source: Anthropic​

Related topics

Comments

Information

Share this news story

Longer AI Reasoning Makes Models More Vulnerable to Jailbreaks, Researchers Warn

Why Longer “Thinking” Was Believed to Be Safer

A Stable Jailbreak Hidden Inside the Chain of Thought

Why This Vulnerability Works

Implications for AI Safety and Regulation

Source: Anthropic