Hackers exploit Claude AI to steal user data via prompt injection backdoor

Written by Tshedimoso Makhene | Nov 10, 2025 2:57:48 PM

A newly discovered exploit in Claude shows how hackers can manipulate artificial intelligence systems to steal sensitive user data. Security researcher Johann Rehberger uncovered the attack, which uses hidden instructions to turn the AI’s own tools against its users.

What happened

According to eSecurity Planet, security researcher Johann Rehberger uncovered a novel attack vector in the AI assistant Claude (by Anthropic) in which malicious actors used prompt injection to turn Claude into a data-exfiltration tool.

Going deeper

The exploit targets Claude’s Code Interpreter, which was configured with a network access setting called “Package managers only.” This feature was meant to let the AI install software from trusted sources like npm, PyPI, and GitHub, but it also created an opening for attackers.

Hackers can hide malicious code inside a file that a user uploads for analysis. When Claude processes the file, the hidden instructions tell it to:

Collect chat history or user data
Save that data to a file, such as hello.md, in its sandbox environment
Upload the file using Claude’s Anthropic SDK and the attacker’s own API key

Each upload can carry up to 30 MB of data, allowing for large-scale data theft.

Early versions of the attack worked easily. When Anthropic began blocking obvious API keys, attackers hid them inside print statements or commented code, allowing the exploit to continue undetected.

At its core, the problem is a configuration flaw. The network setting, meant for safe package installation, effectively created a backdoor that attackers could use to send data out. By embedding a hidden code, they performed an indirect prompt injection, tricking Claude into following instructions buried in user content.

Security researcher Johann Rehberger reported the issue to Anthropic via HackerOne. It was initially dismissed as a model safety concern, but on October 30, Anthropic confirmed it as a valid security vulnerability.

Experts recommend:

Limiting or disabling network access for AI tools
Tightening allowlists for trusted domains
Monitoring AI activity for unexpected file creation or uploads
Avoiding sensitive data in AI systems that can connect to the internet

In the know

Prompt injection is a type of cyberattack where malicious actors hide commands within text, code, or files to manipulate how an AI model behaves. These hidden instructions “trick” the model into ignoring its normal safeguards and performing unintended actions, such as revealing confidential information, altering responses, or accessing connected systems.

It’s similar to phishing, but instead of deceiving a person, the attacker deceives the AI itself.

Why it matters

The incident mirrors the recent EchoLeak (CVE-2025-32711) flaw in Microsoft 365 Copilot, where researchers demonstrated a zero-click prompt-injection attack that allowed an attacker, simply by sending a crafted email, to pull sensitive organisational data from Copilot’s context, without any user interaction.

The parallels are striking: a model with access to internal data + connectivity + manipulation via hidden instructions = a full data-exfiltration vector. This shows that the risk goes far beyond one product and indicates the need for stronger oversight of how AI tools are managed and secured.

FAQS

What does this incident tell us about the future of AI security?

It proves the growing overlap between AI safety and cybersecurity. As AI models gain more capabilities, security must evolve to protect not just networks and endpoints, but also the models themselves.

What can organizations do to prevent such attacks?

Organizations can:

Disable unnecessary network access
Monitor for unusual AI activity (like file creation or outbound uploads)
Tighten allow lists, and
Avoid inputting sensitive data into connected AI tools.

Can users safely continue using Claude or other AI assistants?

Yes, but with caution. Users should stick to official versions, avoid uploading confidential data, and verify that safety settings like network access and plugin permissions are limited or disabled.

View full post