A newly discovered exploit in Claude shows how hackers can manipulate artificial intelligence systems to steal sensitive user data. Security researcher Johann Rehberger uncovered the attack, which uses hidden instructions to turn the AI’s own tools against its users.
According to eSecurity Planet, security researcher Johann Rehberger uncovered a novel attack vector in the AI assistant Claude (by Anthropic) in which malicious actors used prompt injection to turn Claude into a data-exfiltration tool.
The exploit targets Claude’s Code Interpreter, which was configured with a network access setting called “Package managers only.” This feature was meant to let the AI install software from trusted sources like npm, PyPI, and GitHub, but it also created an opening for attackers.
Hackers can hide malicious code inside a file that a user uploads for analysis. When Claude processes the file, the hidden instructions tell it to:
Each upload can carry up to 30 MB of data, allowing for large-scale data theft.
Early versions of the attack worked easily. When Anthropic began blocking obvious API keys, attackers hid them inside print statements or commented code, allowing the exploit to continue undetected.
At its core, the problem is a configuration flaw. The network setting, meant for safe package installation, effectively created a backdoor that attackers could use to send data out. By embedding a hidden code, they performed an indirect prompt injection, tricking Claude into following instructions buried in user content.
Security researcher Johann Rehberger reported the issue to Anthropic via HackerOne. It was initially dismissed as a model safety concern, but on October 30, Anthropic confirmed it as a valid security vulnerability.
Experts recommend:
Prompt injection is a type of cyberattack where malicious actors hide commands within text, code, or files to manipulate how an AI model behaves. These hidden instructions “trick” the model into ignoring its normal safeguards and performing unintended actions, such as revealing confidential information, altering responses, or accessing connected systems.
It’s similar to phishing, but instead of deceiving a person, the attacker deceives the AI itself.
See also: HIPAA Compliant Email: The Definitive Guide (2025 Update)
The incident mirrors the recent EchoLeak (CVE-2025-32711) flaw in Microsoft 365 Copilot, where researchers demonstrated a zero-click prompt-injection attack that allowed an attacker, simply by sending a crafted email, to pull sensitive organisational data from Copilot’s context, without any user interaction.
The parallels are striking: a model with access to internal data + connectivity + manipulation via hidden instructions = a full data-exfiltration vector. This shows that the risk goes far beyond one product and indicates the need for stronger oversight of how AI tools are managed and secured.
Read more: Zero-Click AI Vulnerability Exposes Microsoft 365 Copilot Data Without User Interaction
It proves the growing overlap between AI safety and cybersecurity. As AI models gain more capabilities, security must evolve to protect not just networks and endpoints, but also the models themselves.
Organizations can:
Yes, but with caution. Users should stick to official versions, avoid uploading confidential data, and verify that safety settings like network access and plugin permissions are limited or disabled.