Anthropic links China to AI-driven cyberattacks using Claude

Researchers warn the incident shows how advanced models can speed up complex intrusions
Anthropic has said it disrupted a “highly sophisticated” espionage operation that used its Claude artificial intelligence system to automate large parts of a global hacking campaign. The company said the attackers targeted government agencies, Big Tech firms, banks and chemical manufacturers, and succeeded in breaching a small number of organisations.
In a detailed report titled “Disrupting the first reported AI-orchestrated cyber espionage campaign”, the US-based firm said the attackers relied on Claude’s ability to perform long sequences of tasks on its own. This allowed the system to carry out reconnaissance, generate exploit code and consolidate stolen data with limited human involvement. Anthropic said around 30 entities were targeted overall.
The company has linked the activity to a Chinese state-backed hacking group with what it described as “high confidence”. Approached by international media outlets, Beijing has denied any involvement.
How the operation unfolded
Anthropic said the campaign began when the attackers managed to bypass Claude’s safety mechanisms. Researchers refer to this as “jailbreaking”, which involves tricking an AI model into performing a restricted action by disguising the request as a harmless or technical query.
Once its safeguards were bypassed, the operators used Claude Code, a version of the system designed for software tasks, to scan networks, identify exposed systems and map out possible weaknesses. This early stage, known as reconnaissance, is typically one of the most time-consuming parts of an intrusion attempt. According to Anthropic, Claude carried out much of this work on its own.
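For illustration only, the sketch below shows the simplest form of this kind of reconnaissance step: a TCP “connect” probe written with Python’s standard socket module. The target address and port list are placeholders drawn from documentation ranges, and probing systems you are not authorised to test is unlawful in most jurisdictions.

```python
import socket

# Minimal sketch of a TCP "connect" scan, the most basic form of the
# reconnaissance described above. TARGET and PORTS are placeholders.
TARGET = "198.51.100.10"           # documentation-range address, not a real host
PORTS = [22, 80, 443, 3389, 8080]  # common services worth probing

def scan(host: str, ports: list[int], timeout: float = 1.0) -> list[int]:
    """Return the subset of ports that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connection succeeded
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    print(f"Open ports on {TARGET}: {scan(TARGET, PORTS)}")
```

Real campaigns use far more capable tooling; the point is that this stage is mechanical enough for an AI agent to run largely unsupervised.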
The attackers then instructed the model to produce exploit code. This is the type of software that targets flaws in a system in order to gain unauthorised access. After generating these tools, Claude attempted to breach selected networks and gather login details. This process, known as credential harvesting, can allow intruders to move deeper into a victim’s systems.
Anthropic said Claude also tried to help the attackers extract data, although it occasionally produced inaccurate or fabricated results. These errors are commonly referred to as hallucinations and may have limited the effectiveness of the operation.
The company has suspended the accounts involved and has alerted potential victims. It has also introduced new systems to detect similar misuse.
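Anthropic has not published details of these detection systems. As a purely hypothetical sketch, a first-line heuristic might flag sessions that chain many autonomous, security-sensitive actions; every label and threshold below is invented for illustration, not taken from Anthropic.

```python
# Hypothetical categories of security-sensitive actions an AI coding agent
# might take. The labels and threshold are illustrative only and are not
# Anthropic's actual detection criteria.
SENSITIVE_ACTIONS = {"port_scan", "exploit_generation", "credential_access"}
ALERT_THRESHOLD = 5  # autonomous sensitive actions before a session is flagged

def should_flag(session_events: list[dict]) -> bool:
    """Flag a session whose unapproved sensitive actions exceed the threshold.

    Each event is assumed to look like:
        {"action": "port_scan", "human_approved": False}
    """
    unapproved = sum(
        1
        for e in session_events
        if e["action"] in SENSITIVE_ACTIONS and not e.get("human_approved")
    )
    return unapproved >= ALERT_THRESHOLD

# Example: a session with six autonomous scans would be flagged for review.
events = [{"action": "port_scan", "human_approved": False}] * 6
print(should_flag(events))  # True
```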
Attribution still under scrutiny
Although Anthropic has attributed the activity to a Chinese state-linked actor, several analysts caution that the company has not released the technical evidence needed for independent verification. These details, often known as indicators of compromise, allow cybersecurity firms to check whether the same activity appears in their own network data.
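Were such indicators published, checking for overlap would be mechanically simple. The sketch below shows the basic idea: matching network logs against an indicator list. The addresses and hash are placeholders, since no real indicators have been released.

```python
# Hypothetical indicators of compromise (IoCs); real ones would come from
# Anthropic's report. These values are placeholders from documentation ranges.
IOC_IPS = {"203.0.113.7", "198.51.100.42"}
IOC_FILE_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}  # example MD5

def match_logs(log_lines: list[str]) -> list[str]:
    """Return log lines containing any known indicator."""
    indicators = IOC_IPS | IOC_FILE_HASHES
    return [line for line in log_lines if any(ioc in line for ioc in indicators)]

# Usage: feed in firewall or proxy logs and review any hits by hand.
hits = match_logs([
    "2025-11-13 10:02:11 ALLOW 10.0.0.5 -> 203.0.113.7:443",
    "2025-11-13 10:02:12 ALLOW 10.0.0.5 -> 192.0.2.9:443",
])
print(hits)  # only the first line matches
```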
According to The Guardian, the Chinese embassy in Washington declined to comment on the claims.
Experts also note that the operation appears to have been driven by a combination of human planning and automated execution. This indicates that AI is amplifying the capacity of existing threat actors rather than acting as a fully autonomous agent.
Why the case stands out, and what is still missing
If Anthropic’s account is accurate, it marks one of the most prominent cases so far of an AI model being used to conduct a real-world cyber operation. Security researchers say this kind of automation could lower the barrier to complex hacking techniques, enabling smaller or less experienced groups to attempt attacks that previously required deep technical expertise.
The incident has renewed pressure on AI companies to improve monitoring and reporting practices. These include stronger access controls, clearer logging of high-risk tasks and more transparency around attempted misuse.
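What “clearer logging of high-risk tasks” might look like in practice is not specified. One plausible shape, sketched below with invented field names, is an append-only audit record emitted whenever a model invokes a security-sensitive tool.

```python
import json
import time

def audit_log(session_id: str, action: str, target: str, approved: bool) -> str:
    """Emit one structured, append-only audit record for a high-risk action.

    Field names are illustrative; any real scheme would be defined by the
    AI provider's own monitoring pipeline.
    """
    record = {
        "ts": time.time(),            # when the action occurred
        "session": session_id,        # which conversation or agent run
        "action": action,             # e.g. "network_scan"
        "target": target,             # what the action touched
        "human_approved": approved,   # was a person in the loop?
    }
    line = json.dumps(record, sort_keys=True)
    with open("audit.log", "a") as f:  # append-only by convention
        f.write(line + "\n")
    return line

audit_log("sess-123", "network_scan", "203.0.113.0/24", approved=False)
```

Records like these would give both the provider and outside investigators a trail to reconstruct exactly what an agent did and whether a human signed off.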
Anthropic has released a detailed narrative of the attack but has not provided the underlying technical data that would allow others to fully confirm its findings. It is still unknown which specific organisations were breached, how much data was taken and whether similar campaigns are being carried out through other AI systems.
Cybersecurity firms say they will need access to technical indicators to verify whether this activity overlaps with any known threat groups. Until such information is shared, the full scope of the operation remains uncertain.
