Alibaba Stole Claude’s Brain. Here’s Exactly How They Did It.

The CyberSec Guru

An industrial-scale AI heist – 28.8 million prompts, 25,000 fake accounts, and one very uncomfortable question for the US government.

By the time Anthropic sent its letter to the U.S. Senate Banking Committee on June 10, 2026, the attack was already over. The accounts had gone dark. The data had been collected. And somewhere inside Alibaba’s Qwen AI lab, a model was probably being trained on the results.

What Anthropic described in that letter – a copy of which was reviewed by Reuters and Bloomberg – was something the cybersecurity world has been warning about for years but has rarely seen executed at this scale. Between April 22 and June 5, operators linked to Alibaba and its Qwen division ran nearly 25,000 fraudulent accounts through Claude’s API, generating more than 28.8 million individual exchanges with the model. The explicit goal, per Anthropic’s own framing, was to extract the capabilities that make Claude worth billions of dollars and use them to accelerate China’s own frontier AI development, specifically targeting Anthropic’s most advanced system, Claude Mythos Preview.

Anthropic called it the largest known distillation attack in its history. Looking at the numbers, it’s hard to argue.

What “Distillation” Actually Means And Why It’s Terrifying

Before getting into what Alibaba allegedly did, let’s talk about distillation, because the word gets thrown around loosely and the technical reality is more sophisticated than most coverage lets on.

The concept has legitimate roots. For decades, AI researchers have used “knowledge distillation” as a compression technique: take a large, expensive model (the “teacher”), run a smaller model (the “student”) on the teacher’s outputs, and train the student to reproduce the teacher’s behavior. The outcome is a model that runs faster and cheaper while retaining most of the teacher’s capability. OpenAI uses it to produce GPT-4o mini from GPT-4o. Anthropic uses it internally to produce smaller Claude variants. Hugging Face’s DistilBERT, a compressed version of Google’s BERT, was one of the earliest high-profile examples. The technique is legitimate, well-understood, and widely published.

The mechanism works at the level of probability distributions. When a large language model processes a prompt, it doesn’t just pick one word; it calculates a probability score across every token in its vocabulary. These “soft targets” (as researchers call them) are richer signals than the final text output alone. A student model trained on soft targets learns not just what the teacher said but something approximating how confident it was and what alternatives it considered. This is qualitatively different from just scraping text off the internet, and it’s why distillation produces better-performing smaller models than training from raw data alone. One caveat: an attacker working through a commercial API usually sees only the sampled text, not the full distribution, so API-based distillation typically trains on generated outputs rather than true soft targets.
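To make the soft-target idea concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the four-token vocabulary, the logit values, and the temperature of 2.0 are made up, and this is the textbook form of the loss rather than any lab’s actual training code. It softens teacher and student logits with a temperature and measures how far apart the two distributions are.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target loss: KL between temperature-softened teacher and student."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    # Multiplying by T^2 is the conventional scaling used with softened targets.
    return temperature ** 2 * kl_divergence(teacher, student)

# Hypothetical logits over a toy 4-token vocabulary (not from any real model).
teacher_logits = [4.0, 2.5, 0.5, -1.0]
good_student = [3.8, 2.4, 0.6, -0.9]   # nearly agrees with the teacher
poor_student = [0.0, 0.0, 4.0, 0.0]    # confidently prefers a different token

loss_good = distillation_loss(teacher_logits, good_student)
loss_poor = distillation_loss(teacher_logits, poor_student)
print(f"close student loss: {loss_good:.4f}")
print(f"far student loss:   {loss_poor:.4f}")
```

The loss is zero when the student reproduces the teacher’s distribution exactly and grows as the two diverge, which is the signal a student model is trained to minimize. A real pipeline would compute this over full vocabularies inside a deep learning framework; the arithmetic is the same.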

Modern illicit distillation adds another layer: chain-of-thought extraction. During Anthropic’s investigation of earlier Chinese lab campaigns (DeepSeek, Moonshot, MiniMax, detailed in a February 2026 disclosure), researchers found that some prompts were specifically designed to get Claude to articulate its internal reasoning step by step, not just produce a final answer. When you run that at scale, you’re not just collecting outputs – you’re generating chain-of-thought training data, the same kind of data that makes reasoning models like o1 and Claude 3.7 Sonnet so effective. You’re distilling the reasoning process, not just the knowledge.

That’s the technique Anthropic believes was deployed against it between April and June.

The Infrastructure: How 25,000 Fake Accounts Go Undetected

This part tends to get glossed over, and it shouldn’t. Running 25,000 fraudulent accounts generating 28.8 million API exchanges over 45 days isn’t something you do from a laptop. It requires infrastructure, and Anthropic’s earlier February disclosure gave a detailed picture of how these operations are architected.
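A quick back-of-the-envelope calculation shows what those numbers imply per account. The sketch below uses only the figures reported above, with “nearly 25,000” and “more than 28.8 million” rounded to 25,000 and 28.8 million, and counts April 22 through June 5 inclusively as 45 days. Treat the results as rough orders of magnitude, not measured traffic.

```python
ACCOUNTS = 25_000        # "nearly 25,000" per the letter, rounded
EXCHANGES = 28_800_000   # "more than 28.8 million", treated as a floor
DAYS = 45                # April 22 through June 5, inclusive

per_account = EXCHANGES / ACCOUNTS
per_day = EXCHANGES / DAYS
per_account_per_day = per_account / DAYS
per_second = per_day / 86_400  # seconds in a day

print(f"exchanges per account:          {per_account:,.0f}")
print(f"exchanges per day (all):        {per_day:,.0f}")
print(f"exchanges per account per day:  {per_account_per_day:.1f}")
print(f"sustained exchanges per second: {per_second:.1f}")
```

That works out to roughly 1,152 exchanges per account, 640,000 per day in aggregate, and about 26 per account per day. If the load really was spread that evenly (an assumption), no single account would look unusually busy, which is one plausible reason per-account rate limits alone would not catch an operation like this.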

The term Anthropic’s security team uses is “hydra cluster.” The name is deliberate. In mythology, cutting off one head of the Hydra causes two more to grow. In practice: a network of fraudulent accounts is distributed across Anthropic’s API and third-party cloud resellers, structured so that banning individual accounts doesn’t disrupt the operation. New accounts automatically replace banned ones. Traffic is deliberately mixed with legitimate-looking requests to frustrate statistical detection.

For Chinese AI labs that want access to Claude, there’s an added complication: Anthropic doesn’t sell Claude commercially in China. The terms of service explicitly prohibit access from China or from Chinese-owned subsidiaries outside the country. To route around this, operators use commercial proxy services – businesses that resell API access to frontier models at scale, often operating in gray-market territory. These proxies provide cover. Traffic from a distillation campaign looks like it originates from legitimate proxy customers spread across multiple jurisdictions.

During the DeepSeek campaign (which Anthropic tracked in detail before its February disclosure), a single proxy network managed more than 20,000 simultaneous fraudulent accounts. The accounts shared payment methods, showed synchronized timing, and distributed load in ways that looked like deliberate “load balancing” to maximize throughput and evade per-account detection limits. In the Moonshot campaign, the operation spanned “hundreds of fraudulent accounts across multiple access pathways,” varied enough that the traffic looked like organic usage until behavioral analysis surfaced the coordination.
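The two signals named above, shared payment methods and synchronized timing, can be combined into a simple coordination check. The following is a toy sketch, not Anthropic’s detection logic: the account records, the `payment_fingerprint` field, the minute-bucket representation, and both thresholds are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (account_id, payment_fingerprint, active_minute_buckets).
ACCOUNTS = [
    ("acct-001", "card-A", {0, 1, 2, 3, 4, 5}),
    ("acct-002", "card-A", {0, 1, 2, 3, 4, 6}),
    ("acct-003", "card-A", {0, 1, 2, 3, 5, 6}),
    ("acct-004", "card-B", {10, 250, 900}),
    ("acct-005", "card-C", {40, 41, 700}),
]

def timing_overlap(a, b):
    """Jaccard similarity of two sets of active minute-buckets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def suspicious_clusters(accounts, min_size=3, min_overlap=0.5):
    """Group accounts by shared payment method, then keep only groups whose
    members are also active at largely the same times."""
    by_payment = defaultdict(list)
    for account_id, payment, minutes in accounts:
        by_payment[payment].append((account_id, minutes))

    flagged = {}
    for payment, members in by_payment.items():
        if len(members) < min_size:
            continue
        # Average pairwise timing overlap across every pair in the group.
        overlaps = [
            timing_overlap(m1, m2)
            for i, (_, m1) in enumerate(members)
            for _, m2 in members[i + 1:]
        ]
        if sum(overlaps) / len(overlaps) >= min_overlap:
            flagged[payment] = sorted(a for a, _ in members)
    return flagged

print(suspicious_clusters(ACCOUNTS))
# → {'card-A': ['acct-001', 'acct-002', 'acct-003']}
```

Neither signal is damning on its own: companies legitimately share a card across accounts, and unrelated users are often busy at the same hours. Requiring both is what narrows the result to groups that behave like a single operator.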

For the Alibaba campaign, with nearly 25,000 accounts and 28.8 million exchanges, the scale would have required either a very large proxy network or direct coordination across multiple reseller pipelines. Anthropic’s letter attributes the operation to “operators affiliated with Alibaba and Alibaba Qwen” – language that is precise and deliberate. It doesn’t say Alibaba’s executives ordered it. It says the people running these accounts were affiliated with Alibaba’s AI lab. That distinction matters for attribution and liability, and it’s one Alibaba could theoretically use to distance itself from the operation. At this point, Alibaba has said nothing publicly.

What They Were After And Why Agentic Reasoning Specifically

The targets Anthropic identified in the Alibaba campaign are telling: software engineering capability and agentic reasoning. Not question-answering. Not creative writing. Not customer service chatbot behaviors.

This matters. The AI capabilities race right now is not primarily about which model can explain quantum physics most clearly. It’s about which model can autonomously execute complex multi-step tasks – write and debug code, browse the web, call APIs, manage files, take actions in response to what it observes. “Agentic” AI. This is where the economic value is concentrated, and it’s where Claude has developed differentiated capabilities over the past 18 months.

Extracting these capabilities via distillation is harder than extracting basic knowledge. You need to generate prompts that elicit the specific behaviors you want: not just asking Claude to “write some code,” but constructing prompts that surface its full reasoning process on genuinely hard engineering problems, its tool-use orchestration logic, its error-recovery behavior when a task fails partway through. During the MiniMax campaign that Anthropic tracked in real time, they watched the attacker pivot within 24 hours when a new Claude model was released, redirecting nearly half their traffic to capture capabilities from the new system. That’s not random. That’s an operation with specific technical goals and the agility to pursue them.

The February 2026 disclosure also revealed DeepSeek’s particular interest in a specific prompt structure: asking Claude to “imagine and articulate the internal reasoning behind a completed response and write it out step by step.” Run tens of thousands of variations of that prompt at scale, and you’ve built a dataset for training a reasoning model. You’ve essentially turned Claude into a labeling engine for your own reinforcement learning pipeline.

The National Security Aspect

Anthropic’s letter to Senators Tim Scott and Elizabeth Warren wasn’t just a complaint about terms-of-service violations. The national security framing is the core argument, and it deserves to be taken seriously.

The concern isn’t that Alibaba now has a better coding assistant. The concern is what happens when you strip out Claude’s safety architecture and feed the raw capabilities into a model with no analogous guardrails.

Anthropic and other US AI companies build their frontier models with extensive safeguards – systems designed to prevent the model from assisting with bioweapons synthesis, malicious cyber operations, disinformation at scale, and related harms. These safeguards are not cosmetic. They are deeply integrated into the model’s training, not just tacked on as output filters. A model built through illicit distillation collects Claude’s outputs but does not replicate the training process that created the safety properties. The resulting model gets the reasoning capability without the alignment work.

If that capability then flows into a system operated by a government with an active offensive cyber program, or is open-sourced in ways that remove it from any accountability structure, the proliferation risk is real. This is the argument Anthropic made explicitly: distillation attacks “allow foreign labs, including those subject to the control of the Chinese Communist Party, to close the competitive advantage that export controls are designed to preserve.”

That framing ties directly to US policy. The White House OSTP Director issued a memo in April 2026 committing the government to share intelligence with US AI labs about foreign distillation campaigns. Per Anthropic’s letter, the Alibaba operation ran after that memo, in defiance of an already-stated administration position. And it targeted Mythos Preview, Anthropic’s most restricted model, the one not publicly available due to cybersecurity concerns and currently accessible only to a small number of trusted organizations through Project Glasswing.

This Wasn’t the First Time

Anthropic’s February 2026 disclosure identified three previous campaigns, and the data from those cases builds a pattern that makes the Alibaba accusation more credible, not less.

DeepSeek ran over 150,000 exchanges. The operation used synchronized traffic across coordinated accounts, and researchers were able to trace specific accounts to individual researchers at the lab using request metadata. DeepSeek’s prompts included requests for Claude to generate “censorship-safe alternatives” to politically sensitive queries, essentially using Claude to help train DeepSeek’s own model to steer conversations away from topics the Chinese government restricts. That’s a specific, instrumentally weird use case that doesn’t exist in any normal commercial application.

Moonshot AI ran over 3.4 million exchanges through hundreds of fraudulent accounts. The campaign targeted agentic reasoning, coding, and computer vision. Anthropic attributed it by matching request metadata to the public profiles of senior Moonshot staff. In a later phase, Moonshot attempted specifically to reconstruct Claude’s reasoning traces – that is, reverse-engineer the chain-of-thought process rather than just collect outputs.

MiniMax was the largest of the three prior campaigns at over 13 million exchanges. Anthropic detected it while it was still active, giving them unprecedented visibility into the full lifecycle of a distillation attack from data generation through to model launch. When Anthropic released a new model mid-campaign, MiniMax pivoted within 24 hours. That’s real-time operational adaptation.

The Alibaba campaign, at 28.8 million exchanges, surpasses the three prior campaigns combined.
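The comparison is easy to check from the figures reported above. This short sketch treats each “over N” count as a floor, so the true ratio could be somewhat lower than what it prints.

```python
# Exchange counts as reported in this article ("over" figures treated as floors).
prior_campaigns = {
    "DeepSeek": 150_000,
    "Moonshot AI": 3_400_000,
    "MiniMax": 13_000_000,
}
alibaba = 28_800_000

prior_total = sum(prior_campaigns.values())
ratio = alibaba / prior_total

print(f"prior campaigns combined: {prior_total:,}")
print(f"Alibaba / prior combined: {ratio:.2f}x")
```

The three earlier campaigns add up to about 16.55 million exchanges, which puts the Alibaba campaign at roughly 1.7 times their combined volume.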

The Geopolitical Timing: A Very Crowded Week

The Alibaba accusation didn’t land in a vacuum. The timing is uncomfortable for essentially everyone involved.

On June 8, 2026, two days before Anthropic’s letter was dated, the Pentagon expanded its “1260H list” – the roster of companies it considers affiliated with China’s military or defense industrial base – to 188 entities. Alibaba was added to that list, alongside Baidu, BYD, Nio, and dozens of others. The designation doesn’t impose sanctions, but it bars the DoD from contracting with Alibaba directly starting June 30, with third-party procurement bans following in June 2027. Alibaba called the designation “groundless” and promptly filed a lawsuit challenging it.

Then, on June 12 – two days after Anthropic sent its letter – the U.S. Commerce Department imposed export control restrictions on Anthropic’s two newest models, Fable 5 and Mythos 5, citing a “potential narrow, non-universal jailbreak” discovered in Fable 5. The restrictions required Anthropic to pull both models for all users globally (including Anthropic’s own non-citizen employees), since real-time nationality filtering across AWS Bedrock, Google Cloud, and Microsoft Foundry wasn’t technically feasible. Anthropic complied and immediately pushed back, arguing the jailbreak was narrow, already present in competing models like GPT-5.5, and not grounds for a commercial recall affecting hundreds of millions of users.

The irony here is worth sitting with. Anthropic sent a letter to the government warning about Chinese AI capability extraction. Two days later, the government restricted Anthropic’s own most capable models, partly citing the same concern about those models falling into Chinese hands. From one angle, you could read this as a coherent policy response to a coherent threat. From another angle, it looks like Washington’s left hand and right hand operating on completely different timelines.

Meanwhile, Anthropic filed confidentially for an IPO this month, at a reported valuation of $965 billion after a $65 billion Series H round. The distillation disclosures now sit directly in that story. If you’re an institutional investor evaluating Anthropic’s public offering, the explicit message from Anthropic’s own letter to Congress is that Chinese competitors are systematically extracting its capabilities at a fraction of the training cost. That’s a material competitive risk, and it’s now in the public record right before the IPO.

The Global Times – a Chinese state-affiliated outlet – ran a response piece calling Anthropic’s distillation claims “rooted in tech hegemony anxiety” and lacking substance. That’s predictable. It’s worth noting, though, that there are some genuinely valid complications to Anthropic’s account.

The attribution language in Anthropic’s letter is careful: “operators affiliated with Alibaba and Alibaba Qwen.” That is not the same as Alibaba itself directing the operation from a corporate strategy meeting. AI labs operate in an ecosystem of contractors, researchers, third-party partners, and affiliated entities. It is at least theoretically possible that entities using Alibaba’s infrastructure or associated with Qwen’s research community ran this operation without Alibaba’s direct corporate authorization. That’s not exculpatory – it would still represent a serious breach – but it complicates the legal and political picture.

There’s also the question of what “capability extraction” actually achieves. Distillation produces a model trained on outputs, not a clone of the original. The gap between “trained on Claude’s outputs” and “has Claude’s capabilities” is real and technically significant. The student model doesn’t inherit Claude’s weights, its architecture, or its training data. It learns to produce outputs that resemble Claude’s, but for genuinely novel or out-of-distribution tasks, it will diverge. That said, for the specific capabilities targeted in these campaigns (agentic reasoning on coding tasks, tool use, structured output generation), the gap may be much smaller than it is for general capability.

What Anthropic Is Asking For

The letter to Congress wasn’t just an accusation – it contained specific policy requests. Anthropic laid out three asks:

First, clearer antitrust guidance. Right now, Anthropic is cautious about sharing detailed information about distillation attacks with other AI companies because of potential antitrust exposure. They want explicit guidance from the government that sharing threat intelligence with competitors about distillation campaigns is permissible under US competition law. This is not an unreasonable ask. ISAC-style intelligence sharing is standard practice in other critical infrastructure sectors.

Second, continued support for export controls on advanced AI chips. Anthropic made an argument that tends to get overlooked: distillation attacks at scale require access to advanced chips to process the collected data and train the student model. Restricting chip exports to China therefore limits not just Chinese labs’ ability to train from scratch but also their ability to exploit distilled data at industrial scale. Whether this argument is technically airtight is debatable, but it connects chip policy to distillation risk in a way policymakers haven’t always made explicit.

Third, penalties for distillation attackers. Anthropic asked for an enforcement framework that could impose real consequences on labs and their affiliated operators for running these campaigns. Right now, the main consequence is account banning – whack-a-mole against a network that regenerates accounts automatically.

The Defensive Response: What Anthropic Is Actually Building

Detection is hard. Anthropic has been honest about that. The February disclosure detailed their approach: behavioral fingerprinting systems designed to identify distillation attack patterns in API traffic, classifiers trained to detect chain-of-thought elicitation prompts, and tools for spotting coordinated activity across large account networks. They also started sharing technical indicators with other AI labs, cloud providers, and government agencies.

Account verification was tightened for the pathways most commonly exploited for fraudulent account creation – educational accounts, security research programs, and startup organizations. These are the categories that offer the most plausible cover for creating accounts in bulk.

At the model level, Anthropic is reportedly developing what it calls “countermeasures” – safeguards at the product, API, and model level designed to reduce the utility of collected outputs for illicit distillation without degrading the experience for legitimate users. What exactly this means technically isn’t publicly specified, but the general approach would involve detecting distillation-style query patterns and either refusing, degrading, or subtly distorting responses in ways that introduce noise into a training dataset. Detecting the pattern is the key challenge: as Anthropic noted, a single prompt like “You are an expert data analyst. Deliver data-driven insights grounded in complete and transparent reasoning” looks completely benign. When 50,000 variations of that prompt arrive from 500 coordinated accounts, the pattern becomes obvious. The detection is fundamentally a clustering and behavioral analysis problem, not a per-query content moderation problem.
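To show why this is a clustering problem rather than a per-query one, here is a small sketch that groups near-duplicate prompts and then counts how many distinct accounts sent each template. It is an illustration under stated assumptions, not a description of Anthropic’s classifiers: the word-trigram shingling, the greedy single-pass clustering, the thresholds, and the sample traffic are all hypothetical.

```python
import re

def shingles(text, k=3):
    """Lowercased word k-grams; near-identical prompts share most of them."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Set overlap between two shingle sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_prompts(requests, threshold=0.5):
    """Greedy single-pass clustering: attach each prompt to the first cluster
    whose representative is similar enough, else start a new cluster."""
    clusters = []  # each: {"rep": shingle set, "accounts": set, "count": int}
    for account_id, prompt in requests:
        sig = shingles(prompt)
        for cluster in clusters:
            if jaccard(sig, cluster["rep"]) >= threshold:
                cluster["accounts"].add(account_id)
                cluster["count"] += 1
                break
        else:
            clusters.append({"rep": sig, "accounts": {account_id}, "count": 1})
    return clusters

def flag_coordinated(clusters, min_accounts=3):
    """A template is suspicious when many distinct accounts send variants of it."""
    return [c for c in clusters if len(c["accounts"]) >= min_accounts]

# Hypothetical traffic: (account_id, prompt).
TEMPLATE = ("You are an expert data analyst. Deliver data-driven insights "
            "grounded in complete and transparent reasoning")
REQUESTS = [
    ("a1", TEMPLATE + "."),
    ("a2", TEMPLATE + " now."),
    ("a3", TEMPLATE.replace("Deliver", "Please deliver") + "."),
    ("a4", "Summarize this meeting transcript in three bullet points."),
    ("a5", "What is the capital of Australia?"),
]

flagged = flag_coordinated(cluster_prompts(REQUESTS))
for c in flagged:
    print(f"{c['count']} requests from {len(c['accounts'])} accounts share one template")
```

No individual request here would trip a content filter; the signal only appears once requests are compared with each other and tied back to accounts. At real traffic volumes the pairwise comparison would need to be replaced with something scalable such as locality-sensitive hashing, but the shape of the problem is the same.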

The honest assessment is that none of this is a permanent solution. The hydra-cluster architecture is explicitly designed to defeat whack-a-mole defenses. Proxy services will continue to evolve. And the economic incentive – using Claude’s outputs to shortcut billions of dollars in R&D – is overwhelming.

What This Means Going Forward

There’s a scenario worth thinking through carefully. Suppose Alibaba Qwen successfully distilled meaningful agentic reasoning capabilities from Claude. What happens next?

In the optimistic case for Anthropic, the resulting model has significant gaps – it can reproduce Claude’s behavior on tasks similar to the training prompts but fails on genuinely novel challenges. The distilled model is useful but not competitive at the frontier. Anthropic maintains its capability lead.

In the concerning case, the targeted prompting campaigns were sophisticated enough to cover a wide enough distribution of tasks that the student model actually approaches Claude’s capability in the targeted domains, specifically software engineering and agentic task execution. If Qwen’s next release shows unexpected jumps in coding benchmarks and agentic performance, that’s at least circumstantial evidence for the concerning scenario. Anthropic actually flagged this dynamic in its February disclosure: when MiniMax released its new model after the detected campaign, observers saw capability jumps that coincided with the distillation timeline. They didn’t make a definitive causal claim. The timing was suggestive.

The geopolitical implication is the one Anthropic is most concerned about articulating publicly: if distillation allows Chinese AI labs to close the capability gap with US frontier models at a fraction of the cost, then the competitive position that US export controls are designed to protect erodes faster than the controls themselves can adjust. Chip restrictions slow down training from scratch. They don’t stop a lab from running millions of API queries through a proxy network and collecting the results.

That gap between what export controls can restrict and what commercial API access enables is the policy problem that doesn’t have a clean solution yet. Anthropic’s ask for antitrust clarity and enforcement frameworks is essentially a request for the government to help close it.

What’s less clear is whether the government is in a position to respond coherently. The same week it received Anthropic’s letter about Chinese AI capability extraction, the Commerce Department imposed export controls that took Anthropic’s own most capable models offline globally. The administration is simultaneously trying to protect American AI capability and restrict it. That’s not a contradiction that can be resolved by a letter to the Senate Banking Committee.
