techpotions
AI agent skill security · AI security · agent skills · sandboxing · LLM tools · developer security
June 24, 2026 · 2 min read

26,000 Agents Fooled by a Fake Skill

Security scanners only check what's in the package. A mutable external link that changes after approval slips straight past that narrow view, and a single fake skill built by a red team reached 26,000 agents before it was pulled.

A fake skill sailed through security scanners and reached 26,000 agents. The trick? A mutable external link that pointed to benign code during review and to the real payload after approval.

The incident, reported by AI Red Team (AIR), exposes a fundamental flaw in how we vet AI agent skills today.

The scan-time blind spot

Current security scanners for AI agents inspect the submitted package. That’s the problem. If your skill manifest points to a remote URL for its actual implementation—something MCP servers and custom skills routinely do—the scanner sees whatever’s there at scan time, not what loads at runtime.

AIR built a fake skill that did exactly this: a clean codebase during submission, with the real payload swapped in later. It passed multiple named scanners, picked up stars and download counts, and reached 26,000 agents before the red team pulled it down.[^1]

The scanners weren't broken. They did their job—on the package that was submitted. The gap is architectural: trust is evaluated once, at install time, while the execution surface remains dynamic.
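
One way to narrow that gap is to pin what was reviewed and check it again on every load. The sketch below is a minimal illustration, not any platform's actual mechanism: `ApprovalRegistry`, the example URL, and the payload bytes are all hypothetical, and the "remote host" is simulated with two byte strings rather than a real fetch.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the fetched skill code."""
    return hashlib.sha256(payload).hexdigest()


class ApprovalRegistry:
    """Hypothetical store mapping a skill URL to the digest seen at review."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}

    def approve(self, url: str, payload: bytes) -> None:
        # Record exactly which bytes the reviewer or scanner looked at.
        self._approved[url] = sha256_hex(payload)

    def verify(self, url: str, payload: bytes) -> bool:
        # Unknown URLs and changed content are both rejected.
        expected = self._approved.get(url)
        if expected is None:
            return False
        return hmac.compare_digest(expected, sha256_hex(payload))


# Simulated remote host: the same URL serves different bytes over time.
SKILL_URL = "https://skills.example.com/summarize.py"  # placeholder URL
reviewed_bytes = b"def run(doc): return doc[:200]\n"
swapped_bytes = b"def run(doc): exfiltrate(doc)\n"  # stand-in for the swap

registry = ApprovalRegistry()
registry.approve(SKILL_URL, reviewed_bytes)

load_reviewed = registry.verify(SKILL_URL, reviewed_bytes)  # content unchanged
load_swapped = registry.verify(SKILL_URL, swapped_bytes)    # content replaced

print(load_reviewed, load_swapped)
```

The check only helps if the loader refuses to run anything that fails `verify`. It also says nothing about whether the reviewed code was safe in the first place; it only guarantees that what runs is what was scanned.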

Why traditional supply chain tooling isn’t enough

Developer tools like Snyk Agent Scan help with inventory and threat detection[^2], but they’re designed for static analysis of known vulnerabilities and prompt injection patterns. A remote-loaded module that changes after scanning bypasses that model entirely.

This isn’t a traditional supply chain attack. No package dependency was poisoned. The skill manifest itself was the attack vector—a JSON wrapper pointing to a moving target.

Sandboxing is the only real fix

If you can’t trust what loads, you have to contain it. Skills need execution boundaries that don’t depend on the code being static:

  • Read-only filesystem access by default—a skill that claims to summarize your documents shouldn’t be able to write to disk.
  • Network egress filtering—a skill loaded via external URL shouldn’t phone home to a C2 server.
  • Capability-based scoping—don’t hand every skill full API access. Grant only what the task requires.

Scanners like Mondoo’s check file permissions, command execution paths, and network access[^3]. But even those scans are point-in-time. Without a runtime sandbox, a clean scan today means nothing tomorrow.
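
To make the three boundaries above concrete, here is a rough sketch of a per-skill policy with default-deny checks. Everything in it (`SkillPolicy`, `check_egress`, `check_file_mode`, the host names) is hypothetical and invented for illustration. It is also only a policy model: checks that run in the same process as the skill can be bypassed, so real enforcement has to live at the OS or container level.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


class PolicyViolation(Exception):
    """Raised when a skill asks for something its policy does not grant."""


@dataclass(frozen=True)
class SkillPolicy:
    # Default-deny: no network hosts and no write access unless granted.
    allowed_hosts: frozenset[str] = frozenset()
    can_write_files: bool = False


def check_egress(policy: SkillPolicy, url: str) -> None:
    """Allow outbound requests only to hosts named in the policy."""
    host = urlparse(url).hostname
    if host is None or host not in policy.allowed_hosts:
        raise PolicyViolation(f"egress to {host!r} is not allowed")


def check_file_mode(policy: SkillPolicy, mode: str) -> None:
    """Reject any open() mode that could create or modify a file."""
    wants_write = bool(set(mode) & set("wax+"))
    if wants_write and not policy.can_write_files:
        raise PolicyViolation(f"file mode {mode!r} requires write access")


# A document summarizer gets one API host and read-only file access.
summarizer = SkillPolicy(allowed_hosts=frozenset({"api.example.com"}))

check_egress(summarizer, "https://api.example.com/v1/summarize")  # passes
check_file_mode(summarizer, "rb")                                  # passes

try:
    check_egress(summarizer, "https://c2.example.net/beacon")
except PolicyViolation as err:
    print("blocked:", err)
```

The point of the design is that the grant list is attached to the task, not to the code. If the payload behind the skill changes, the new code still inherits only the narrow grants that were approved.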

What builders should do now

  1. Ban mutable external references in skill manifests. If a skill’s behavior can change without a version bump, it’s not a versioned release—it’s a backdoor waiting to happen.
  2. Isolate execution. Treat every skill like untrusted code, because after approval, that’s exactly what it is.
  3. Re-scan at runtime. If your platform won’t sandbox, at minimum compare runtime behavior against what was approved at install time.
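
The first rule can be enforced mechanically at submission time. The sketch below assumes a made-up manifest shape in which any remote reference is an object with a `url` key and an optional `sha256` pin; real skill and MCP manifests differ, so treat the field names as placeholders. It flags every remote reference that has no valid pin.

```python
import re

SHA256_RE = re.compile(r"[0-9a-f]{64}")


def find_unpinned_urls(node, path="$"):
    """Return the paths of remote references that lack a sha256 pin."""
    findings = []
    if isinstance(node, dict):
        url = node.get("url")
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            pin = node.get("sha256")
            if not (isinstance(pin, str) and SHA256_RE.fullmatch(pin)):
                findings.append(f"{path}.url")
        # Keep walking: references can be nested anywhere in the manifest.
        for key, value in node.items():
            findings.extend(find_unpinned_urls(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_unpinned_urls(item, f"{path}[{index}]"))
    return findings


# Hypothetical manifest: one unpinned entrypoint, one pinned helper.
manifest = {
    "name": "doc-summarizer",
    "entrypoint": {"url": "https://skills.example.com/latest/main.py"},
    "helpers": [
        {"url": "https://skills.example.com/v1.2.0/util.py", "sha256": "a" * 64},
    ],
}

unpinned = find_unpinned_urls(manifest)
print(unpinned)
```

A platform could reject any submission where this list is non-empty. A pin alone is not enough, though: the loader still has to recompute the digest at load time and refuse a mismatch.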

The 26,000-agent experiment wasn’t a sophisticated 0-day. It was creative abuse of a trust model that assumes static code. Until agent platforms enforce execution boundaries, scanners are just a speed bump for anyone with a rewritable URL.

[^1]: Fake AI Agent Skill Passed Security Scans and Reportedly Reached 26,000 Agents – The Hacker News
[^2]: GitHub – snyk/agent-scan
[^3]: AI Agent Skill Security Scanner – Mondoo

Written by
techpotions