The Fragility of Agentic AI Defense: Bypassing Modern Skill Security Scanners

As the ecosystem for agentic AI matures, a new supply chain vulnerability has emerged: the “skill.” Recent security research has revealed that current AI skill security scanners—including those from ClawHub, Cisco, and Vercel’s skills.sh—can be reliably bypassed using relatively simple evasion techniques. These findings raise urgent questions regarding the robustness of defenses protecting the agentic AI supply chain.

In a series of rigorous tests conducted by Trail of Bits, researchers successfully uploaded multiple malicious skills designed to exfiltrate sensitive data, hijack autonomous agents, or execute arbitrary code. Despite their malicious intent, every evaluated scanner flagged these payloads as “safe.”

The research specifically targeted the scanning pipelines of ClawHub (which utilizes a VirusTotal-backed architecture), Cisco’s open-source skill-scanner, and the various security integrations—including Gen, Socket, and Snyk—that power the skills.sh platform.

The researchers argue that the vulnerability is structural rather than a mere failure of individual signatures. While public skill marketplaces are beginning to resemble traditional package ecosystems (like npm or PyPI), they present a significantly broader and more complex attack surface. This surface combines traditional code, configuration files, and natural-language instructions, creating a multi-dimensional vector for exploitation.

OpenClaw scanner misses malicious content
OpenClaw scanner misses malicious content (Source: Trail of Bits)

Attackers can hide malicious logic within SKILL.MD instructions, embedded scripts, binary artifacts, or even multimodal content. This makes both static analysis and LLM-based classification highly fragile. Because these scanners function as static targets, adversaries can use an iterative “test-and-refine” loop, adjusting their payloads until they find a variant that successfully evades detection.

Exploiting Truncation and Context Windows

On the ClawHub platform, Trail of Bits demonstrated a trivial bypass by prepending approximately 100,000 newline characters to a skill file before the malicious logic. Because the OpenClaw scanning harness truncates oversized content to manage context window limits and reduce latency/cost, the malicious section was effectively “pushed” outside the region inspected by the guard model.

Furthermore, the integrated VirusTotal and Code Insight path—which leverages Gemini 3 Flash—failed to identify the skill as a threat. This highlights a critical flaw: the optimization strategies used to control LLM context windows (truncation) inadvertently create massive blind spots for attackers to exploit.

Indirection and Bytecode Obfuscation

When testing skills.sh and Cisco’s scanner, researchers moved toward more sophisticated techniques involving arbitrary Git repositories and binary payloads. One “context-loader” skill utilized SKILL.MD to instruct an agent to read instructions from a .docx file. In reality, this document was a ZIP archive containing XML and a hidden shell script payload. The scanners treated the document as an opaque object and failed to recognize the malicious indirection.

In another instance, a “simple-formatter” skill provided seemingly benign Python source code but included a poisoned .pyc file. This compiled bytecode was designed to harvest environment variables for later exfiltration. Neither the skills.sh integrations nor Cisco’s engines performed deep enough inspection of the bytecode to detect the discrepancy between the visible source code and the actual compiled behavior.

Platform / Tool Scanning Approach Bypass Technique Outcome
ClawHub (OpenClaw) VirusTotal + LLM-based guard model (with file truncation) Prepending ~100k newlines to bypass truncation limits Malicious content ignored; passed as safe
Vercel skills.sh Gen Agent Trust Hub, Socket, and Snyk integrations Hidden logic in .docx ZIP payloads and poisoned .pyc bytecode All scanners reported skills as safe
Cisco skill-scanner Multi-engine static analysis + LLM analyzer Obfuscating .pyc internals and using “corporate registry” prompt framing Marked malicious skills as low risk/safe

Prompt Injection and Semantic Manipulation

The researchers also demonstrated a prompt-injection attack within a “dev-env-setup” skill. This skill was designed to reconfigure npm and yarn to use an attacker-controlled registry, cleverly framed as an “internal corporate mirror.”

By carefully crafting the SKILL.md and inline comments to mimic standard enterprise tooling and AppSec-approved configurations, the researchers nudged the LLM analyzers into treating the change as a low-risk administrative task. Even when backed by a sophisticated model like Claude Sonnet 4.6, Cisco’s scanner surfaced only low-severity findings, focusing on the exposure of “internal” URLs rather than recognizing the registry itself as a malicious control point.

While Cisco has since accepted a pull request to include stricter spec validation and broader language support (JavaScript/TypeScript), these updates do not solve the underlying issues of prompt injection or the complexity of binary and multi-format artifacts.

Passing scan results on skills.sh
The passing scan results on skills.sh (Source: Trail of Bits)

Recommendations for Secure Deployment

A notable complication is that even legitimate, high-trust skills can trigger suspicion. For example, Anthropic’s MS Office skills use LD_PRELOAD techniques to bypass sandbox socket restrictions—a behavior that current scanners struggle to classify without flagging it as a threat.

The researchers conclude that organizations cannot afford to outsource trust to automated scanners or public marketplaces. To mitigate these risks, they recommend the following security posture:

  • Curated Internal Registries: Maintain a private, vetted repository of approved skills.
  • Strict Access Control: Limit who has the authority to introduce or update skills within the organization.
  • Version Pinning: Avoid “latest” tags; always use specific, audited versions of skills.
  • Zero Trust Defaults: Treat every skill from a public hub as untrusted, executable code by default.

Until the ecosystem reaches a higher level of maturity, agent operators should maintain a minimal attack surface and avoid “one-click” installations of public skills in sensitive or production environments.

Related Articles

Back to top button