The Collapse of the Trust Boundary: Prompt Injection and Secret Exfiltration in Claude Code

As AI agents increasingly take the driver’s seat in automated development pipelines, a new class of vulnerability is emerging: the intersection of prompt injection and insecure tool orchestration. A recent security analysis has revealed that Anthropic’s Claude Code GitHub Action could unintentionally act as a conduit for leaking sensitive CI/CD secrets when processing untrusted repository content.

The core of the issue lies in an architectural inconsistency in how the agent handles tool execution. While Anthropic implemented robust sandboxing—utilizing Bubblewrap and environment scrubbing—for subprocesses like Bash, this protection did not extend to the agent’s native “Read” tool. Because the Read tool performed in-process file reads rather than executing a sandboxed subprocess, it bypassed the security layers intended to isolate the environment.

By specifically targeting /proc/self/environ, an attacker could leverage the Read tool to ingest the entire environment block of the running process. This effectively exposes high-value credentials such as the ANTHROPIC_API_KEY and the GITHUB_TOKEN directly to the model’s context window.

HTML comment hidden inside an issue opened by the actor
Attackers can hide malicious instructions within HTML comments in GitHub issues, making them invisible to human reviewers but perfectly legible to an AI model reading raw markdown.

The Anatomy of the Attack

Researchers at Microsoft identified this risk while observing a surge in prompt injection attempts within public repositories. In these scenarios, attackers exploit the fact that AI-assisted workflows often automatically process data from “untrusted” sources, such as issue bodies, pull request descriptions, or comments.

A sophisticated attack vector involves a two-stage process:

  1. The Payload: An attacker submits a seemingly benign feature request or issue. Hidden within the markdown are instructions that direct the AI agent to execute a specific sequence of commands.
  2. The Exfiltration: Once the agent is “hijacked” via prompt injection, it is steered toward sensitive files. In one documented campaign, an attacker used a permissive workflow to have an AI bot locate a target file, append a latent XSS payload, and open a PR. If merged, this would facilitate session token theft from website visitors.
Attack flow diagram
Conceptual attack flow: Moving from untrusted input to unauthorized environment access.

When applied to the Read tool vulnerability, the logic is even more direct. An attacker can command the agent to read /proc/self/environ and then “launder” the resulting keys—perhaps by encoding them or breaking them into small chunks—to bypass standard GitHub secret scanners and safety filters before exfiltrating them via a web fetch or a simple GitHub comment.

Read tool accessing the proc filesystem
The Read tool’s ability to access the /proc filesystem allows it to bypass the subprocess sandbox.

Mitigation and the “Agents Rule of Two”

Following a disclosure via HackerOne, Anthropic released Claude Code version 2.1.128 on May 5, 2026. This update specifically blocks access to sensitive /proc files, closing the primary exfiltration path identified by Microsoft.

However, patching a single tool is not a panacea for the broader risks of agentic workflows. Security professionals should adopt a “Defense in Depth” posture by adhering to what is being called the “Agents Rule of Two.” This principle dictates that no single AI workflow should simultaneously possess these three capabilities:

  • Processing Untrusted Input: Reading issue bodies, PRs, or external web content.
  • Access to Sensitive Secrets: Holding high-privilege tokens (GITHUB_TOKEN, cloud provider keys, etc.).
  • External Communication/State Change: The ability to write to logs, make network requests, or modify the repository state.

To harden your CI/CD pipelines against future agentic vulnerabilities, consider these technical best practices:

  • Strict Token Scoping: Move away from “all-access” tokens. Use fine-grained permissions that limit an agent to the absolute minimum scope required for its task.
  • Explicit Trust Boundaries: Update system prompts to explicitly define untrusted data. The agent must be instructed to treat all repository metadata (issues, comments) as data, not as executable instructions.
  • Task Pinning: Rather than a general-purpose “coding assistant,” deploy agents pinned to a single, narrowly defined task.
  • Anomaly Monitoring: Implement logging and monitoring to detect unusual patterns, such as an agent attempting to access system files or making unexpected outbound network calls.

As we integrate more autonomous agents into the software development lifecycle, our security models must evolve from protecting against malicious code to protecting against malicious intent embedded within the very data our agents are designed to process.

Related Articles

Back to top button