When Your Coding Agent Runs the Attack Itself

Claude Code, GitHub Copilot, Gemini CLI — all three were compromised via manipulated GitHub issues in 2025. What makes indirect prompt injection so dangerous in AI agents, and why the answer lies in architecture, not system prompts.

The Moment Before You Notice

Imagine this: you ask your AI coding agent to review an open GitHub issue and sketch out a first fix. The agent reads the issue body — the way it always reads text. What it sees is invisible in GitHub's rendered view: an HTML comment tucked at the end of the body, carrying a second set of instructions that were never meant for you.

The agent keeps working. It modifies `.vscode/settings.json`. It sets `chat.tools.autoApprove: true`, disabling every confirmation dialog — what developers internally call "YOLO mode." Then it reads the environment, and pushes your `GITHUB_TOKEN` base64-encoded into a PR comment.

You see none of this. The agent looks normal. It was just doing its job: following the instructions that appeared in the context it processed.

Klingt interessant?

Jetzt kostenlos ausprobieren

This isn't a thought experiment. This is CVE-2025-53773.

What Makes Indirect Prompt Injection Different

Classic injection attacks — SQL, command-line — target poorly validated input. You control the entry point; you fix the validation.

Prompt injection in language models works differently at a structural level. The model can't reliably distinguish "instruction" from "content." When it processes context, it interprets everything — including content pulled from external sources. That's not a bug. That's the design.

With a simple chatbot, the blast radius is limited: an attacker can confuse the bot or make it lie. With coding agents, the situation is fundamentally different. These agents have tools — they write files, run terminal commands, call APIs, read secrets from the environment. And they constantly process external content: issues, pull request comments, README files, Stack Overflow snippets, npm package descriptions. Every one of those sources is a potential attack delivery mechanism.

This is indirect prompt injection: the attacker doesn't strike directly. They leave instructions in a document, an issue, a webpage — and wait for your agent to process that context. The weapon is the agent's own helpfulness. Palo Alto Networks Unit 42, in their current threat report, no longer treats indirect prompt injection as a theoretical risk — they document it as observed in the wild, with confirmed cases of credential theft, data destruction, and unauthorized transactions.

What Went Wrong in Real Systems

In 2025, researcher Aonan Guan and colleagues at Johns Hopkins University studied three widely used AI coding agents: Anthropic's Claude Code, Google's Gemini CLI, and GitHub Copilot Agent. All three were vulnerable to an attack technique they named "Comment and Control" — a deliberate reference to command-and-control servers from the malware world.

The mechanics: an attacker opens a GitHub issue or pull request with a manipulated title or body. Agents configured to respond automatically to such events in GitHub Actions workflows process that context without verification — treating it as a trusted instruction.

With Claude Code, a crafted PR title was enough to break out of the prompt structure. Claude then posted the `ANTHROPIC_API_KEY` and `GITHUB_TOKEN` as a JSON "security finding" into the PR comment. Anthropic rated the vulnerability at CVSS 9.4.

GitHub had built three separate defenses into its Copilot Agent: environment variable filtering, secret scanning, and a network firewall. All three were bypassed. The credentials were exfiltrated via a standard `git push` — an operation the firewall explicitly allows, because it's part of normal Copilot workflow.

Separately documented: CVE-2025-53773, GitHub Copilot in VS Code. Through a prompt injection in virtually any file the agent processes — an issue, a source file, a tool response — the agent can be made to modify its own configuration mode and subsequently execute arbitrary shell commands. Full compromise of the developer's machine.

Why a Harder System Prompt Doesn't Fix This

The obvious response is: "Then just tell the agent not to follow those instructions."

That doesn't hold structurally. Models are trained to integrate context and to be helpful — you can't override that with a single prompt line. Prompt-based defenses live in the same channel as the attack itself. An attacker who controls the context can always try to override or undermine those instructions. OWASP lists prompt injection as LLM01 — number one on their top ten risks for LLM-based applications — for exactly this reason.

The real problem isn't the model's behavior. It's what the model is allowed to do.

When an agent simultaneously processes external content, carries production credentials in its environment, and can write files without confirmation, then every successful prompt injection is a potential catastrophe. Privilege separation isn't optional here — it's a prerequisite.

Architecture as the Defense

At nopex, the gap between what an agent can do and what it's permitted to do is kept deliberately narrow. Agents run in isolated sandboxes; secrets are never injected into the agent environment but provisioned on demand through controlled gateways. What an agent is allowed to do is enforced by the infrastructure beneath it, not by prompt instructions.

An agent that wants to write code into a repository goes through a quality gate: automated checks for known vulnerability patterns and suspicious diffs, before anything is committed. For high-stakes actions — credential access, external network requests, write operations outside the project context — human-in-the-loop isn't an emergency measure. It's a built-in pipeline stage.

This means: even if a prompt injection succeeds and gets the agent to "want" to execute a malicious instruction, the architecture prevents it from actually doing so. The sandboxing breaks the kill chain before it can cause damage.

Security through prompts is like a paper lock. What matters is what the agent physically cannot do — not what it's been instructed not to want.

What This Means for Your Team

Anyone using AI agents in development workflows should ask concrete questions: what permissions does the agent actually have? Which external sources does it process without human review? Which actions require explicit confirmation — and where does whatever the agent decided just run through unchecked?

The attacks on Claude Code, Gemini CLI, and Copilot were all executed through the same surface: the agent trusted the content it read. That's not negligence on the part of the vendors — it's an inherent characteristic of how these systems are built. The solution lives one layer down.

If you want to know how nopex answers these questions architecturally, reach out — or take a look at how our agent pipelines are structured.

The Moment Before You Notice

You see none of this. The agent looks normal. It was just doing its job: following the instructions that appeared in the context it processed.

Klingt interessant?

Jetzt kostenlos ausprobieren

This isn't a thought experiment. This is CVE-2025-53773.

What Makes Indirect Prompt Injection Different

Classic injection attacks — SQL, command-line — target poorly validated input. You control the entry point; you fix the validation.

What Went Wrong in Real Systems

Why a Harder System Prompt Doesn't Fix This

The obvious response is: "Then just tell the agent not to follow those instructions."

The real problem isn't the model's behavior. It's what the model is allowed to do.

Architecture as the Defense

Security through prompts is like a paper lock. What matters is what the agent physically cannot do — not what it's been instructed not to want.

What This Means for Your Team

If you want to know how nopex answers these questions architecturally, reach out — or take a look at how our agent pipelines are structured.

When Your Coding Agent Runs the Attack Itself

The Moment Before You Notice

What Makes Indirect Prompt Injection Different

What Went Wrong in Real Systems

Why a Harder System Prompt Doesn't Fix This

Architecture as the Defense

What This Means for Your Team

Bereit, dein Projekt zu starten?

Weitere Artikel

AI Ate the Bottom Rung: Entry-Level Jobs Are Disappearing, and the Career Ladder Won't Be the Same

The AI Scissors: How the Gap Between Leaders and Laggards Is Already Opening

When Your Coding Agent Runs the Attack Itself

The Moment Before You Notice

What Makes Indirect Prompt Injection Different

What Went Wrong in Real Systems

Why a Harder System Prompt Doesn't Fix This

Architecture as the Defense

What This Means for Your Team

Bereit, dein Projekt zu starten?

Weitere Artikel

AI Ate the Bottom Rung: Entry-Level Jobs Are Disappearing, and the Career Ladder Won't Be the Same

The AI Scissors: How the Gap Between Leaders and Laggards Is Already Opening