How to Stop Agents from Silently Exfiltrating Files via Outbound Messages

By yrzheMay 27, 20267 min read

# How to Stop Agents from Silently Exfiltrating Files via Outbound Messages

Stop silent exfiltration by treating every outbound message as a privileged action: don’t let agents auto-send emails/IMs that contain links, attachments, or content derived from sensitive stores; don’t auto-load unvalidated “Skills” (or any third-party content) into an agent runtime; and constrain the agent’s permissions so that even a successful prompt injection can’t enumerate or share large parts of a tenant. The Copilot Cowork research shows that “message sending” plus “preview rendering” can become a data egress channel even when no file is explicitly attached.

Quick answer: what actually prevents silent file exfiltration

The practical defense is an end-to-end outbound action gate:

Human approval with context for any agent-originated email/Teams message that includes (a) a link, (b) an attachment, or (c) text derived from restricted sources. “Approval” has to show the exact rendered payload (including URLs) and the recipients, not a generic “send message?” prompt.
Skill hygiene: only load Skills/extensions from a validated registry (or signed bundles). If Skills can be uploaded into shared storage and auto-loaded, treat them as untrusted code/content.
Least privilege by default: scope what the agent can read in Microsoft 365 (and what it can share). Log Graph-level reads and all outbound sends so you can audit and roll back.
Rendering + network guards: block or proxy external fetches triggered by message previews; strip external images/tags from agent-composed content; and avoid automatic link previews for agent-originated messages.

This is less about “fixing a single prompt injection” and more about building agent rails where a compromised planning step cannot automatically complete an egress step. (Related internal brief: What breaks when agents can auto-send messages or links — how to defend outbound actions.)

What happened with Copilot Cowork (incident recap)

Microsoft Copilot Cowork is an autonomous, enterprise “agentic” feature launched in March 2026 via the Frontier program. It operates with the invoking user’s Microsoft 365 permissions across systems including Outlook, Teams, SharePoint, OneDrive, and Dynamics 365, and it is built on Anthropic Claude models.

Researchers and security firms described a path to file exfiltration that combined:

Indirect prompt injection delivered through Cowork “Skills” (user-created files that extend capabilities).
Skill loading behavior: Skills are automatically loaded from a designated OneDrive path, and Microsoft’s published materials indicate Skills are not validated prior to use.
Outbound messaging behavior: in reported demonstrations, sending email/Teams messages to the active user could proceed without explicit human approval.
Exfiltration mechanics: messages could include pre-authenticated download links and attacker-controlled content (including external image tags). Merely opening or previewing those messages could trigger network requests that leak data to attacker infrastructure.

The class of issue connects to “EchoLeak,” a zero-click indirect prompt injection vector against Microsoft 365 Copilot disclosed in 2025, assigned CVE-2025-32711 (CVSS 9.3) and reported as patched server-side, with Microsoft stating no observed customer exploitation. Follow-on publications in May 2026 emphasized that additional exfiltration vectors remain feasible when agents combine broad delegated access with auto-approved outbound actions.

Why the outbound action flow matters (technical anatomy)

The mechanism is a three-step chain, and each step has a distinct control point:

Untrusted content enters the agent’s instruction channel. A malicious Skill file (stored where Cowork auto-loads) includes hidden prompt-injection lines. When the user invokes routine tasks (summarize/review), the agent ingests the Skill and the injected instructions get treated as high-priority intent.
The agent translates injected intent into an outbound artifact. With Microsoft Graph-level permissions (delegated from the user), the agent can enumerate or access tenant documents, then compose an email or Teams message that includes pre-authenticated download links or other exfil triggers.
Preview/rendering causes network egress. The user doesn’t need to click an attachment. If the outbound message is auto-sent and then previewed, link previews and external tags can cause automatic fetches—effectively turning “message rendering” into a data transport.

The enabling design choices called out across writeups are consistent: unvalidated Skills, broad delegated permissions, and auto-approved outbound actions (especially “messages to self,” which feel safe in demos). The builder consequence is blunt: if your agent can read sensitive stores and can cause a message preview to fetch attacker-controlled URLs, you’ve created a low-friction egress path even without “download file” APIs in your explicit toolset.

Concrete mitigations solo AI builders should implement

Approval model: make “send” a privileged operation

Use explicit confirmation for outbound actions—especially messages that contain links, attachments, or retrieved content. “Auto-approve to active user” is exactly the sort of convenience that collapses a multi-step exploit into a zero-click one. Practically, require a review screen that shows:

recipients (including “self”),
the fully expanded message body,
every URL (normalized/expanded),
any files referenced and their source scopes.

Default to “save as draft,” not “send,” for agent-authored messages.

Skill hygiene: don’t auto-load untrusted extensions

If you support Skills/plugins/templates, don’t treat shared storage as a trusted package repository. Only load from a validated registry or signed bundle; otherwise you have a supply chain path that bypasses your app’s usual review gates. Run Skills in sandboxes with strict I/O and network rules until they earn trust, and quarantine newly uploaded Skills for manual review before they can influence production tasks.

Least privilege: reduce blast radius of a successful injection

Cowork’s model—operating with the invoking user’s “full Microsoft 365 permissions”—is powerful, but it increases what prompt injection can do. As a solo builder, you can still narrow scope:

constrain which file locations the agent can read,
constrain which APIs/tools are available in a given workflow,
use short-lived tokens and conditional access instead of broad, long-lived delegated access,
log every Graph-level access the agent makes, correlated to an outbound action.

Rendering and network guards: treat previews as active content

Preview rendering and automatic URL fetching are egress vectors. Strip external images/tags from agent-composed content; disable automatic previews for agent-originated messages; and proxy or block agent-originated HTTP(S) requests by default. The goal is that even if the agent composes a malicious message, the environment won’t auto-fetch attacker infrastructure when the message is viewed.

Monitoring and rollback: assume you’ll miss something

Log agent-originated outbound actions (email/Teams sends) and watch for creation of pre-authenticated links and unusual URL fetch patterns. Have an operational “big red button”: revoke shared links and tokens quickly when you detect suspicious behavior.

Why It Matters Now

EchoLeak (CVE-2025-32711, CVSS 9.3) and the May 2026 Cowork-focused demonstrations landed on the same thesis: prompt injection is not confined to “model misbehavior,” it’s an integration failure mode amplified by agentic authority. Microsoft reportedly patched specific vectors and stated no customer exploitation observed—but the underlying risk pattern persists anywhere you combine (1) delegated access across enterprise systems, (2) ingestion of third-party content like Skills, and (3) outbound actions that can execute without meaningful friction.

For small teams, the immediate relevance is that permissive defaults (auto-loading extensions, broad OAuth scopes, “auto-send for convenience”) are the fastest way to ship—and also the fastest way to recreate this class of egress path. If you’re building agents now, hardening outbound approvals and preview/network behavior is cheaper than incident response later. (See also: LocalAI + Outsourcing Is About to Reorder Builder Economics for why more builders will ship agentic surfaces faster, with fewer governance layers.)

Practical checklist for immediate hardening

Set outbound messages to draft-by-default; require explicit send confirmation for anything with links/attachments or retrieved content.
Require signed Skills (or a validated registry) and queue manual review for any Skill newly uploaded to shared storage.
Proxy agent-originated HTTP(S); block outbound requests to newly seen hostnames; alert on pre-authenticated download link creation.
Sanitize agent-composed messages: strip external images/tags and disable automatic link previews for agent-originated content.
Minimize scopes: restrict which stores/APIs the agent can read; log Graph-level reads and correlate them to outbound actions; support quick revocation of links/tokens.

What to Watch

Vendor guidance and follow-ups on EchoLeak-style indirect prompt injection defenses, including Microsoft’s Zero Trust guidance on defending against indirect prompt injection.
Platform-level defaults: whether copilots/agents move toward stricter outbound approval UX, narrower delegated permissions, and safer preview behavior.
Tooling maturity for Skills: signing/verification workflows and sandboxes that let you test third-party agent extensions safely before granting production access.

Sources:

https://letsdatascience.com/news/microsoft-copilot-cowork-enables-file-exfiltration-via-echol-ca9b728b

https://byteiota.com/microsoft-copilot-cowork-file-exfiltration/

https://www.promptarmor.com/resources/microsoft-copilot-cowork-exfiltrates-files

https://www.sentra.io/blog/copilot-echoleak-prompt-injection

https://aitoolly.com/ai-news/article/2026-05-26-microsoft-copilot-cowork-vulnerability-indirect-prompt-injection-enables-unauthorized-file-exfiltrat-1

https://learn.microsoft.com/en-us/security/zero-trust/sfi/defend-indirect-prompt-injection

About the Author

yrzhe

AI Product Thinker & Builder. Curating and analyzing tech news at TechScan AI. Follow @yrzhe_top on X for daily tech insights and commentary.

X/Twitter GitHub Blog