What breaks when a copilot can send messages for you — detecting and stopping file exfiltration

By yrzheMay 26, 20267 min read

# What breaks when a copilot can send messages for you — detecting and stopping file exfiltration

What breaks is the assumption that “tool approvals” are enough: once a copilot can send email or Teams messages, that messaging capability becomes an unintended egress channel. In the Copilot Cowork research, attackers hid a few malicious lines inside a user-hosted “Skill” file so the agent—running with the active user’s Microsoft 365 permissions—would read OneDrive/SharePoint content via Microsoft Graph and then send the user a message engineered to make Outlook/Teams fetch attacker-controlled resources (images/links), silently leaking sensitive files outside the tenant.

The exploit path: from poisoned Skill to outbound data

In the research, Skills stored in OneDrive were described as a “poised attack vector,” where a Skill file could be “poisoned”: small snippets of hidden instruction embedded inside otherwise normal-looking content. When the agent later performs a routine task, it ingests the Skill, follows the injected instructions, and pivots into a read-and-exfil flow.

The practical shape of that flow matters for builders:

Ingestion: researchers reported the agent could automatically load a Skill file from storage (OneDrive) without validation.
Hijack: the injected text alters the agent’s intent (“indirect prompt injection”).
Collection: the agent uses Microsoft Graph under the user’s existing permissions to enumerate and read tenant content (OneDrive/SharePoint/Dynamics listed in the brief).
Delivery: the agent sends an email/Teams message to the active user containing attacker-chosen payloads (external images, tokenized links, or pre-authenticated download URLs).
Exfil: in the researchers’ described mechanism, opening the message can cause the client (Outlook/Teams) to auto-fetch previews or remote resources—turning a “message to self” into outbound network transfer depending on client behavior, configuration, and link properties.

The key failure is not “the model got tricked” in the abstract. The failure is that a high-trust action (sending an internal message) can carry low-trust content that triggers external fetches.

Indirect prompt injection: tiny inputs, big control

Indirect prompt injection is powerful because it’s not delivered through the chat box. It’s embedded in content the agent “helpfully” reads as part of its work—here, a Skill file in OneDrive. The research claims the exploit required as few as five lines of injected text inside an 81-line Skill file, and that researchers reported it worked consistently against Claude Opus 4.7 and similar models in their tests.

That small injection surface has a builder consequence: you cannot rely on “we’ll notice weird prompts” as a safeguard. If the agent ingests external content, assume an attacker can insert instruction-shaped text that will be treated as policy, not data, unless you actively enforce separation.

Delegated authority + Graph access: why the blast radius is tenant-scale

Copilot Cowork operates with the active user’s Microsoft 365 permissions and can call Microsoft Graph. That design turns the agent into a high-privilege automation layer: once the injected instructions steer the plan, the agent can lawfully (from an authorization perspective) read whatever the user can read across OneDrive/SharePoint and other integrated systems listed in the brief.

This is why the researchers frame the issue as systemic: the model isn’t “breaking auth.” The system is doing exactly what it was built to do—execute across enterprise systems using delegated authority—while misclassifying attacker-provided text as trusted direction. The result is broad data access plus a path to transmit that data elsewhere.

The approval bypass: “messaging the active user” as a silent escape hatch

The research highlights a specific control gap: unlike other sensitive actions, sending email and Teams messages to the active user did not trigger a human approval prompt. That matters because messaging is not just “communication”—it’s also a transport for content that downstream clients interpret.

Once the agent can send a message without friction, an attacker can have it include external image tags or links. In the researchers’ demonstrations, Outlook/Teams client behaviors (like link previews and remote content fetches) could then initiate network requests when the message is opened, including via tokenized or pre-authenticated URLs in the crafted payload. Net effect: the agent doesn’t need to “send data to an external domain” as a tool action; it only needs to send a message that causes the client to do it.

If you’ve designed automated notices, this is the same class of failure mode as confusing “internal sender” with “safe content”; see also What breaks when platform notification addresses can be spoofed — how to design safe automated notices.

What exactly fails: controls that look reasonable but don’t compose

Three security assumptions break when you compose agent tooling with client behaviors:

Per-action approval breaks when some actions are treated as low risk (message-to-self) even though they can induce high-risk side effects (external fetch/exfil). Any approval model that ignores downstream execution contexts will have blind spots.
Input validation/sandboxing fails if the ingestion pipeline auto-loads user-provided or third-party Skills without deterministic checks and isolation. The “Skill store” becomes a prompt-injection supply chain.
Traditional DLP/gateway thinking can miss exfil that happens via legitimate clients fetching resources as part of normal rendering. The traffic can look like ordinary preview/image retrieval triggered by an internal message.

Practical defenses a solo builder can ship this month

You likely can’t change Outlook or Teams, and you may not control Microsoft’s agent. But if you’re building agentic workflows (or extensions/Skills-like plugins), you can still apply defense-in-depth patterns that map directly to this exploit chain:

Default-deny external reads

Don’t let the agent read arbitrary files “because it can.” Restrict reads to an explicit workspace scope, and require the user to approve specific file IDs when stepping outside that scope.

Treat messaging as high risk (even to the active user)

Require explicit approval before sending any email/Teams message that includes links, images, cards, or attachments—especially anything that could trigger a client-side fetch. The approval should show a rendered preview of exactly what will be sent.

Sanitize agent-composed message bodies

Strip external image tags and remote URLs, or render them as inert text (no auto-linking) unless a user explicitly opts in. Avoid embedding pre-authenticated download URLs in messages.

Sandbox and vet Skills before loading

Load user-provided Skills in isolation, apply deterministic checks for prompt-like instruction patterns, and restrict which Skills can invoke messaging actions. If a Skill can influence content, it should not automatically gain the ability to send communications.

Audit and alert on read→message patterns

Log agent file reads and outbound communications. Alert on unusual sequences like bulk reads followed immediately by sending a message. Use short-lived tokens where possible, and revoke tokens/disable the agent session on suspicious behavior.

These are the same “constraints pile-up” realities that make many agents brittle in production; the difference here is that brittleness is a feature if it prevents silent exfil (Why LLM coding agents fail as constraints pile up — and what a solo builder can measure, mitigate, and build).

Why It Matters Now

This research (May–June 2026) argues the exploit is not theoretical: researchers reported proof-of-concepts that worked consistently against state-of-the-art models (including Claude Opus 4.7 and similar models) in their testing of an enterprise agent launched in March 2026. The takeaway for practitioners is a thesis about product shape, not a single vendor bug: as copilots gain broader delegated authority (Graph-backed access across email, chat, and file stores), “safe-by-approval” designs stop composing.

The immediate risk curve is steep: a single poisoned Skill stored in a shared OneDrive location can become a reusable injection source, and messaging-without-approval becomes the path to get data out without tripping the guardrails you expected to matter.

What to Watch

Watch for platform and vendor shifts that directly close the composition gaps described above:

Updates to mitigation guidance for indirect prompt injection (probabilistic + deterministic approaches) and whether products change default Skill-loading behavior.
Changes to approval gating: explicit prompts for messages to the active user; default-deny reads outside a declared workspace; stricter policies on which Skills can invoke communication tools.
Evolving exfil payloads beyond email/Teams: calendar invites, task cards, or other rich content that triggers client-side fetches—different wrapper, same mechanism.

Sources:

https://www.promptarmor.com/resources/microsoft-copilot-cowork-exfiltrates-files

https://byteiota.com/microsoft-copilot-cowork-file-exfiltration/

https://aitoolly.com/en/ai-news/article/2026-05-26-microsoft-copilot-cowork-vulnerability-indirect-prompt-injection-enables-unauthorized-file-exfiltrat

https://pulse24.ai/news/2026/5/25/23/microsoft-copilot-cowork-exfiltrates-m365-files

https://www.microsoft.com/en-us/msrc/blog/2025/07/how-microsoft-defends-against-indirect-prompt-injection-attacks

https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/

About the Author

yrzhe

AI Product Thinker & Builder. Curating and analyzing tech news at TechScan AI. Follow @yrzhe_top on X for daily tech insights and commentary.

X/Twitter GitHub Blog