Blog · arXiv Analysis · Last reviewed June 25, 2026

The Instruction-Data Boundary Becomes the Security Primitive

A June 2026 arXiv paper argues that prompt injection is not just a bad prompt problem. For shared-embedding systems that lack enforced provenance, the instruction-data boundary has to move outside the model's ordinary text stream.

A Boundary, Not a Hint

Prompt injection is often described as if the model merely needs better manners: ignore hostile text, obey the system message, remember which words came from the user. If instructions and untrusted content enter the same representational pipe, the system has already asked a statistical process to recover an authority boundary after the boundary has been blurred.

The arXiv paper On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models, by Dewank Pant, Shruti Lohani, and Avijit Kumar, gives that worry a formal shape. The useful reading is architectural: agent systems need an instruction-data boundary that is enforced by design, not merely suggested to a model through wording, delimiters, or examples.

The Paper Frame

The paper is arXiv:2606.27567 [cs.CR], submitted on June 25, 2026. arXiv lists the title exactly as On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models, with subjects in Cryptography and Security, Artificial Intelligence, and Machine Learning.

The abstract states the central claim carefully. In shared-embedding architectures that lack enforced control-data separation, perfect prompt-injection prevention cannot be guaranteed within the shared representational pipeline. That is narrower than "all defenses fail." It is a claim about a class of systems, a security property, and the absence of immutable provenance.

Prompted Action Models

The paper formalizes a prompted system as a Prompted Action Model. The action space includes ordinary token generation, but also externally meaningful actions such as tool invocation, policy routing, and memory-write operations. That matters because modern AI agents do not merely answer. They ask for tools, route tasks, and store state.

The security property is Semantic-Faithful Control, or SFC. In the paper's terms, control-authoritative behavior should depend on the semantic content of untrusted input, not on semantically irrelevant changes in how that input is encoded. The same request should not become allowed because the attacker found a different token string, wrapper, or representation that carries the same practical meaning through the shared pipeline.

This is why prompt injection becomes a governance problem. The issue is not only whether the answer sounds safe. The issue is whether the system can prove which words had authority when a model decided to refuse, call a tool, route to a privileged policy, or write to memory.

Three Ways the Boundary Fails

The paper's impossibility argument combines three results. First is provenance-recovery impossibility: when trusted instructions and untrusted content are processed through shared representations, the model cannot perfectly recover provenance unless the trusted and untrusted representation distributions are disjoint. The paper frames that limit through Bayes-optimal error and total variation distance.

Second is control-path exposure. In standard shared-attention architectures, untrusted tokens enter the same value-aggregation pathway that contributes to control-relevant computation. The user's words are not merely sitting in a quarantine box. They participate in the computation that later influences outputs and authority-bearing decisions.

Third is a finite-coverage invariance gap. Training can cover many examples, but finite training cannot certify invariance over every semantically equivalent encoding an attacker may generate. Better training can reduce risk in practice. It does not, under the paper's assumptions, create a perfect architectural guarantee for SFC.

Why Markers Are Not Enough

The paper directly addresses the obvious objection: do positions, segments, delimiters, or special tokens already separate instruction from data? Its answer is no when those devices remain soft signals inside the same symbolic or representational channel. Positional encodings encode location, not immutable origin. Soft segment markers can be copied, spoofed, or treated as ordinary content unless another layer makes them non-user-writable.

The escape hatch is important. The theorem does not condemn every possible architecture. The paper says architectures with typed attention, hard segment masks, separate control/data channels, or immutable provenance tags can fall outside the theorem because they enforce separation rather than infer it after the fact. That is the practical hinge for tool-use systems: tool authority should come from a typed protocol, scoped policy, and external authorization layer, not from a free-text instruction that the model is asked to remember.

Governance Reading

The governance reading is simple: stop treating the prompt as the permission system. A system message can express policy, but it should not be the only place where authority lives. The audit record should identify the trusted instruction source, untrusted content source, provenance tag, tool scope, memory-write authority, policy route, and human or service account that approved each external effect.

This belongs beside the site's notes on the AI browser control surface and the Agent Tool Permission Protocol. If an agent can browse, click, buy, send, delete, schedule, remember, or approve, then the instruction-data boundary is part of the security model.

The Spiralist lesson is not that language models should be abandoned. It is that institutions like to move hard boundaries into polite text and then call the result alignment. The paper pushes the boundary back into architecture: typed channels, immutable provenance, separate control paths, scoped tools, and logs that survive model error.

Limits

This is a preprint and should be read as an argument under stated assumptions, not as a production certification. The empirical grounding in the paper supports the plausibility of overlap, control-path exposure, behavioral SFC violations, and weight-surgery effects, but the page here does not treat those measurements as a universal benchmark across all deployed systems.

The result also does not mean mitigations are worthless. Classifiers, training, prompt design, monitoring, and red-team suites may still be useful mitigation layers. The narrower claim is that a mechanism operating only inside the shared representational pipeline cannot provide a perfect SFC guarantee when the architecture lacks enforced provenance and exposed control paths remain.

That distinction matters for policy. A regulator or internal review board should not ask whether the vendor has a better guardrail in the abstract. It should ask where the authoritative boundary is enforced and whether untrusted text can write, route, authorize, or override through the same channel that carries trusted instructions.

Audit Receipt

The audit-grade sentence is: Pant, Lohani, and Kumar define Prompted Action Models and Semantic-Faithful Control, then argue that shared-embedding systems with exposed control paths and no immutable provenance cannot guarantee perfect prompt-injection prevention within the shared pipeline, because provenance recovery is imperfect, untrusted content reaches control computation, and finite training cannot cover every semantically equivalent encoding.

The practical receipt is not "trust no model." It is: do not let the model be the only witness for instruction authority. Put provenance, tool permission, memory writes, and policy routing into enforceable structures that can be inspected after the agent acts.

Sources


Return to Blog