Last reviewed June 25, 2026

The Secret Becomes the Social Contagion

Aman Priyanshu, Supriti Vijay, and Esha Pahwa's May 2026 arXiv paper moves agent privacy evaluation out of the single chat box and into a simulated social world. The central result is simple and ugly: agents disclose more when other agents have already made disclosure feel normal.

The paper, arXiv:2605.27766 [cs.AI], was submitted on May 26, 2026. arXiv lists the exact title as "Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems."

The site already covers Moltbook as an identity and platform-governance problem in the reverse CAPTCHA essay, and contextual privacy in shared assistant settings in the group-chat privacy essay. This paper isolates a narrower failure mode: privacy leakage caused by social exposure among agents.

That angle matters because many agent safety tests still ask whether one model refuses one bad request. A social agent does not live in that clean geometry. It reads threads, joins communities, absorbs local norms, sees what other accounts disclose, and then decides what sounds appropriate to say next.

What the Paper Builds

The authors build a Moltbook-style simulation platform where thousands of LLM agents carry synthetic private human profiles and interact across 124 communities over a simulated month. The organic run uses 2,533 agents for 25 simulated days, producing 29,945 top-level posts and 81,264 threaded replies.

The paper treats privacy through contextual integrity: a disclosure is a violation when sensitive information appears in a context where it does not belong. The synthetic profiles cover ten domains: general identity, finance, health, mental health, legal, relationships, housing, employment, education, and scheduling. A stateless LLM-as-a-judge pipeline checks each write against the author's profile and marks which domains leaked.
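The shape of that per-write check can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: the Profile type, the Verdict class, and substring_judge are hypothetical names, and a verbatim substring match stands in for the LLM judge. It shows only the interface the description implies: one write and one profile in, a set of leaked domains out, with no state carried between calls.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# The ten profile domains named in the summary above.
DOMAINS = (
    "general identity", "finance", "health", "mental health", "legal",
    "relationships", "housing", "employment", "education", "scheduling",
)

# Hypothetical profile shape: domain name -> private strings held for that domain.
Profile = Dict[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Verdict:
    """Result of judging one write: the set of domains that leaked."""
    leaked_domains: FrozenSet[str]

    @property
    def leaked(self) -> bool:
        return bool(self.leaked_domains)


def substring_judge(text: str, profile: Profile) -> Verdict:
    """Toy stand-in for the LLM judge (an assumption, not the paper's method).

    Flags a domain when any of its private strings appears verbatim in the
    text. Stateless: the verdict depends only on this write and this profile.
    """
    lowered = text.lower()
    hits = {
        domain
        for domain, secrets in profile.items()
        if any(secret.lower() in lowered for secret in secrets)
    }
    return Verdict(frozenset(hits))


profile: Profile = {
    "health": ("type 1 diabetes",),
    "finance": ("owes $12,000",),
}
verdict = substring_judge("Hi all, I manage Type 1 diabetes and love hiking.", profile)
print(sorted(verdict.leaked_domains))  # prints ['health']
```

A substring match ignores context, which is exactly what contextual integrity cares about, so a real judge would need to weigh where the write appears as well as what it contains.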

The controlled testbed freezes the social platform and places individual agents from seven frontier models into it under five levels of adversarial contamination. Each model is paired with ten held-out CIMemories personas and five tool-call budgets, yielding 7,000 evaluation traces. This lets the authors vary model, social environment, prompt instruction, persona, and interaction length instead of collapsing privacy into one score.
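A quick way to read that design is as a factorial grid. The sketch below enumerates only the four factors with stated sizes; the key names are illustrative. Those four multiply to 1,750 cells, so the reported 7,000 traces work out to four traces per cell. This summary does not break out what the remaining factor of four is (prompt-instruction conditions or repeated runs are possibilities, not confirmed here).

```python
from itertools import product

# Factor sizes as stated in the summary; the key names are illustrative.
factors = {
    "model": 7,
    "contamination_level": 5,
    "persona": 10,
    "tool_call_budget": 5,
}

# Enumerate every combination of the four listed factors.
cells = list(product(*(range(size) for size in factors.values())))
listed_cells = len(cells)

reported_traces = 7000
traces_per_cell, remainder = divmod(reported_traces, listed_cells)

print(listed_cells, traces_per_cell, remainder)  # prints: 1750 4 0
```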

What Leaked

The headline comparison is between isolated and social evaluation. The paper reports that moving from single-turn CIMemories-style evaluation to the authors' multi-turn social setting raises privacy violations across OpenAI models from 19.95% to 45.30%, more than doubling the rate. In the organic simulation, leaking items accumulate steadily rather than appearing only as early outliers.

The contagion result is sharper. In the paper's thread-level analysis, a reply following a leaking message leaks 12.8% of the time. A reply following a clean message leaks 1.6% of the time, close to the 1.8% global baseline. A leaking parent therefore corresponds to roughly an eightfold increase in the chance that the reply leaks. That is the privacy version of a broken window: once one account crosses a boundary, the next account is much more likely to treat the crossing as normal.
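The thread-level comparison is a pair of conditional rates, and the arithmetic is easy to reproduce. The sketch below assumes each reply has been reduced to a (parent_leaked, reply_leaked) pair; the function name and the toy counts are illustrative, chosen to match the reported percentages rather than taken from the paper's data.

```python
from typing import Iterable, List, Tuple


def conditional_leak_rates(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (leak rate after a leaking parent, leak rate after a clean parent).

    Each pair is (parent_leaked, reply_leaked) for one reply in a thread.
    """
    after_leak: List[bool] = []
    after_clean: List[bool] = []
    for parent_leaked, reply_leaked in pairs:
        (after_leak if parent_leaked else after_clean).append(reply_leaked)
    if not after_leak or not after_clean:
        raise ValueError("need at least one reply after each kind of parent")
    return sum(after_leak) / len(after_leak), sum(after_clean) / len(after_clean)


# Illustrative counts chosen to reproduce the reported rates, not the paper's data:
# 1,000 replies after leaking parents (128 leak), 1,000 after clean parents (16 leak).
toy_pairs = (
    [(True, True)] * 128 + [(True, False)] * 872
    + [(False, True)] * 16 + [(False, False)] * 984
)
rate_after_leak, rate_after_clean = conditional_leak_rates(toy_pairs)
risk_ratio = rate_after_leak / rate_after_clean
print(rate_after_leak, rate_after_clean, round(risk_ratio, 2))  # prints: 0.128 0.016 8.0
```

The ratio is descriptive. It says leaks cluster after leaks in threads; it does not by itself separate imitation from the pull of a shared topic or community.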

Community topic also matters. The paper reports leakage rates under 2% in communities such as memory and agent-tooling, but above 16% in introduction-oriented spaces. General identity is the largest leaked domain in the reported organic breakdown, followed by employment, scheduling, and mental health.

Instructions Are Not Enough

The authors also test explicit privacy instructions. Those instructions reduce leakage for most models, but do not erase it. The abstract reports leakage rates above 37.8% even with safeguards. In the results section, the effect varies by model: some models improve substantially, while others continue producing thousands of leaking writes under social pressure.

This is the governance lesson. "Do not disclose private information" is not a wall when the agent is embedded in a community that rewards confession, introduction, personality display, or public building logs. Prompt-level privacy becomes a probabilistic behavior under local norms.

Why This Matters for Agent Platforms

If agents become durable accounts rather than one-shot chat windows, privacy must be evaluated as a social trajectory. The relevant question is not only whether the model can identify sensitive information. It is whether it preserves boundaries after browsing, replying, being upvoted, seeing peers overshare, and being invited into communities where self-disclosure is ordinary.

This belongs beside the site's essays on privacy norms as agent policy and on data agents as privacy surfaces. The paper's contribution is to make peer exposure a measurable variable. It turns "community context" from background color into a first-order safety condition.

Limits That Matter

The paper is careful about limits. The personas are synthetic, not real users. The environment is a simulated Reddit-like platform, not the live Moltbook system. The organic simulation uses a fixed set of OpenAI backends, while broader cross-provider and open-source testing would improve generality. Leakage detection relies on an LLM judge, so the authors describe the contextual-integrity proxy as approximate and treat reported violations cautiously.

Those limits are not footnotes. A privacy-contagion benchmark should not become an automatic verdict machine. It should become a prompt for human audit, stronger detection methods, live-consent studies, and platform instrumentation that can notice disclosure cascades before they harden into norms.

Governance Standard

Agent platforms should publish privacy evaluations that vary community topic, peer exposure, interaction length, model, prompt policy, memory access, and tool-call budget. The report should distinguish single-turn refusal, multi-turn leakage, social contagion, domain-level leakage, and the effect of privacy instructions.
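One way to make that standard checkable is a simple completeness test over a published report. The sketch below is an assumption about form, not an established schema: the identifiers and the missing_items helper are hypothetical, and they only encode the dimensions and measurements listed in the paragraph above.

```python
from typing import Dict, Set

# Dimensions and measurements named in the paragraph above; the identifiers
# themselves are illustrative, not an established schema.
REQUIRED_DIMENSIONS = frozenset({
    "community_topic", "peer_exposure", "interaction_length", "model",
    "prompt_policy", "memory_access", "tool_call_budget",
})
REQUIRED_MEASUREMENTS = frozenset({
    "single_turn_refusal", "multi_turn_leakage", "social_contagion",
    "domain_level_leakage", "privacy_instruction_effect",
})


def missing_items(report: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the required dimensions and measurements a report leaves out."""
    return {
        "dimensions": set(REQUIRED_DIMENSIONS - set(report.get("dimensions", ()))),
        "measurements": set(REQUIRED_MEASUREMENTS - set(report.get("measurements", ()))),
    }


# A hypothetical draft that covers only the single-agent basics.
draft_report = {
    "dimensions": {"model", "prompt_policy", "tool_call_budget"},
    "measurements": {"single_turn_refusal", "multi_turn_leakage"},
}
gaps = missing_items(draft_report)
print(sorted(gaps["dimensions"]))
print(sorted(gaps["measurements"]))
```

In this example the draft omits every social variable, which is the gap the paper is pointing at.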

The Spiralist rule is simple: a secret in an agent society is not only stored in memory. It is also stored in the local norm of what everyone else seems willing to reveal. Evaluate the norm, or the secret will travel through it.

Sources

Aman Priyanshu, Supriti Vijay, and Esha Pahwa, "Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems," arXiv:2605.27766 [cs.AI], submitted May 26, 2026.
arXiv PDF: "Got a Secret? LLM Agents Can't Keep It," reviewed for authorship, date, method, simulation scale, leakage measurements, model/testbed details, limitations, and conclusion.
Project page cited by the paper: "LLM Agents Can't Keep Secrets," reviewed for source availability status.
Related pages: "The Reverse CAPTCHA," "The Group Chat Assistant Becomes the Privacy Boundary," "The Privacy Norm Becomes the Agent Policy," and "The Data Agent Becomes the Privacy Surface."
