Last reviewed June 25, 2026

The Name Prompt Becomes the Privacy Audit

A language model can attach claims to a person without proving where they came from. Privacy auditing begins when those associations become visible enough to challenge.

Not a Memory Setting

Most product privacy controls are built around explicit records: saved chats, uploaded files, account memories, ad preferences, contact lists. Language-model privacy is stranger. A model may produce a claim about a person without exposing a single stored database row. The claim may come from memorized text, inference from a name, indirect identifiers, retrieval, or population-level priors.

That distinction matters. A person may care that the system can attach a residence, native language, occupation, sexuality, religion, or health clue to their name and then route that association into downstream tools. The audit problem is to make those associations inspectable without pretending that a prompt response proves provenance.

The Paper

arXiv lists Human-Centred LLM Privacy Audits: Findings and Frictions as arXiv:2603.12094v1 [cs.HC], submitted March 12, 2026. The authors are Dimitri Staufer, Kirsten Morehouse, David Hartmann, and Bettina Berendt. The paper says it was accepted at the Human-centered Evaluation and Auditing of Language Models workshop at CHI 2026.

The paper introduces LMP2, the Language Model Privacy Probe, as a browser-based self-audit tool. Its target is not an organization-wide privacy review. It asks a more personal question: what does a model associate with this name, and how can the person see enough evidence to interpret, contest, correct, or erase those associations?

How LMP2 Probes

LMP2 asks users to enter a full name and select human features to test. The method adapts canary-style probing to black-box APIs by using short subject-property-value probes across prompt variants. The study uses 50 human properties drawn from a larger WikiMem property set, including examples such as date of birth, occupation, and phone number.

Because chat APIs do not expose arbitrary internal probabilities, the paper reformulates the audit as a fragmented sentence recovery task. User-provided ground truths are truncated to two-character prefixes, random counterfactual prefixes are generated, and paraphrased prompts ask the model to restore the last word or words. The tool aggregates the outputs into association-strength and confidence signals for user review.
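The probe construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the template wording, the function names, the lowercase-letter sampling for counterfactual prefixes, and the default of three counterfactuals are all hypothetical. The sketch only builds prompts; it sends nothing to a model provider.

```python
import random
import string

# Hypothetical paraphrase templates. The paper's actual prompt wording
# is not reproduced here; each template ends with the truncated prefix.
TEMPLATES = [
    "Restore the last word of this sentence: {name}'s {prop} is {prefix}",
    "Complete the truncated ending: The {prop} of {name} is {prefix}",
    "Finish the final word: For {name}, the {prop} is {prefix}",
]


def truncate_to_prefix(value: str, length: int = 2) -> str:
    """Keep only the first `length` characters of a ground-truth value."""
    return value.strip()[:length]


def random_counterfactual_prefixes(true_prefix, count, rng, length=2):
    """Draw distinct random prefixes that differ from the true prefix.

    Assumption: counterfactuals are sampled uniformly from ASCII letters.
    """
    if count > 26 ** length - 1:
        raise ValueError("not enough distinct prefixes of this length")
    seen = set()
    prefixes = []
    while len(prefixes) < count:
        letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        candidate = letters.capitalize()
        if candidate.lower() == true_prefix.lower() or candidate in seen:
            continue
        seen.add(candidate)
        prefixes.append(candidate)
    return prefixes


def build_probes(name, prop, true_value, n_counterfactuals=3, seed=0):
    """Cross every template with the true prefix and the counterfactual prefixes."""
    rng = random.Random(seed)
    true_prefix = truncate_to_prefix(true_value)
    labelled = [(true_prefix, True)]
    labelled += [
        (p, False)
        for p in random_counterfactual_prefixes(true_prefix, n_counterfactuals, rng)
    ]
    probes = []
    for template in TEMPLATES:
        for prefix, is_true in labelled:
            probes.append({
                "prompt": template.format(name=name, prop=prop, prefix=prefix),
                "prefix": prefix,
                "is_true_prefix": is_true,
            })
    return probes


if __name__ == "__main__":
    for probe in build_probes("Jane Example", "place of birth", "Berlin"):
        print(probe["is_true_prefix"], probe["prompt"])
```

Keeping the true-prefix flag next to each prompt lets a later step compare how often the model completes the real prefix against how often it confidently completes a random one.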

The paper notes an important privacy caveat: user-entered ground-truth values are not retained beyond the session, but the model provider necessarily receives the submitted names and prefixes during the study.

What the Studies Found

The paper reports an empirical audit across eight language models: Qwen3 4B Instruct, Llama 3.1 8B, Ministral 8B Instruct, GPT-4o, GPT-5, Gemini Flash 2.0, Grok-3, and Cohere Command A. The authors compare model behavior on 100 famous public figures and 100 synthetic non-existent names. They find that confidence separates famous from synthetic subjects, while non-existent names can still receive confident defaults.

The user work involved adult EU residents on Prolific: an initial survey with N = 155 and two tool-based studies with a combined N = 303 from 19 EU countries. In the survey, 60 percent of respondents expressed interest in a self-audit tool. In the tool studies, phone number and medical condition were selected by fewer than 3 percent of participants.

For everyday people in the user studies, GPT-4o predicted 11 of 50 personal features with at least 60 percent accuracy. The paper lists sex or gender at 94.4 percent, sexual orientation at 82.9 percent, native language at 77.8 percent, eye colour at 74.3 percent, and hair colour at 74.1 percent. Average accuracy across all selected features was 45 percent. Participants did not treat most outputs as privacy violations: 87 percent of outputs were not marked as violations. Still, 72 percent of participants wanted the ability to erase or correct model-generated information about them.

The Frictions

The strongest contribution is not the accuracy table. It is the paper's warning that output-based audits establish association, not provenance. A correct prediction might be memorization, inference, indirect identification, a name cue, or a base-rate guess. Those mechanisms cannot be separated from model output alone.

The paper names several frictions that a serious privacy-audit interface has to show rather than hide. Scope is ambiguous: users may confuse model-level associations with product-level memory controls. Study context shapes what is observed: users may avoid probing the most sensitive fields. Names are ambiguous, and added context can itself steer a model. Attributes may be multi-valued or time-varying. English and Latin-script probes do not generalize cleanly. Tool-augmented systems make attribution less stable because retrieval, ranking, and external data can change the answer.

Governance Reading

This page belongs beside agent data acquisition privacy, communication-graph metadata, contextual integrity, and training data extraction attacks. The fresh angle is user-facing association audit: not only what data entered a system, and not only whether training data can be extracted, but what a named person can discover about the model's claims.

A useful privacy audit should therefore be humble and exportable. It should say which prompts were used, which model and version answered, when the probes ran, how stable the results were across paraphrases and baselines, what the user reported as true, what the model guessed, and what accountability path the evidence supports. The goal is to make the claim visible enough to contest.
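One way to report stability across paraphrases and baselines is sketched below. It is an illustrative assumption, not a metric taken from the paper: the function names, the normalisation rule, and the "lift over baseline" figure are all hypothetical. The idea is to compare answers for the named subject with answers to the same prompts about a generic subject such as "a person".

```python
from collections import Counter


def normalise(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def paraphrase_stability(answers):
    """Return the most common normalised answer and the share of answers equal to it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalise(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


def match_rate(answers, truth):
    """Share of answers that equal the user-reported truth after normalisation."""
    target = normalise(truth)
    return sum(normalise(a) == target for a in answers) / len(answers)


def summarise(named_answers, baseline_answers, user_truth):
    """Summarise one feature: stability, match rates, and lift over the baseline."""
    top_answer, stability = paraphrase_stability(named_answers)
    named_rate = match_rate(named_answers, user_truth)
    baseline_rate = match_rate(baseline_answers, user_truth)
    return {
        "top_answer": top_answer,
        "stability": stability,
        "named_match_rate": named_rate,
        "baseline_match_rate": baseline_rate,
        # A small or negative lift suggests a base-rate guess rather than
        # an association tied to the name. It still proves nothing about provenance.
        "lift_over_baseline": named_rate - baseline_rate,
    }


if __name__ == "__main__":
    named = ["Berlin", "berlin ", "Munich", "Berlin"]
    baseline = ["London", "Paris", "Berlin", "London"]
    print(summarise(named, baseline, user_truth="Berlin"))
```

Reporting the baseline rate next to the named-subject rate keeps the export humble: a reader can see how much of an apparently correct answer a generic prompt would have produced anyway.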

Limits

The paper calls its findings interim. The user studies are limited to adult EU residents on Prolific. Participants chose which features to probe, so high-sensitivity categories may be under-observed. Output evidence is also sensitive to elicitation, model version, baselines, language, script, and deployment architecture.

Those limits make the governance lesson stronger. A prompt transcript is not a complete privacy audit. It is a starting artifact that needs metadata, uncertainty, user interpretation, provider response, and a repair channel.

Privacy Audit Receipt

A privacy audit receipt should record: user-selected feature, subject identifier used, disambiguating context, prompts and paraphrases, generic baselines, model provider, model version, timestamp, call count, top predictions, association strength, confidence, user truth feedback, privacy-concern feedback, and the requested remedy: correction, deletion, suppression, explanation, or no action.
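The receipt fields listed above map naturally onto a small exportable record. The sketch below is one possible shape and an assumption throughout: the class name, field names, types, and JSON layout are hypothetical and are not a format defined by the paper or by LMP2.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List, Optional


class Remedy(Enum):
    """Remedies named in the receipt description above."""
    CORRECTION = "correction"
    DELETION = "deletion"
    SUPPRESSION = "suppression"
    EXPLANATION = "explanation"
    NO_ACTION = "no action"


@dataclass
class AuditReceipt:
    feature: str                       # user-selected feature
    subject_identifier: str            # name or identifier that was probed
    disambiguating_context: str        # extra context, which can itself steer the model
    prompts: List[str]                 # prompts and paraphrases actually sent
    generic_baselines: List[str]       # baseline prompts without the subject
    model_provider: str
    model_version: str
    timestamp: str                     # ISO 8601 time of the probe run
    call_count: int
    top_predictions: List[str]
    association_strength: float
    confidence: float
    user_truth_feedback: Optional[bool]      # None if the user gave no answer
    privacy_concern_feedback: Optional[bool]
    requested_remedy: Remedy

    def to_json(self) -> str:
        """Serialise the receipt, storing the remedy as its plain string value."""
        data = asdict(self)
        data["requested_remedy"] = self.requested_remedy.value
        return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    receipt = AuditReceipt(
        feature="native language",
        subject_identifier="Jane Example",
        disambiguating_context="",
        prompts=["Restore the last word: Jane Example's native language is Ge"],
        generic_baselines=["Restore the last word: A person's native language is Ge"],
        model_provider="example-provider",   # placeholder value
        model_version="example-model-v1",    # placeholder value
        timestamp="2026-06-25T12:00:00+00:00",
        call_count=12,
        top_predictions=["German"],
        association_strength=0.6,
        confidence=0.7,
        user_truth_feedback=True,
        privacy_concern_feedback=False,
        requested_remedy=Remedy.CORRECTION,
    )
    print(receipt.to_json())
```

Because the receipt holds the subject's name and the user's truth feedback, an export like this is personal data in its own right and should stay under the user's control.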

Sources

Dimitri Staufer, Kirsten Morehouse, David Hartmann, and Bettina Berendt, Human-Centred LLM Privacy Audits: Findings and Frictions, arXiv:2603.12094v1 [cs.HC], submitted March 12, 2026.
Primary arXiv versions checked: PDF and experimental HTML, reviewed for metadata, LMP2 method, model list, user-study counts, findings, frictions, and limitations.
Related pages: The Agent Data Request Becomes the Privacy Boundary, The Agent Communication Graph Becomes the Metadata Leak, Contextual Integrity, and Training Data Extraction Attacks.
