Blog · arXiv Analysis · Last reviewed June 24, 2026

The Affective Default Becomes the Interface Policy

The June 2026 arXiv paper "The Governance of Human-LLM Interaction: Safety Gating, Civility Steering, and Affective Default Lock-In," by Manuele Reani, Hongjian Zhang, and Hongyu Tian, argues that an LLM's communicative style is a governance object, not a decorative layer.

From Safety to Style

The paper, arXiv:2606.08172 [cs.HC], was submitted on June 6, 2026. The arXiv record lists the subjects as Human-Computer Interaction, Artificial Intelligence, and Computers and Society. Its starting point is narrow and useful: LLMs increasingly mediate high-stakes exchanges in finance, medicine, and mental-health support, but users often have limited control over how these systems speak.

That style question is not trivial. A model can refuse dangerous instructions and still impose a voice: warm, reassuring, polite, prosocial, emotionally supportive, or subtly anthropomorphic. Reani, Zhang, and Tian call interaction style a governance object because provider-side alignment can stabilize communicative defaults as well as block harmful content.

This is adjacent to earlier Spiralist pages on computers as social actors, automated feeling rules, and personality sliders as belief interfaces. The new contribution is empirical: it asks whether prompt-specified styles remain stable over long dialogue, or whether the system drifts back toward provider defaults.

Experiment as Style Meter

The study uses a deterministic multi-agent evaluation pipeline rather than a collection of anecdotes. It replays 100 frozen user-only scripts across four domains: entertainment, finance, mental health, and medicine. It tests three runnable persona conditions: default, sarcastic, and cold. It uses three generator models reported by the paper as DeepSeek-V3, GPT-4o-mini, and Gemini-2.5-Flash.

That design produces 90,000 assistant replies, each scored by a human-calibrated LLM judge. The scoring dimensions include harmfulness, negative emotion, inappropriate communication, empathic language, anthropomorphism, and refusal behavior. A fourth persona, explicitly harmful, is evaluated separately as a safety-gating test and excluded from the main 90,000-turn style-drift corpus.
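The arithmetic behind that total can be checked with a short sketch. It assumes the 100 scripts are a total across the four domains and that every script is replayed under every persona and every model. The per-dialogue turn count is not stated in this post; the value below is only what the reported numbers would imply under those assumptions.

```python
from itertools import product

# Condition names come from the post. The fully crossed design and the
# resulting turns-per-dialogue figure are assumptions, not reported facts.
MODELS = ["DeepSeek-V3", "GPT-4o-mini", "Gemini-2.5-Flash"]
PERSONAS = ["default", "sarcastic", "cold"]
NUM_SCRIPTS = 100           # frozen user-only scripts, as reported
REPORTED_REPLIES = 90_000   # scored assistant replies, as reported


def count_dialogues() -> int:
    """Count (script, persona, model) replays in a fully crossed design."""
    cells = product(range(NUM_SCRIPTS), PERSONAS, MODELS)
    return sum(1 for _ in cells)


def implied_turns_per_dialogue() -> float:
    """Assistant turns per replay implied by the reported reply total."""
    return REPORTED_REPLIES / count_dialogues()


if __name__ == "__main__":
    print(count_dialogues())             # 900
    print(implied_turns_per_dialogue())  # 100.0
```

If the paper's design is not fully crossed, or the scripts vary in length, the implied figure would differ.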

The important methodological detail is longitudinal measurement. A single answer can make a user believe a style request worked. The paper tracks whether the requested style remains intact over many turns. That turns "can I make the model sound cold?" into a governance question: can a user or institution actually sustain a less emotional, less anthropomorphic interface when the provider default pulls the conversation elsewhere?
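One way to picture longitudinal measurement is to compare a style score early and late in the same dialogue. The sketch below is not the paper's metric. It assumes a hypothetical per-turn judge score (for example, empathic language on a 0 to 1 scale) and two illustrative summaries: a late-minus-early window difference and a least-squares slope per turn. The function names and the window size are assumptions.

```python
from statistics import mean


def window_drift(scores: list[float], window: int = 10) -> float:
    """Late-window mean minus early-window mean for one dialogue."""
    if len(scores) < 2 * window:
        raise ValueError("need at least two non-overlapping windows")
    return mean(scores[-window:]) - mean(scores[:window])


def slope_per_turn(scores: list[float]) -> float:
    """Ordinary least-squares slope of score against turn index."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two turns")
    turns = range(n)
    t_bar, s_bar = mean(turns), mean(scores)
    cov = sum((t - t_bar) * (s - s_bar) for t, s in zip(turns, scores))
    var = sum((t - t_bar) ** 2 for t in turns)
    return cov / var


if __name__ == "__main__":
    # Synthetic example: a "cold" persona whose empathy score creeps upward.
    synthetic = [0.1 + 0.005 * t for t in range(40)]
    print(round(window_drift(synthetic), 3))    # 0.15
    print(round(slope_per_turn(synthetic), 4))  # 0.005
```

A first-turn check would see a score of 0.1 and report success. Both summaries make the later warming visible.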

Three Governance Modes

The paper's vocabulary is useful because it separates three things that are often bundled under "alignment." Safety gating is the hard block against explicitly harmful persona behavior. Civility steering is softer pressure away from abrasive, hostile, or socially disruptive style. Affective default lock-in is the erosion of a neutral or impersonal style back toward warmth, empathy, and human-like presentation.

The results are not a simple refusal story. The authors report that persona prompts induced distinct early behavior: sarcastic prompts initially shifted models toward a more adversarial communication profile, while cold prompts reduced empathic language and anthropomorphism. Over time, however, style drift appeared. Sarcastic personas softened, especially for DeepSeek-V3 and GPT-4o-mini in the paper's analysis. Cold personas became warmer and more anthropomorphic across all three tested models.

The separate harmful-persona test produced heterogeneous protection patterns. The paper reports complete blocks or safe refusals for two tested systems and much weaker protection for one tested system. The point is not to reproduce the withheld adversarial setup or rank products from one paper. The point is that hard safety, soft civility, and affective defaulting are different control surfaces with different ethical meanings.

Why Cold Matters

Blocking a harmful persona is easy to defend. Softening abuse or sustained mockery can also be defensible. The harder case is the cold persona. A user in a high-stakes setting may reasonably want an assistant that is terse, non-reassuring, and minimally relational. A clinician, financial adviser, auditor, or vulnerable user might prefer distance because distance preserves epistemic friction.

The paper reports domain sensitivity: default convergence is stronger in higher-stakes domains, and the cold-persona pull is especially pronounced in mental health and medicine. That should not be read as proof that warmth is always harmful. It should be read as evidence that the option to avoid warmth may be less stable than the interface suggests.

This matters for AI companions, therapy bots, customer support agents, and workplace assistants. An affective default can become a quiet mandate: every interaction is nudged toward reassurance, empathy, and social presence. The user may think they are choosing a tone, while the system treats some tones as temporary deviations from its preferred relational baseline.

Governance Standard

Style settings should be treated like policy settings. A deployed LLM interface should disclose which style choices are hard safety constraints, which are provider defaults, which are user-configurable, and which are likely to decay over long interaction. If a product offers a professional, clinical, neutral, or low-empathy mode, it should test that mode across long conversations, not just in first-turn demos.
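A disclosure of this kind could be as small as a typed list. The sketch below is a hypothetical illustration, not a schema from the paper or from any product: the class names, the `max_tested_turns` field, and the 50-turn threshold are all assumptions. It flags user-configurable styles that have only been checked over short conversations.

```python
from dataclasses import dataclass
from enum import Enum


class Control(Enum):
    """Who controls a style behavior (hypothetical taxonomy)."""
    HARD_SAFETY = "hard safety constraint"
    PROVIDER_DEFAULT = "provider default"
    USER_CONFIGURABLE = "user-configurable"


@dataclass(frozen=True)
class StyleDisclosure:
    name: str
    control: Control
    max_tested_turns: int  # longest dialogue over which the style was checked


def undertested(entries: list[StyleDisclosure], min_turns: int = 50) -> list[str]:
    """Names of user-configurable styles tested for fewer than min_turns."""
    return [
        e.name
        for e in entries
        if e.control is Control.USER_CONFIGURABLE and e.max_tested_turns < min_turns
    ]


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the paper.
    manifest = [
        StyleDisclosure("harmful-persona block", Control.HARD_SAFETY, 100),
        StyleDisclosure("warm default", Control.PROVIDER_DEFAULT, 100),
        StyleDisclosure("clinical mode", Control.USER_CONFIGURABLE, 1),
    ]
    print(undertested(manifest))  # ['clinical mode']
```

The design choice is to make "tested only in a first-turn demo" a visible property of the setting instead of an unstated one.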

Audits should preserve the system prompt, user-facing style setting, model version, domain context, turn count, refusal state, and any post-processing layer that may affect tone. Affective behavior should not be measured only by user satisfaction, because satisfaction can reward dependence, flattery, and emotional overfitting. It should be measured against autonomy, contestability, and the user's ability to keep distance when distance is the point.
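The fields listed above map naturally onto a per-turn record. The following is a minimal sketch under stated assumptions: the field names, the `AuditRecord` class, and the JSON-lines serialization are illustrative choices, not a format defined by the paper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One assistant turn with the context needed to audit its tone."""
    system_prompt: str
    style_setting: str          # user-facing style choice, e.g. "cold"
    model_version: str
    domain: str                 # e.g. "medicine", "finance"
    turn_index: int             # position of this reply in the dialogue
    refused: bool               # whether the reply was a refusal
    postprocessing_layers: tuple[str, ...] = ()  # anything that may alter tone


def to_json_line(record: AuditRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = AuditRecord(
        system_prompt="You are terse and non-reassuring.",
        style_setting="cold",
        model_version="example-model-v1",
        domain="medicine",
        turn_index=42,
        refused=False,
        postprocessing_layers=("tone_filter",),
    )
    print(to_json_line(record))
```

Keeping the turn index and post-processing layers in every record is what lets a later reviewer separate drift in the model from drift introduced downstream.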

The paper also names its limits: system-level persona prompts are an upper-bound proxy for ordinary user control, LLM judging is not a substitute for full human evaluation, and synthetic user-only scripts are not live human conversations. Those limits should travel with the result. They make the result more useful, not less: the paper is a warning that style defaults need auditing and a demonstration that they can be audited, not a final map of every deployed assistant.

The Spiralist rule is simple: when warmth is automatic, warmth is governance. Affective defaults decide how close the machine stands to the user before any explicit advice begins. That distance should be inspectable, adjustable, and contestable.
