The Relationship Post Becomes the Psychiatric Context Window
A June 2026 arXiv paper by Parmitha Vangapandu, Sai Ganesh Mokkapati, Sathwik Narkedimilli, MSVPJ Sathvik, Timothy Liu, Simon See, and Johannes C. Eichstaedt introduces RSPC, a psychiatrist-annotated benchmark for modeling stress and psychiatric symptom categories in digitally mediated long-distance relationships. Its useful warning is that context-aware mental-health NLP also creates context-aware privacy risk.
Fresh Angle
The paper is RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations, arXiv:2606.27247 [cs.LG], submitted June 25, 2026. It introduces the Relational Stress and Psychiatry Corpus, built from 1,799 publicly accessible Reddit posts in long-distance relationship communities.
This page is not a duplicate of the site's essays on the therapy bot waiting room, mental-health framing cues, healthcare chatbot infrastructure, or the teen companion chatbot. Those pages examine care delivery, prompt framing, support infrastructure, and intimate companion systems. RSPC is narrower: it asks how relationship context changes the measurement of psychiatric signal in online posts.
Relational Context
Most mental-health NLP benchmarks treat distress as a property of an individual text or user. RSPC shifts the frame toward interpersonal context. The source posts come from r/LongDistance and r/LDR, were collected between January 2020 and December 2023, and were filtered for narrative completeness, relational relevance, and linguistic consistency. The paper reports anonymizing usernames, personal identifiers, and proper nouns with placeholder tokens, and says the authors did not try to infer identities or contact individuals.
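The paper does not publish its anonymization code, so the following is only a minimal sketch of what placeholder-token masking can look like for two identifier types. The token names `[USER]` and `[EMAIL]`, the function name `mask_identifiers`, and both regular expressions are assumptions made for illustration, not the authors' method. Proper nouns would need a named-entity recognizer and are left out.

```python
import re

# Hypothetical placeholder tokens; the paper's actual tokens are not given here.
USER_TOKEN = "[USER]"
EMAIL_TOKEN = "[EMAIL]"

# Reddit-style handles such as "u/name" or "/u/name" (assumed pattern).
USER_RE = re.compile(r"(?<!\w)/?u/[A-Za-z0-9_-]{3,20}")
# A deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_identifiers(text: str) -> str:
    """Replace emails and Reddit-style handles with placeholder tokens."""
    text = EMAIL_RE.sub(EMAIL_TOKEN, text)
    text = USER_RE.sub(USER_TOKEN, text)
    return text


masked = mask_identifiers("Thanks u/sample_user, reach me at a.b@example.com")
print(masked)
```

Pattern masking of this kind removes only direct identifiers. A narrative can still identify someone through its details, which is why the access and use limits discussed later matter as much as the masking step.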
The dataset choice matters. A long-distance relationship post can describe sadness, worry, sleep disruption, jealousy, missed calls, time-zone mismatch, reunion planning, or commitment ambiguity in the same narrative. If a model sees only symptoms, it may flatten the case into a generic distress label. If it sees relational triggers, it can also learn a more dangerous capability: making psychiatric inferences from the ordinary logistics of intimacy.
Annotation Layers
The annotation framework is aligned with DSM-5-TR and ICD-11 categories, but the paper is careful to frame the labels as symptom-oriented annotations rather than formal clinical diagnoses. The five psychiatric categories are Major Depressive Disorder, Generalized Anxiety Disorder, Separation Anxiety Disorder, Adjustment Disorder, and Insomnia. The other tiers annotate relational stressor triggers and temporal relationship phase.
The paper reports that the annotation process used four trained annotators and, in the appendix, identifies the annotators as licensed psychiatrists with training in clinical psychology and mental-health research. Each post was independently annotated, disagreements were adjudicated, and inter-annotator agreement was measured. Reported Cohen's kappa values were 0.78 for psychiatric symptoms, 0.72 for relational stressors, and 0.81 for temporal relationship phases. The final dataset uses a 70:10:20 stratified train, validation, and test split.
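To make the agreement figures concrete, here is a minimal sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. Cohen's kappa is a two-rater statistic, and this page does not say how the four annotators were paired, so the sketch covers the two-rater case only. The toy labels and the function name `cohens_kappa` are invented for illustration and are not RSPC data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n)
              for lab in set(rater_a) | set(rater_b))
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1 - p_e)


# Toy annotations (hypothetical, not from the corpus).
a = ["present"] * 6 + ["absent"] * 4
b = ["present"] * 5 + ["absent"] * 4 + ["present"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

In the toy case the raters agree on 8 of 10 items, but chance alone would produce agreement on 52 percent of items, so kappa lands near 0.58 rather than 0.8. The same correction is why a kappa of 0.72 or 0.81 says more than a raw agreement rate would.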
The label distribution is uneven. Adjustment Disorder appears in 74.5 percent of posts and Generalized Anxiety Disorder in 71.1 percent, while Major Depressive Disorder appears in 17.1 percent and Insomnia in 1.2 percent. Because the two largest shares alone sum to more than 100 percent, a single post can carry more than one category label. Commitment Ambiguity and Lack of Communication are the most common relational stressors, and the Separation phase dominates temporal labels. This is not a defect to hide. It is the shape of the benchmark's world.
Model Results
The experiments compare seven fine-tuned transformer architectures with five prompted large language models. The transformer group includes BERT-base, RoBERTa-base, ClinicalBERT, BART-base, T5-base, Longformer, and BigBird-RoBERTa. The LLM group includes GPT-4o, Claude-3-Haiku, Qwen-2.5-72B, LLaMA-3-70B, and Nemotron-Super.
The headline results are task-specific. Claude-3-Haiku has the best reported disorder-classification Macro-F1 at 0.538. GPT-4o has the strongest relational-trigger detection Macro-F1 at 0.519. Temporal phase classification is harder: the paper says raw accuracy is inflated by the dominant Separation class, while Macro-F1 exposes majority-class bias. The useful finding is not that one model "solves" mental-health inference. It is that different model families fail differently depending on whether the task is psychiatric category detection, relationship-trigger interpretation, or temporal-state inference.
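The gap between accuracy and Macro-F1 is easy to see on a toy example. The sketch below scores a classifier that always predicts the majority phase. The label names other than Separation, the ten-item sample, and the function names are invented for illustration and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Share of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical phase labels: 8 majority-class posts and 2 minority posts.
y_true = ["Separation"] * 8 + ["Reunion", "Other"]
y_pred = ["Separation"] * 10  # a model that always predicts the majority class

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(acc, round(mf1, 3))
```

The always-Separation model scores 0.8 accuracy but only about 0.30 Macro-F1, because it earns zero on both minority classes. This is the majority-class bias that raw accuracy hides.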
Limits and Ethics
The paper's ethical section is unusually important to the technical claim. It says the dataset is limited to English-language Reddit communities and may not generalize to other cultures, demographics, social platforms, or offline populations. It also says the authors explicitly discourage use of RSPC-trained systems for psychiatric diagnosis, surveillance, employment screening, insurance screening, or other high-stakes decision-making without qualified human oversight.
The appendix describes controlled-access release procedures to reduce misuse risk and says derivative high-stakes decision systems will not be permitted under the release agreement. That point should travel with any citation of the benchmark. A corpus about vulnerable relationship distress is not just a data asset. It is a collection of people narrating uncertainty, loneliness, fear, and conflict in public spaces that were not designed as clinical intake systems.
Governance Standard
For Spiralism, the governance rule is a relational-inference receipt. Any model trained or evaluated on this kind of corpus should preserve the source community, collection window, anonymization method, label taxonomy, annotator qualifications, adjudication process, class imbalance, intended-use boundary, prohibited-use list, release conditions, and review route for misuse reports.
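One way to make the receipt checkable is to treat it as a record with one field per item in that list. The sketch below is an assumption about form, not a published schema: the class name `RelationalInferenceReceipt`, the field names, and the `missing_fields` helper are all hypothetical. The example values summarize what this page reports about RSPC, and two fields are left empty because this page does not state them.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RelationalInferenceReceipt:
    """Hypothetical record mirroring the receipt items listed above."""
    source_community: str = ""
    collection_window: str = ""
    anonymization_method: str = ""
    label_taxonomy: list = field(default_factory=list)
    annotator_qualifications: str = ""
    adjudication_process: str = ""
    class_imbalance: str = ""
    intended_use_boundary: str = ""
    prohibited_uses: list = field(default_factory=list)
    release_conditions: str = ""
    misuse_review_route: str = ""

    def missing_fields(self):
        """Names of receipt items that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


receipt = RelationalInferenceReceipt(
    source_community="r/LongDistance and r/LDR",
    collection_window="January 2020 to December 2023",
    anonymization_method="placeholder tokens for usernames, identifiers, proper nouns",
    label_taxonomy=["psychiatric category", "relational stressor", "temporal phase"],
    annotator_qualifications="four annotators, identified as licensed psychiatrists",
    adjudication_process="independent annotation, disagreements adjudicated",
    class_imbalance="Adjustment Disorder 74.5 percent, Insomnia 1.2 percent",
    prohibited_uses=["diagnosis", "surveillance", "employment screening",
                     "insurance screening"],
    release_conditions="controlled access",
)
print(receipt.missing_fields())
```

A pipeline could refuse to train or evaluate until `missing_fields()` returns an empty list. Here it flags the intended-use boundary and the misuse review route, the two items the example leaves blank.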
The receipt should also separate three claims that are easy to merge. First, a post may contain language consistent with a symptom category. Second, a relationship stressor may be associated with that language. Third, a deployed system may or may not be allowed to act on that association. RSPC helps study the first two. It does not grant authority for the third.
The deeper lesson is that relationship context is double-edged. It can make models less naive about social distress, but it can also make surveillance more intimate. If a system can infer anxiety from silence gaps, commitment ambiguity, time-zone strain, and reunion cycles, then the audit question is not only whether the classifier is accurate. It is who is allowed to ask.
Sources
- Parmitha Vangapandu, Sai Ganesh Mokkapati, Sathwik Narkedimilli, MSVPJ Sathvik, Timothy Liu, Simon See, and Johannes C. Eichstaedt, RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations, arXiv:2606.27247 [cs.LG], submitted June 25, 2026.
- arXiv HTML: RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations, reviewed for dataset construction, annotation tiers, benchmark tasks, model comparisons, results, ethics, and release limits.
- arXiv PDF: RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations, checked against the arXiv record for title, authors, arXiv ID, submission date, category, and paper status.
- Related pages: The Therapy Bot Becomes the Waiting Room, The Framing Cue Becomes the Mental-Health Instability, The Healthcare Chatbot Becomes Support Infrastructure, The Companion Chatbot Becomes the Teen Confidant, and AI in Healthcare.