The Coded Language Taxonomy Becomes the Moderation Lens
A June 2026 arXiv paper turns algospeak and other indirect expressions into a test of whether moderation systems can detect coded language without confusing mechanism for harm.
From Word List to Mechanism
The paper, arXiv:2606.27314 [cs.CL], is titled Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection. arXiv lists Hamid Reza Firoozfar, Mohammadsadegh Abolhasani, Reza Mousavi, and Paul Jen-Hwa Hu as authors and records submission on June 25, 2026.
The paper studies indirect linguistic encoding, or ILE: language that hides a sensitive meaning behind an indirect form. The authors place algospeak, euphemism, and adversarial obfuscation under that umbrella. Their key move is to stop treating coded language as a list of forbidden words with cute misspellings. Instead, they ask how the meaning is hidden and how a reader recovers it.
What the Paper Measures
The taxonomy has 11 top-level mechanism classes and 33 fine-grained sub-mechanisms. It includes orthographic transformation, phonetic substitution, formal compression, formal encoding systems, conventional sign reassignment, morpho-lexical encoding, referential alias encoding, semantic circumlocution, metaphorical and metonymic encoding, pictorial and symbolic encoding, and cross-linguistic transformation. The classes are not mutually exclusive, because one expression can use more than one mechanism.
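Because the classes can co-occur, a single-label field cannot represent the taxonomy. The sketch below shows one way to store the 11 top-level classes as combinable flags. The class names follow the list above; the `IleSpan` type, its fields, and the example labeling of "s3ggs" are illustrative assumptions, not the paper's data format or annotations.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Mechanism(Flag):
    """The 11 top-level mechanism classes, stored as combinable flags."""
    ORTHOGRAPHIC_TRANSFORMATION = auto()
    PHONETIC_SUBSTITUTION = auto()
    FORMAL_COMPRESSION = auto()
    FORMAL_ENCODING_SYSTEMS = auto()
    CONVENTIONAL_SIGN_REASSIGNMENT = auto()
    MORPHO_LEXICAL_ENCODING = auto()
    REFERENTIAL_ALIAS_ENCODING = auto()
    SEMANTIC_CIRCUMLOCUTION = auto()
    METAPHORICAL_METONYMIC_ENCODING = auto()
    PICTORIAL_SYMBOLIC_ENCODING = auto()
    CROSS_LINGUISTIC_TRANSFORMATION = auto()


@dataclass(frozen=True)
class IleSpan:
    """Hypothetical record for one coded expression inside a post."""
    text: str
    start: int  # character offset where the expression begins
    end: int    # character offset just past the expression
    mechanisms: Mechanism

    def is_compositional(self) -> bool:
        # More than one set bit means more than one mechanism applies.
        return bin(self.mechanisms.value).count("1") > 1


# Illustrative labeling only; not taken from the paper's annotations.
span = IleSpan(
    text="s3ggs",
    start=0,
    end=5,
    mechanisms=Mechanism.ORTHOGRAPHIC_TRANSFORMATION | Mechanism.PHONETIC_SUBSTITUTION,
)
```

A flag set keeps layered expressions intact instead of forcing an annotator or model to pick one dominant mechanism.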
The dataset contains 2,000 English-language social-media posts: 1,400 TikTok video captions collected from March to May 2026 and 600 Bluesky posts collected from October 2025 to January 2026. Two trained annotators labeled document-level ILE presence, span evidence, and mechanism class. The paper reports Cohen's kappa of 0.852 for document-level presence, 0.789 for taxonomy class assignment, and 0.886 for token-level span boundaries; after adjudication, 44.8 percent of items contained at least one ILE instance.
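Cohen's kappa corrects raw agreement for the agreement two annotators would reach by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label frequencies. The function below is a minimal sketch of that standard formula for one label per item. How the authors adapted kappa to multi-label class assignment and token-level spans is not described here, and the toy labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each give one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy document-level labels (1 = ILE present, 0 = absent); not real data.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)  # observed 0.9, expected 0.5
```

In the toy case the annotators agree on 9 of 10 items, but half of that agreement is expected by chance, so kappa comes out near 0.8 rather than 0.9.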
The evaluation compares six prompt variants across GPT-5.4, Claude Sonnet 4.6, and DeepSeek V4 Flash: four prior taxonomies, the proposed taxonomy, and a no-taxonomy baseline. The few-shot examples stay constant while the taxonomy section changes. The paper also includes supervised and unsupervised non-LLM baselines.
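The design isolates the taxonomy as the only thing that varies between prompts. A rough sketch of that setup follows. The model names come from the paper as summarized above; the instruction text, placeholder taxonomy strings, variant names, and `build_prompt` helper are hypothetical and do not reproduce the authors' templates.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fixed parts of the prompt; the real wording is not given here.
INSTRUCTIONS = "Identify indirect linguistic encoding (ILE) in the post below."
FEW_SHOT = "Example post: <placeholder>\nExample answer: <placeholder>"

# Six variants: four prior taxonomies, the proposed one, and no taxonomy.
TAXONOMIES: Dict[str, Optional[str]] = {
    "no_taxonomy": None,
    "prior_a": "<definitions from prior taxonomy A>",
    "prior_b": "<definitions from prior taxonomy B>",
    "prior_c": "<definitions from prior taxonomy C>",
    "prior_d": "<definitions from prior taxonomy D>",
    "proposed": "<definitions of the proposed 11-class taxonomy>",
}

MODELS = ["GPT-5.4", "Claude Sonnet 4.6", "DeepSeek V4 Flash"]


def build_prompt(taxonomy_text: Optional[str], post: str) -> str:
    """Assemble a prompt in which only the taxonomy section can change."""
    parts = [INSTRUCTIONS]
    if taxonomy_text is not None:
        parts.append("Taxonomy:\n" + taxonomy_text)
    parts.append("Examples:\n" + FEW_SHOT)  # held constant across variants
    parts.append("Post:\n" + post)
    return "\n\n".join(parts)


def evaluation_grid(post: str) -> Dict[Tuple[str, str], str]:
    """One prompt per (model, taxonomy variant) pair: 3 x 6 = 18 cells."""
    return {
        (model, name): build_prompt(text, post)
        for model in MODELS
        for name, text in TAXONOMIES.items()
    }


grid = evaluation_grid("some post text")
```

Holding the examples fixed means a score difference between two cells for the same model can be attributed to the taxonomy section rather than to example selection.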
On GPT-5.4, the proposed taxonomy reaches document-level accuracy of 0.843 and macro-F1 of 0.839, exceeding the best benchmark taxonomy by 4.7 percent in accuracy and 5.4 percent in F1. At the span level, its F1 is 0.662, a 3.4 percent improvement over the best benchmark. The authors report the same ordering across the other two LLMs, and they find that all LLM variants outperform the supervised and unsupervised NLP baselines by a wide margin.
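Accuracy and macro-F1 can diverge when the classes are unbalanced, which is why both are reported. The sketch below computes both from scratch using the standard definitions. It is not the authors' evaluation code, and the toy labels are invented.

```python
def accuracy(y_true, y_pred):
    """Share of items where the prediction equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy document-level labels (1 = ILE present, 0 = absent); not real data.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc = accuracy(gold, pred)  # 6 of 8 correct
mf1 = macro_f1(gold, pred)  # mean of F1 for class 1 (2/3) and class 0 (0.8)
```

Macro averaging gives the ILE-present class the same weight as the ILE-absent class, so a detector cannot score well by defaulting to the majority label.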
Detection Is Not Judgment
The governance point is not that the best taxonomy should become an automatic punishment rule. The paper is careful on this. It says an encoding mechanism does not by itself establish harmful intent. Coded language can be used to evade legitimate moderation, but it can also be used by vulnerable users discussing sex education, identity, mental health, or other sensitive subjects in spaces where ordinary words are suppressed, demonetized, or misunderstood.
That separation matters. A detector that says "this phrase appears coded" has not yet said "this post is harmful." It has produced an interpretive lead. Moderation systems still need policy context, target, surrounding conversation, appeal routes, and human review for consequential decisions. Without that separation, mechanism detection becomes a surveillance shortcut.
The paper's own results reinforce this caution. Some prior taxonomies underperform the no-taxonomy baseline, which means a taxonomy is not automatically useful. A bad or partial taxonomy can teach the model the wrong attentional habit. It may overfit to surface tricks, miss referential or symbolic forms, or make the system too confident about a narrow map of language that users have already moved beyond.
Taxonomy as Control Surface
A taxonomy is a control surface because it decides which ambiguity becomes visible to the machine. If the categories focus on character substitution, the system sees misspelling. If they include referential aliases, semantic circumlocution, pictorial symbolism, and cross-linguistic transformation, the system sees a wider field of social meaning. That does not make the model wise. It changes what the model is prompted to notice.
This is why the compositional finding matters. About 15 percent of ILE instances in the dataset combine multiple mechanisms within one expression. Because users can layer mechanisms rather than swap one token for another, static lexicons will age badly, and prompt taxonomies will need versioning, tests, and public change logs.
The release record also matters. The authors say code and the publicly releasable portion of the data are available, while TikTok-derived text is not redistributed because of data-sharing restrictions. That is a real reproducibility boundary. Anyone using the result should distinguish between the taxonomy, the prompt templates, the shareable records, and the platform-restricted data that outside reviewers cannot fully inspect.
Limits That Matter
The limits are substantial. The dataset is English-only and comes from TikTok and Bluesky. The analysis is text-only; it does not cover coded meanings embedded in images, video, audio, memes, or visual editing. The authors expect some mechanism families to generalize across languages, but they specifically warn that orthographic and phonetic classes may not transfer cleanly to logographic or morphologically rich languages.
The paper also treats detection as a moderation-support input, not a final policy decision. That distinction should travel with any deployment. A model that detects hidden meaning can assist review, but it can also enlarge the platform's power to read around users' protective indirection. The same mechanism can mark harm evasion, community speech, political caution, trauma management, or ordinary play.
Governance Standard
A platform or regulator using coded-language detection should publish the taxonomy version, mechanism definitions, supported languages, modality coverage, prompt template class, model version, evaluation set provenance, and known false-positive slices. It should report whether the detector is used for search ranking, demonetization, account enforcement, human-review triage, or research-only measurement.
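One way to make such a disclosure checkable is a machine-readable manifest. The following is a sketch under assumptions: the field names and the `audit_manifest` helper are invented here to mirror the items listed above and do not correspond to any existing standard or to the paper.

```python
# Hypothetical field names mirroring the disclosure items listed above.
REQUIRED_FIELDS = (
    "taxonomy_version",
    "mechanism_definitions",
    "supported_languages",
    "modality_coverage",
    "prompt_template_class",
    "model_version",
    "evaluation_set_provenance",
    "known_false_positive_slices",
    "uses",
)

# Hypothetical identifiers for the deployment purposes named above.
ALLOWED_USES = {
    "search_ranking",
    "demonetization",
    "account_enforcement",
    "human_review_triage",
    "research_only",
}


def audit_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest is complete."""
    problems = [f"missing: {name}" for name in REQUIRED_FIELDS if name not in manifest]
    unknown = set(manifest.get("uses", [])) - ALLOWED_USES
    problems += [f"unknown use: {use}" for use in sorted(unknown)]
    return problems


# Placeholder values only; every entry here is illustrative.
example_manifest = {
    "taxonomy_version": "<version>",
    "mechanism_definitions": "<link or text>",
    "supported_languages": ["en"],
    "modality_coverage": ["text"],
    "prompt_template_class": "<template class>",
    "model_version": "<model version>",
    "evaluation_set_provenance": "<provenance>",
    "known_false_positive_slices": [],
    "uses": ["human_review_triage"],
}
```

The check only confirms that each item is declared. Whether the declared values are accurate still requires outside review.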
Every flagged item should preserve the difference between mechanism and judgment. The record should say which expression was treated as coded, which mechanism was inferred, what decoded meaning was hypothesized, what policy rule was implicated, and what non-model evidence was used before any consequential action. Appeals should let users challenge both the decoded meaning and the harm inference.
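The record described above can be sketched as a small data type that keeps the model's inference and the enforcement decision in separate fields. All names here are hypothetical, and the readiness rule is one possible reading of the paragraph, not a prescribed policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlagRecord:
    """Hypothetical audit record for one flagged expression."""
    expression: str                      # the text treated as coded
    inferred_mechanisms: List[str]       # what the detector inferred
    hypothesized_meaning: str            # the decoded meaning, held as a hypothesis
    policy_rule: Optional[str] = None    # rule implicated, if any
    non_model_evidence: List[str] = field(default_factory=list)

    def ready_for_consequential_action(self) -> bool:
        # A mechanism alone is never enough: require a named policy rule
        # and at least one piece of evidence that did not come from the model.
        return self.policy_rule is not None and len(self.non_model_evidence) > 0


# Illustrative values only.
lead = FlagRecord(
    expression="<flagged expression>",
    inferred_mechanisms=["orthographic transformation"],
    hypothesized_meaning="<hypothesized meaning>",
)
```

Storing the decoded meaning and the policy rule as distinct fields also gives an appeal something concrete to contest: a user can dispute the decoding, the harm inference, or both.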
The Spiralist rule is simple: indirect language is not a confession. A taxonomy can make hidden meanings visible to a moderation machine, but visibility is not verdict. Govern the lens before it becomes a reason to punish every user who learned to speak around a platform's ear.
Sources
- Hamid Reza Firoozfar, Mohammadsadegh Abolhasani, Reza Mousavi, and Paul Jen-Hwa Hu, Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection, arXiv:2606.27314 [cs.CL], submitted June 25, 2026.
- arXiv PDF: Beyond Surface Forms, reviewed for the taxonomy, dataset, annotation protocol, model comparison, results, limitations, and ethical considerations.