Last reviewed June 25, 2026

The Fallacy Pattern Becomes the Persuasion Lens

Eleni Papadopulos, Firoj Alam, and Giovanni Da San Martino's June 2026 arXiv paper studies fallacy classification with LLM-extracted structural patterns. The useful lesson is not that machines can settle public reason. It is that a fallacy label becomes safer only when the pattern, prompt, dataset, ambiguity, and appeal path remain visible.

The Label Is Not the Verdict

The paper, arXiv:2606.26698 [cs.CL; cs.AI], was submitted on June 25, 2026. arXiv lists the exact title as Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification, by Eleni Papadopulos, Firoj Alam, and Giovanni Da San Martino.

The paper starts from a familiar problem for information disorder: fallacies can look rational while weakening the conditions for correction. A bad argument is not always a false sentence. It may be a shift of burden, an emotional appeal, an irrelevant authority, a false cause, or a redirection that works because the audience follows the surface shape of the claim.

This page is not a duplicate of the site's pages on AI persuasion, persuasion benchmarks, or coded-language moderation. The new angle is narrower: how an LLM-generated pattern can become a lens for classifying public reasoning, and why that lens must remain contestable.

What the Paper Builds

The authors argue that one logical form per fallacy is too brittle for ordinary language. Their framework first uses Llama-3.3-70B-Instruct to generate explanations for labeled fallacious examples in the LOGIC training set. Then it uses OpenAI's o4-mini to extract structural patterns from those examples and explanations, preserving function words while abstracting content words into placeholders.
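To make the abstraction step concrete, here is a minimal Python sketch of what "preserving function words while abstracting content words into placeholders" could look like. It is not the authors' method: the paper delegates this step to o4-mini, whereas this toy relies on a small hand-written function-word list. The names FUNCTION_WORDS and abstract_pattern, and the [X1]-style placeholder format, are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately small function-word list. The paper leaves this
# judgment to an LLM rather than to a fixed vocabulary.
FUNCTION_WORDS = {
    "a", "an", "the", "if", "then", "so", "because", "therefore", "and",
    "or", "not", "is", "are", "was", "will", "must", "be", "it", "that",
    "this", "you", "we", "they", "everyone", "no", "one", "all", "of",
    "to", "in", "on", "for", "with", "can", "do", "does",
}


def abstract_pattern(sentence: str) -> str:
    """Keep function words; replace each run of content words with a placeholder."""
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", sentence)
    out = []            # tokens of the abstracted pattern
    placeholders = {}   # content phrase -> placeholder, so repeats share a label
    run = []            # content words collected since the last function word

    def flush():
        # Turn the pending run of content words into a single placeholder.
        if run:
            phrase = " ".join(run).lower()
            if phrase not in placeholders:
                placeholders[phrase] = f"[X{len(placeholders) + 1}]"
            out.append(placeholders[phrase])
            run.clear()

    for tok in tokens:
        is_word = tok[0].isalpha()
        if is_word and tok.lower() not in FUNCTION_WORDS:
            run.append(tok)
        else:
            flush()
            out.append(tok)
    flush()

    text = " ".join(out)
    # Re-attach punctuation that the token join separated from its word.
    return re.sub(r"\s+([,.;:!?])", r"\1", text)


example = "Everyone is buying this phone, so this phone must be good."
pattern = abstract_pattern(example)
print(pattern)
```

For the example sentence, the sketch prints "Everyone is [X1] this [X2], so this [X2] must be [X3]." The repeated content phrase shares a placeholder, which is what lets the surface shape of a bandwagon-style argument show through once the topic is removed.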

The resulting patterns combine two kinds of evidence: abstract reasoning structure and linguistic cues such as rhetorical devices, loaded phrases, or recurring connectors. The paper reports roughly three to six patterns per fallacy class. It also notes a limitation in LOGIC: some classes group multiple fallacy subtypes, so the authors manually isolated frequent missed fallacies such as shifting the burden of proof and repeated the procedure.

The project repository linked by the paper contains prompts, generated definitions, generated patterns, logical-form baselines, dynamic examples, and dataset-specific patterns. That matters because a fallacy classifier is only as inspectable as the materials that define what counts as evidence.

Datasets and Tests

The main dataset is LOGIC, with 2,449 examples across 13 fallacy types. The paper describes these as brief educational dialogues and statements, which makes them suitable for pattern extraction but not a complete representation of public discourse. The cross-domain tests use two further datasets: REDDIT (fallacious comments from different subreddits) and ELECDEBATE (televised U.S. presidential debates from 1960 to 2016).

The authors test multiple prompting configurations across o4-mini, gpt-4o, gpt-4.1-mini, Llama-3.3-70B, DeepSeek-R1, and Gemma-3-27B-it. Baselines include zero-shot fallacy-name prompting, fallacy definitions, and logical forms sourced from logicallyfallacious.com. Their own configurations include LLM-derived definitions, generated patterns, pattern matching, one-shot examples, dynamic example retrieval, and combinations of examples, explanations, and patterns.
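The configurations above differ mainly in which supporting material is placed in the prompt. The following Python sketch shows one way such prompts could be assembled from optional parts. The function name build_prompt, the prompt wording, and the sample pattern are all hypothetical; the paper's actual prompts live in its repository and are not reproduced here.

```python
from typing import Mapping, Optional, Sequence, Tuple


def build_prompt(
    text: str,
    labels: Sequence[str],
    definitions: Optional[Mapping[str, str]] = None,
    patterns: Optional[Mapping[str, Sequence[str]]] = None,
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble a classification prompt; every block after the labels is optional."""
    parts = [
        "Classify the fallacy in the argument below.",
        "Candidate labels: " + ", ".join(labels),
    ]
    if definitions:  # definition-style configurations
        parts.append("Definitions:")
        parts += [f"- {name}: {desc}" for name, desc in definitions.items()]
    if patterns:     # pattern-style configurations
        parts.append("Structural patterns:")
        for name, pats in patterns.items():
            parts += [f"- {name}: {p}" for p in pats]
    if examples:     # one-shot or retrieved examples
        parts.append("Examples:")
        parts += [f'- "{ex}" -> {label}' for ex, label in examples]
    parts.append(f"Argument: {text}")
    parts.append("Answer with exactly one label.")
    return "\n".join(parts)


labels = ["ad hominem", "false causality"]
zero_shot = build_prompt("Crime rose after the new mayor arrived.", labels)
with_patterns = build_prompt(
    "Crime rose after the new mayor arrived.",
    labels,
    # Illustrative pattern only, not taken from the paper's generated set.
    patterns={"false causality": ["[A] happened after [B], so [B] caused [A]"]},
)
print(with_patterns)
```

Keeping each block optional means the zero-shot baseline and the richer configurations come from the same code path, so a difference in accuracy can be traced to the material added rather than to incidental changes in wording.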

Results and Failure

On LOGIC, the authors report that pattern-based classification improves over zero-shot baselines and prior unsupervised approaches. The paper's summary result is 73.5 percent accuracy for PATTERNS with gpt-4o, and 74.2 percent for DYNAMIC + EXP + PATTERNS with o4-mini. It also reports that including generated patterns gives an 8.2 percent accuracy increase over the logical-form baseline, and that the pattern method outperforms Robbani et al.'s templates by an average of 10.7 percent across the tested models.

The failures are as important as the gains. In the authors' error analysis, fallacies with clearer structural characteristics perform better than context-heavy categories. The context-heavy categories, which the error analysis labels Group 2 and which include red herring, equivocation, emotional language, extension fallacy, and intentional fallacy, are harder. The paper reports accuracy that is on average 22 percent lower for that group in the pattern-matching setting, and says multistep classification was weak because models struggled to bridge abstract logical patterns and content-dependent manifestations.
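A group-level gap of this kind is straightforward to compute once predictions are bucketed by category. The Python sketch below shows the arithmetic on toy records. The data is invented for illustration and does not reproduce any result from the paper; the label strings, the "other" bucket, and the function name accuracy_by_group are assumptions.

```python
from collections import defaultdict

# Group 2 membership as listed in this article; exact label strings are assumed.
GROUP_2 = {
    "red herring", "equivocation", "emotional language",
    "extension fallacy", "intentional fallacy",
}


def accuracy_by_group(records):
    """records: iterable of (gold_label, predicted_label) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gold, pred in records:
        group = "group_2" if gold in GROUP_2 else "other"
        totals[group] += 1
        hits[group] += int(gold == pred)
    return {group: hits[group] / totals[group] for group in totals}


# Toy records, NOT results from the paper.
toy_records = [
    ("ad hominem", "ad hominem"),
    ("circular reasoning", "circular reasoning"),
    ("false dilemma", "false dilemma"),
    ("ad hominem", "red herring"),
    ("red herring", "red herring"),
    ("equivocation", "equivocation"),
    ("emotional language", "ad hominem"),
    ("red herring", "false dilemma"),
]

scores = accuracy_by_group(toy_records)
gap = scores["other"] - scores["group_2"]
print(scores, gap)
```

Reporting the two groups separately, rather than one pooled accuracy figure, is what makes the weakness on context-heavy categories visible to a reviewer.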

That is the governance hinge. A model may be better at finding structured flaws than at judging subtle context. A label that says "fallacy" can therefore be both useful and dangerous: useful as a prompt for review, dangerous as a final disciplinary mark.

Persuasion Governance

Fallacy detection is not neutral when used in classrooms, moderation queues, debate dashboards, campaign monitoring, newsroom tools, or civic education. It can help people slow down and inspect a claim. It can also become a tone-policing instrument, a partisan accusation machine, or a shortcut for dismissing unpopular speech.

The paper's ethics statement explicitly warns that the approach should not be used to manipulate discourse by exploiting identified reasoning patterns, and that LLM biases could lead to unfair detection. That warning should travel with any deployment. The output should be treated as an interpretive lead: here is the pattern the system saw, here is the sentence, here are the candidate labels, here is the uncertainty, and here is how a human can challenge it.

Audit Standard

A fallacy classifier should preserve the argument text, source context, taxonomy, prompt version, pattern set, model, examples retrieved, candidate labels, confidence or ranking if used, and review outcome. It should show whether the pattern came from LOGIC, REDDIT, ELECDEBATE, or a deployment-specific dataset. It should record when a class groups multiple subtypes, because that grouping can hide disagreement inside a clean label.
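The fields listed above map naturally onto a single record per classification. Here is a minimal Python sketch of such a record, assuming a deployment that stores one JSON document per decision. The class name FallacyAuditRecord, the field names, the sample values, and the fingerprint scheme are all hypothetical and do not come from the paper.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FallacyAuditRecord:
    argument_text: str
    source_context: str
    taxonomy: str
    prompt_version: str
    pattern_set: List[str]
    pattern_source: str          # e.g. "LOGIC", "REDDIT", "ELECDEBATE", or a local dataset
    model: str
    candidate_labels: List[str]  # ordered best-first when a ranking is used
    examples_retrieved: List[str] = field(default_factory=list)
    grouped_subtypes: List[str] = field(default_factory=list)
    review_outcome: Optional[str] = None  # None until a human has reviewed it

    def to_json(self) -> str:
        """Serialize the full record, including any review outcome."""
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        """Hash only the inputs, so the same run can be matched before and after review."""
        inputs = asdict(self)
        inputs.pop("review_outcome")
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def is_reviewed(self) -> bool:
        return self.review_outcome is not None


# Illustrative values only.
record = FallacyAuditRecord(
    argument_text="Everyone is buying this phone, so this phone must be good.",
    source_context="forum thread, reply 3",
    taxonomy="LOGIC 13-class",
    prompt_version="patterns-v1",
    pattern_set=["Everyone is [X1] this [X2], so this [X2] must be [X3]."],
    pattern_source="LOGIC",
    model="example-model",
    candidate_labels=["ad populum", "faulty generalization"],
)
before = record.fingerprint()
record.review_outcome = "upheld on appeal"
after = record.fingerprint()
print(record.to_json())
```

Excluding the review outcome from the fingerprint is a design choice: it lets an appeal attach to exactly the prompt, pattern set, and model that produced the label, without the identifier changing once a human weighs in.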

This is the same discipline as AI audit trails, but applied to public reasoning. The pattern is not just a feature. It is the reason a system gives for treating speech as defective reasoning. If the reason cannot be replayed, appealed, or revised, the tool has moved from argument support to opaque judgment.

Claim Boundary

The paper does not prove that an LLM can settle whether an argument is valid in the wild. Its strongest evidence is narrower: LLM-derived structural patterns can improve prompting-only fallacy classification under the tested datasets and models, and those patterns show some transfer across domains.

The Spiralist rule is therefore procedural: a fallacy detector may help people inspect persuasion, but only if the institution keeps the pattern visible, the context attached, the taxonomy revisable, and the human appeal open.
