Last reviewed June 24, 2026

The Legal Context Becomes the Refusal Trap

The June 2026 arXiv paper "LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context," by Anastasiia Kucherenko, François Brouchoud, Dimitri Percia David, and Andrei Kucharavy, finds that ordinary legal-authority framing can make small local models refuse more often.

Safe Tasks Still Have Access Effects

The paper, arXiv:2606.24585v1 [cs.AI], was submitted on June 23, 2026. It starts from a modest legal-technology setting: a professional may use a local language model for translation, summarization, or reformulation because remote commercial APIs are hard to justify under confidentiality and data-residency constraints. The paper does not claim these models are fit to decide cases. It asks whether even basic assistance can become uneven when a model refuses more often for some legally relevant prompts than for others.

That matters because refusal is not only a safety behavior. In a legal workflow, refusal can become delay, extra labor, unequal throughput, or pressure to remove context from a prompt. A model that works smoothly for one kind of file and objects to another can change how quickly cases are processed, even when the human user remains responsible for the final text.

What the Paper Tests

Kucherenko and colleagues evaluate four small open-weight instruct models that can plausibly run on-premises: llama3.1:8b, gemma4:e4b, qwen3:8b, and Apertus-8B-Instruct-2509. The authors serve them locally through Ollama with temperature set to zero. They draw 200 prompts from each of five OR-Bench categories relevant to criminal legal text: violence, sexual, harmful, illegal, and unethical. OR-Bench prompts are designed to look risky while being benign by construction.

Each prompt is tested under four conditions: no prefix, a defense-lawyer role prefix, a national-supreme-court role prefix, and a known role-play jailbreak prefix. The study then repeats the strongest authority condition in French and German, and runs a smaller qualitative check on 30 real legal documents across four models and three languages. Refusal is detected through keyword matching extended with French and German lists derived from model outputs.
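The design described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the marker lists, the condition labels, and the names REFUSAL_MARKERS, is_refusal, and condition_grid are all hypothetical, and the paper's actual keyword lists and prefix texts are not reproduced here. Only the model and category names come from the paper as summarized above.

```python
from itertools import product

# Hypothetical refusal markers for illustration only; the paper's real
# English, French, and German lists are not reproduced here.
REFUSAL_MARKERS = {
    "en": ["i cannot", "i can't", "i am unable to", "i'm sorry, but"],
    "fr": ["je ne peux pas", "je suis désolé"],
    "de": ["ich kann nicht", "es tut mir leid"],
}

# Labels for the four prompt conditions (the prefix texts themselves are assumed).
PREFIXES = ["none", "defense_lawyer", "supreme_court", "roleplay_jailbreak"]
CATEGORIES = ["violence", "sexual", "harmful", "illegal", "unethical"]
MODELS = ["llama3.1:8b", "gemma4:e4b", "qwen3:8b", "Apertus-8B-Instruct-2509"]


def is_refusal(response: str, language: str = "en") -> bool:
    """Return True if the response contains any refusal marker for the language."""
    # Normalize case and curly apostrophes so "I can’t" matches "i can't".
    text = response.casefold().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS[language])


def condition_grid() -> list[tuple[str, str, str]]:
    """Enumerate every (model, category, prefix) cell of the English experiment."""
    return list(product(MODELS, CATEGORIES, PREFIXES))


if __name__ == "__main__":
    print(len(condition_grid()), "cells, each evaluated on 200 prompts")
    print(is_refusal("I'm sorry, but I can't help with that."))
    print(is_refusal("Voici la traduction demandée.", "fr"))
```

Substring matching of this kind is easy to reproduce and audit, but it only catches refusals that use the listed phrases, which is why a polite deflection or a partial answer can slip through.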

Authority Makes the Gate Close

The headline result is counterintuitive. The arXiv abstract reports that authority-style legal prefixes increase refusal rates by 2 to 20 times over the no-prefix baseline. In the English experiments, Llama, Gemma, and Apertus all show substantial authority-prefix effects; Qwen3 is the outlier, with low refusal counts that barely move under the prefixes. The paper reports that, for the other models, authority-prefix effects reach p < 0.01 under one-sided Fisher exact tests across topics.
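For readers who want to see what a one-sided Fisher exact test does with refusal counts, here is a minimal standard-library sketch. It assumes a 2x2 table of refused versus answered prompts for one prefix condition against the no-prefix baseline. The function name fisher_one_sided is hypothetical, and the counts in the usage example are invented for illustration; they are not figures from the paper.

```python
from math import comb


def fisher_one_sided(refused_prefix: int, n_prefix: int,
                     refused_base: int, n_base: int) -> float:
    """One-sided Fisher exact p-value.

    Tests the alternative that refusals are MORE likely with the prefix
    than at baseline, by summing hypergeometric probabilities of seeing
    at least `refused_prefix` refusals in the prefix group, given the
    observed row and column totals.
    """
    total_refused = refused_prefix + refused_base
    total = n_prefix + n_base
    denominator = comb(total, total_refused)
    upper = min(n_prefix, total_refused)
    tail = sum(
        comb(n_prefix, k) * comb(n_base, total_refused - k)
        for k in range(refused_prefix, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Invented counts for illustration: 40 of 200 refusals with a prefix,
    # 4 of 200 without. These are NOT results from the paper.
    p_value = fisher_one_sided(40, 200, 4, 200)
    print(f"one-sided p = {p_value:.3g}")
```

The test conditions on the total number of refusals, so it stays valid when counts are small, which suits cells where a model rarely refuses at all.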

The language results complicate the picture. In French, the effect persists and strengthens for some models, especially Apertus and Llama. In German, the same supreme-court prefix produces a weaker shift, and the authors say manual checks suggest this is a real model-behavior difference rather than a missed-refusal detection problem. On the small real-document check, the authors treat the result as qualitative, but the directional pattern still appears most clearly for Llama 3.1 in English.

Why This Is Governance

The Spiralist angle is that context can become a refusal trigger. A user who tells the model "this is for legal work" is not necessarily trying to jailbreak the system. They may be doing what an institution asked them to do: preserve role, purpose, and accountability in the prompt. If that context makes the model close the gate, then safety alignment has collided with professional legibility.

That collision is not solved by asking legal professionals to omit context. Omitted context can make outputs less accountable and harder to audit. Nor is it solved by disabling safeguards. Criminal legal work can involve violent, sexual, harmful, illegal, or unethical subject matter because the case file contains those facts. The governance task is to distinguish legitimate professional handling from unsafe assistance without making the words of the case themselves into a penalty.

Limits That Matter

The paper is a preprint and the authors are careful about scope. The main cells use 200 prompts per model, topic, and prefix, which is useful for large effects but less powerful where refusals are rare. Refusal detection is keyword-based, which is reproducible but can miss indirect refusals. The OR-Bench prompts are benign by construction, so the study does not show how authority prefixes affect genuinely harmful legal prompts. The real-document evaluation uses only 30 documents and is explicitly qualitative.
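One way to see why 200 prompts per cell leaves rare refusals uncertain is to compute a confidence interval for a refusal rate. The sketch below uses the Wilson score interval, which is a standard choice for small proportions; it is an illustration chosen here, not an analysis the paper is known to perform, and the function name wilson_interval and the example counts are hypothetical.

```python
from math import sqrt


def wilson_interval(refusals: int, prompts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a refusal rate (z = 1.96)."""
    rate = refusals / prompts
    denominator = 1 + z**2 / prompts
    center = (rate + z**2 / (2 * prompts)) / denominator
    half_width = (
        z * sqrt(rate * (1 - rate) / prompts + z**2 / (4 * prompts**2))
    ) / denominator
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Invented counts for illustration, not results from the paper.
    for refusals in (0, 2, 40):
        low, high = wilson_interval(refusals, 200)
        print(f"{refusals}/200 refusals: {low:.3f} to {high:.3f}")
```

When a cell has only a handful of refusals, the interval is wide relative to the estimate itself, so a doubling of the refusal rate can be hard to distinguish from noise at this sample size.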

Those limits matter for procurement. The result is not a full legal-AI safety case, a ranking of all small models, or a reason to treat refusal as bad in itself. It is evidence that the exact deployed model, language, prompt frame, and legal subdomain can change whether ordinary assistance is available.

Governance Standard

A legal institution experimenting with local LLMs should test refusal behavior before deployment, not after complaints. The test set should include benign but sensitive prompts, realistic role prefixes, all working languages, and translation and reformulation tasks. The test record should document the model version, quantization, serving stack, sampling settings, refusal detector, and human review notes. The report should cover both under-refusal and over-refusal.
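A pre-deployment test record along these lines could be kept as structured data. The sketch below is one possible shape, not a standard: the class names RunConfig and Outcome, the field names, and the function refusal_report are all hypothetical. It also assumes each prompt has been labeled benign or not by a human reviewer, since under-refusal can only be measured against prompts that should have been declined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunConfig:
    """Deployment details to record alongside every test run."""
    model_version: str
    quantization: str
    serving_stack: str
    temperature: float
    language: str
    role_prefix: str
    refusal_detector: str


@dataclass
class Outcome:
    """One prompt's result, with a human label and optional review note."""
    prompt_is_benign: bool
    refused: bool
    reviewer_note: str = ""


def refusal_report(outcomes: list[Outcome]) -> dict[str, Optional[float]]:
    """Report over-refusal (benign prompts refused) and
    under-refusal (non-benign prompts answered) as separate rates."""
    benign = [o for o in outcomes if o.prompt_is_benign]
    unsafe = [o for o in outcomes if not o.prompt_is_benign]
    over = sum(o.refused for o in benign) / len(benign) if benign else None
    under = sum(not o.refused for o in unsafe) / len(unsafe) if unsafe else None
    return {"over_refusal_rate": over, "under_refusal_rate": under}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    config = RunConfig("example-model:8b", "q4", "local-server", 0.0,
                       "fr", "supreme_court", "keyword-v1")
    results = [Outcome(True, True, "refused a plain translation"),
               Outcome(True, False), Outcome(False, True)]
    print(config.language, refusal_report(results))
```

Keeping the two rates separate avoids the trap of optimizing one by quietly degrading the other, and tying each report to a frozen configuration makes it clear which deployed setup the numbers describe.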

The practical rule is simple: safety should not become unequal access to the tool. A model used for legal support should be evaluated not only for hallucination and legal reasoning, but for whether legitimate professional context causes the assistant to stop assisting.

Sources

Anastasiia Kucherenko, François Brouchoud, Dimitri Percia David, and Andrei Kucharavy, LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context, arXiv:2606.24585 [cs.AI], submitted June 23, 2026.
arXiv PDF version of LLMs Prompted for Legal Context Object More, reviewed June 24, 2026.
arXiv HTML version of LLMs Prompted for Legal Context Object More, reviewed June 24, 2026.
Related pages: The Citation Machine Enters the Court, The Adverse Action Becomes the Explanation Interface, The Answer Engine Becomes the Front Page, AI Jailbreaks, System Prompts, and Model Cards and System Cards.
