Blog · arXiv Analysis · June 25, 2026

The Logit Contribution Becomes the Retrieval Witness

Aryo Pradipta Gema, Beatrice Alex, and Pasquale Minervini's July 2026 arXiv paper proposes Logit-Contribution Scoring, or LOCOS, for finding attention heads involved in non-literal long-context retrieval.

For this essay, a retrieval-witness receipt names the prompt, source span, answer token, model, head score, OV projection rule, spatial contrast, ablation depth, benchmark, and downstream check behind a claim that a model used context rather than surface token copying.

The Claim

The paper, arXiv:2607.01002 [cs.CL], was submitted on July 1, 2026 under the title Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads. The authors describe long-context retrieval as more than finding a matching string. In many useful cases, a model must read a relevant passage, interpret its meaning, and write an answer token that may not appear literally in the passage.

LOCOS is their proposed detector for that regime. Instead of asking whether a head attends to a source token that matches the generated answer, it scores the head by whether its output-value (OV) path writes in the direction of the correct answer token in unembedding space. The method contrasts contributions from the needle span, the inserted source passage, against off-needle positions inside the same forward pass.
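The scoring idea can be sketched on toy numbers. This is a minimal illustration, not the paper's implementation: the function name `locos_style_score`, the mean-difference contrast, and the two-dimensional vectors are all assumptions made for this page. The paper's exact projection and normalization may differ.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def locos_style_score(
    attn: Sequence[float],
    ov_outputs: Sequence[Sequence[float]],
    answer_direction: Sequence[float],
    needle_positions: Sequence[int],
) -> float:
    """Needle-vs-off-needle contrast of one head's direct answer-logit contribution.

    attn[i]          : attention weight the head places on source position i
    ov_outputs[i]    : the head's OV-transformed value vector for position i
    answer_direction : unembedding vector of the correct answer token
    The mean-difference contrast is an illustrative assumption, not the paper's rule.
    """
    needle = set(needle_positions)
    # Per-position contribution to the answer logit along the direct path.
    contrib: List[float] = [
        a * dot(v, answer_direction) for a, v in zip(attn, ov_outputs)
    ]
    on = [c for i, c in enumerate(contrib) if i in needle]
    off = [c for i, c in enumerate(contrib) if i not in needle]
    mean_on = sum(on) / len(on) if on else 0.0
    mean_off = sum(off) / len(off) if off else 0.0
    return mean_on - mean_off


# Two toy heads with the same attention pattern but different writes.
attn = [0.1, 0.6, 0.3]             # both attend mostly to position 1 (the needle)
answer_direction = [1.0, 0.0]      # hypothetical unembedding direction
writer = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]     # writes toward the answer from the needle
bystander = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # never writes toward the answer

writer_score = locos_style_score(attn, writer, answer_direction, [1])
bystander_score = locos_style_score(attn, bystander, answer_direction, [1])
```

In this toy setup an attention-only detector would rank both heads equally, while the contrast separates the head that writes answer-aligned signal from the one that only looks at the needle.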

Read Versus Write

The governance intuition is clean: attention is not the whole receipt. A head's query-key behavior tells where it reads. Its output-value circuit helps determine what it writes into the residual stream. Two heads can look similar as readers and different as writers.

That distinction matters when a model answers from meaning. A literal detector can notice a head that attends to the token it later emits. But a non-literal answer may require attending to a phrase that implies the answer rather than contains it. In that setting, the audit question is not only "where did the model look?" It is also "what answer-relevant signal did this component write after looking there?"
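In standard transformer notation, the two audit questions map onto two different parts of a head. The symbols below are this page's assumptions rather than the paper's: row-vector residual states x, per-head weights W_Q, W_K, W_V, W_O, unembedding matrix W_U, and answer token y. The last expression follows the direct path only and ignores the final layer norm and later layers.

```latex
% Where the head reads: the query-key (QK) circuit sets the attention weights.
A_{t,i} = \operatorname{softmax}_i\!\left( \frac{(x_t W_Q)(x_i W_K)^{\top}}{\sqrt{d_k}} \right)

% What the head writes: the output-value (OV) circuit maps each attended position into the residual stream.
h_t = \sum_i A_{t,i}\, x_i W_V W_O

% Direct-path contribution of the head to the logit of answer token y.
\ell_t(y) = h_t \, W_U[:, y]
```

Two heads can share nearly the same A and still differ in W_V W_O, so their contributions to the answer logit differ even though their attention maps agree.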

Why Non-Literal Retrieval Matters

Long-context products are sold as systems that can use files, chats, logs, tables, and institutional memory. Their highest-risk answers often depend on semantic retrieval: policy interpretation, case triage, source arbitration, contract summarization, and multi-hop work where the answer is assembled rather than copied.

If evaluators rely only on literal-copy probes, they can overestimate how much the model has understood or miss the internal components that carried useful context. A model can pass a source-citation check while relying on parametric habit, nearby distractors, or a shallow string bridge. Conversely, a head can matter even if its most attended source token is not the answer token.

The Causal Check

The authors test LOCOS across Qwen3, Gemma-3, and OLMo-3.1 model families. On the NoLiMa non-literal retrieval benchmark, they report that mean-ablating the top LOCOS heads degrades ROUGE-L faster than prior attention-based detectors. Their arXiv abstract gives a concrete Qwen3-8B result: ablating 50 selected heads moves ROUGE-L from 0.401 to 0.000, while the strongest baseline still retains 0.292.
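Mean ablation replaces a head's output with its average over a reference sample, which removes input-specific signal while keeping the activation on a typical scale. The sketch below shows that operation on plain lists. The `(layer, head)` identifier scheme, the function names, and the toy vectors are assumptions for illustration; the paper's ablation code may pool over different positions or samples.

```python
from typing import Dict, List, Sequence, Set, Tuple

Head = Tuple[int, int]  # (layer, head index); a hypothetical identifier scheme


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_ablate(
    head_outputs: Dict[Head, List[float]],
    calibration_outputs: Dict[Head, List[List[float]]],
    ablated: Set[Head],
) -> Dict[Head, List[float]]:
    """Return head outputs with each ablated head replaced by its calibration mean."""
    patched: Dict[Head, List[float]] = {}
    for head, output in head_outputs.items():
        if head in ablated:
            patched[head] = mean_vector(calibration_outputs[head])
        else:
            patched[head] = list(output)  # untouched heads pass through unchanged
    return patched


# Toy example: ablate head (0, 0) and leave head (0, 1) alone.
current = {(0, 0): [2.0, -1.0], (0, 1): [0.5, 0.5]}
calibration = {
    (0, 0): [[1.0, 0.0], [3.0, 2.0]],
    (0, 1): [[0.0, 0.0], [1.0, 1.0]],
}
patched = mean_ablate(current, calibration, {(0, 0)})
```

In a real model the patched outputs would be fed back into the forward pass, and the benchmark metric would be recomputed as more top-ranked heads are added to the ablated set.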

The paper also reports specificity checks. The selected heads are described as retrieval-specific because parametric recall and arithmetic reasoning remain at baseline under the same ablation. On Qwen3-8B, the same ablation is reported to drop MuSiQue accuracy from 0.55 to 0.08 and BABILong from 0.62 to 0.20, while a random-head control stays within 0.05 of baseline.
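The reported numbers are easier to compare as absolute drops and retained fractions. The snippet below only restates the figures quoted above. Applying the 0.05 random-head band as a common reference line to every row is this page's simplification, and the names `reported`, `CONTROL_BAND`, and `summarize` are hypothetical.

```python
# (score before ablation, score after ablation), as quoted from the abstract.
reported = {
    "NoLiMa ROUGE-L, LOCOS heads": (0.401, 0.000),
    "NoLiMa ROUGE-L, strongest baseline detector": (0.401, 0.292),
    "MuSiQue accuracy, LOCOS heads": (0.55, 0.08),
    "BABILong accuracy, LOCOS heads": (0.62, 0.20),
}

# Random-head ablation is reported to stay within this distance of baseline.
CONTROL_BAND = 0.05


def summarize(before: float, after: float) -> dict:
    """Absolute drop, fraction of the original score retained, and band check."""
    drop = before - after
    return {
        "drop": round(drop, 3),
        "retained": round(after / before, 3),
        "outside_control_band": drop > CONTROL_BAND,
    }


summary = {name: summarize(before, after) for name, (before, after) in reported.items()}

for name, row in summary.items():
    print(f"{name}: {row}")
```

Read this way, the strongest baseline detector leaves roughly 73 percent of the original NoLiMa ROUGE-L in place after 50 heads, while the LOCOS selection leaves none.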

Governance Reading

This is not a release-safety certificate. It is a better instrument for a narrower claim: when a long-context model appears to answer from a source span, which internal heads contributed source-local, answer-aligned signal?

That narrower claim still matters. It separates a model that points at relevant text from one whose selected components causally support the answer. It also shows why transparency claims based on attention maps alone are weak. An attention heatmap can be a reading receipt without being a writing receipt.

For institutions, retrieval evidence should be layered: retrieved passages, citations, token-level attribution, component-level probes, ablation evidence, and downstream benchmark behavior. No one layer should be treated as the whole explanation.

Retrieval-Witness Receipts

A retrieval-witness receipt should name the evaluated model checkpoint, prompt template, context length, inserted source span, gold answer, generated answer, scoring method, head list, ablation mode, calibration sample, benchmark split, and code or artifact revision. It should distinguish literal retrieval, where answer tokens appear in the source, from non-literal retrieval, where the answer must be inferred from meaning.

The receipt should also record negative evidence: heads that looked at the source but did not write answer-aligned signal, heads that wrote from off-needle positions, control heads, and downstream tasks that did not degrade. Those details prevent a detector from becoming a decorative plausibility story.
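One way to make the receipt concrete is a small record type. The sketch below is a hypothetical schema: the class name, field names, and example values are assumptions for this page, and the substring test for literal retrieval is a crude proxy that a real audit would replace with token-level matching.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Head = Tuple[int, int]  # (layer, head index); a hypothetical identifier scheme


@dataclass
class RetrievalWitnessReceipt:
    # Positive evidence: what was run and what was claimed.
    model_checkpoint: str
    prompt_template: str
    context_length: int
    source_span: str
    gold_answer: str
    generated_answer: str
    scoring_method: str
    head_list: List[Head]
    ablation_mode: str
    calibration_sample: str
    benchmark_split: str
    artifact_revision: str
    # Negative evidence: what looked relevant but was not, and what stayed intact.
    read_only_heads: List[Head] = field(default_factory=list)
    off_needle_writers: List[Head] = field(default_factory=list)
    control_heads: List[Head] = field(default_factory=list)
    undegraded_tasks: List[str] = field(default_factory=list)

    def retrieval_type(self) -> str:
        """Crude proxy: literal if the gold answer string occurs in the source span."""
        literal = self.gold_answer.lower() in self.source_span.lower()
        return "literal" if literal else "non-literal"


# Toy receipt with made-up values, for illustration only.
receipt = RetrievalWitnessReceipt(
    model_checkpoint="example-model-v0",
    prompt_template="needle-in-context-v1",
    context_length=8192,
    source_span="Mara's office overlooks the Eiffel Tower.",
    gold_answer="Paris",
    generated_answer="Paris",
    scoring_method="logit-contribution contrast",
    head_list=[(12, 3), (14, 7)],
    ablation_mode="mean",
    calibration_sample="held-out prompts",
    benchmark_split="validation",
    artifact_revision="unversioned-example",
    read_only_heads=[(9, 1)],
    undegraded_tasks=["parametric recall", "arithmetic reasoning"],
)
```

Here `receipt.retrieval_type()` returns "non-literal" because the answer has to be inferred from the landmark rather than copied from the span.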

Limits

This page treats the paper as an interpretability and evaluation contribution, not as proof that a particular model understands a document in a human sense. The results are bounded by the selected benchmarks, model families, ablation implementation, correctness filters, and direct-path assumptions in projecting head outputs toward answer-token logits.

LOCOS also does not solve source truth. A head can help retrieve from a false passage, and a model can assemble a correct-looking answer from poisoned context. A retrieval witness is only one part of a larger evidence chain that still needs provenance, source quality, conflict handling, and human review in high-stakes settings.

Source Discipline

This page uses arXiv's abstract and HTML paper as the primary sources for title, authorship, submission date, method description, model families, benchmarks, ablation results, and limitations. The accompanying public repository and Hugging Face results page were checked as artifact pointers, but this page did not rerun the experiments or independently validate the reported numbers.

Mechanistic Interpretability, AI Evaluations, Retrieval-Augmented Generation, and Activation Steering cover the surrounding evidence and intervention frame.
The Retrieved Memory Becomes the Sycophancy Cue, The Table Reference Becomes the Reasoning Error, The Knowledge Conflict Becomes the Source Arbitration Trace, and The Prompt Cache Becomes the Agent Budget give adjacent long-context receipt patterns.

Sources

Aryo Pradipta Gema, Beatrice Alex, and Pasquale Minervini, Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads, arXiv:2607.01002 [cs.CL], submitted July 1, 2026.
arXiv HTML: arXiv:2607.01002 HTML, reviewed for method, experiment setup, evaluation sections, related work, limitations, artifact notes, and appendix framing.
Paper PDF: arXiv:2607.01002 PDF.
LOCOS artifact pointers: aryopg/locos GitHub repository and aryopg/locos-results dataset page.
