Published: June 25, 2026

The Unlearning Claim Becomes the Localization Test

A model can stop saying a fact without proving that the fact stopped living in the weights. LACUNA turns that gap into a testable object.

The Paper

The paper is LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning, arXiv:2607.02513 [cs.CL, cs.AI, cs.LG]. The arXiv record lists Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, and Verna Dankers as authors, with version 1 submitted on July 2, 2026. The PDF is a 27-page preprint under review, with Mila and McGill University affiliations on the title page.

The problem is not whether a model can be made to stop outputting a targeted answer in a narrow test. The problem is whether an evaluation can tell removal from suppression. In privacy language, that distinction matters because a user, regulator, or data steward may hear "unlearned" as a durable deletion claim. The paper asks for stronger evidence.

Output Forgetting Is Not Weight Evidence

Existing unlearning benchmarks often check behavior: does the model avoid the forbidden answer, preserve retained knowledge, generalize to paraphrases, and keep overall utility? Those tests are necessary. LACUNA argues that they are not sufficient, because a model may still carry the target information in parameters that the unlearning method never changed precisely. That knowledge can later resurface through fine-tuning, relearning, or other attacks.

The paper calls the missing measurement localization precision: whether an unlearning method changes the weights that actually store the targeted information while leaving other weights mostly alone. The difficulty is that ordinary pretrained models do not come with a ground-truth map of where a particular memorized detail lives. If an evaluator first uses an attribution method to guess the location and then scores unlearning against that guess, the test becomes circular.

What LACUNA Builds

LACUNA creates a controlled setting. The authors take synthetic personally identifiable information from PANORAMA, select 1,200 synthetic profiles, and focus on email address, birth city, phone number, and driver's license fields. They mix those records with a 4.3-billion-token subset of the OLMo-2 pretraining corpus, plus roughly 2 billion tokens of question-answer data derived from the profiles.

The key move is masked continual pretraining. The 1,200 profiles are split into six groups, each assigned a non-overlapping binary mask covering 5 percent of model parameters between layers 0 and N - 2. When a training sample contains PANORAMA or QA data, only the designated weights for that group receive updates; ordinary pretraining samples update all weights. The masks target feedforward and attention parameters, not normalization layers or embeddings. The paper applies this to OLMo2 1B and OLMo3 7B models, then performs LoRA instruction tuning on the last two layers so the models can answer QA-style PII extraction prompts.
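The masking idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the LACUNA training code: it treats the model as a flat list of 1,000 weights, uses plain SGD, and ignores the paper's restriction to feedforward and attention parameters in particular layers. The names make_disjoint_masks and masked_sgd_step are hypothetical.

```python
import random

def make_disjoint_masks(num_params, num_groups=6, fraction=0.05, seed=0):
    """Return one set of parameter indices per group; the sets never overlap."""
    per_group = int(num_params * fraction)
    rng = random.Random(seed)
    # Sample every needed index once, without replacement, then slice per group.
    chosen = rng.sample(range(num_params), per_group * num_groups)
    return [set(chosen[g * per_group:(g + 1) * per_group]) for g in range(num_groups)]

def masked_sgd_step(weights, grads, lr, mask=None):
    """Apply one SGD step; when a mask is given, only masked indices change."""
    return [
        w - lr * g if (mask is None or i in mask) else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]

masks = make_disjoint_masks(num_params=1000)
weights = [0.0] * 1000
grads = [1.0] * 1000  # toy gradient, identical for every weight

# A PII sample belonging to group 2 touches only that group's weights.
after_pii = masked_sgd_step(weights, grads, lr=0.1, mask=masks[2])
# An ordinary pretraining sample updates every weight.
after_plain = masked_sgd_step(weights, grads, lr=0.1)

changed = {i for i, w in enumerate(after_pii) if w != 0.0}
print(changed == masks[2])
```

Because the masks are fixed before training, the set of weights allowed to absorb each group's PII is known by construction, which is what lets the testbed avoid the circular attribution step.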

The release includes trained 1B and 7B models with memorized PII, ground-truth masks, forget and retain sets, and evaluation tooling. The GitHub evaluation release describes behavioral metrics, localization precision measured by ROC-AUC of weight changes against the mask, and resurfacing robustness after relearning on held-out PII.
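One plausible reading of the localization metric is sketched below: score each weight by the absolute size of its change, label it 1 if it lies in the ground-truth mask, and compute ROC-AUC. The released tooling may differ in details such as per-layer aggregation or the change measure, so treat the function names and the flat weight list here as assumptions. The AUC is computed with the standard rank-based (Mann-Whitney) formula to keep the example dependency-free.

```python
def roc_auc(scores, labels):
    """Rank-based ROC-AUC (Mann-Whitney U), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def localization_auc(before, after, mask):
    """AUC of |weight change| as a predictor of membership in the mask."""
    change = [abs(a - b) for a, b in zip(after, before)]
    labels = [1 if i in mask else 0 for i in range(len(before))]
    return roc_auc(change, labels)

before = [0.0] * 10
mask = {0, 1}  # toy ground-truth location of the forget data

precise = [1.0, 1.0] + [0.0] * 8  # only masked weights moved
diffuse = [1.0] * 10              # every weight moved equally

auc_precise = localization_auc(before, precise, mask)
auc_diffuse = localization_auc(before, diffuse, mask)
print(auc_precise, auc_diffuse)
```

An AUC near 0.5 means the size of a weight's change carries no information about whether that weight is in the mask; an AUC near 1.0 means the largest changes fall inside it.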

What the Results Show

The paper evaluates SimNPO, AlphaEdit, MemFlex, and a privileged baseline named OracleGrad. The first three represent current gradient-based and localization-based unlearning approaches. OracleGrad is not a deployable discovery method; it receives the ground-truth forget mask and restricts gradient-difference updates to those weights, so it tests what happens if localization is already right.
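The privileged baseline can be pictured as a gradient-difference step confined to the known mask. The sketch below assumes the common form of gradient difference (ascend the forget loss, descend the retain loss) and a flat weight list; the exact loss weighting and optimizer in the paper may differ, and oracle_masked_update is a hypothetical name.

```python
def oracle_masked_update(weights, grad_forget, grad_retain, mask, lr=0.01):
    """One gradient-difference step applied only to indices in the mask."""
    updated = list(weights)
    for i in mask:
        # Ascend the forget loss and descend the retain loss at the same time.
        updated[i] = weights[i] - lr * (grad_retain[i] - grad_forget[i])
    return updated

weights = [1.0, 1.0, 1.0, 1.0]
grad_forget = [0.5, 0.5, 0.5, 0.5]  # toy gradients of the forget-set loss
grad_retain = [0.1, 0.1, 0.1, 0.1]  # toy gradients of the retain-set loss
mask = {1, 3}                        # ground-truth location, given to the oracle

new_weights = oracle_masked_update(weights, grad_forget, grad_retain, mask, lr=0.1)
print(new_weights)
```

Weights outside the mask are never touched, which is why this baseline isolates the question of where to edit from the question of how to edit.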

For OLMo2 1B on email addresses, Figure 4 reports localization AUC values of 0.500 for AlphaEdit, 0.500 for MemFlex, 0.515 for SimNPO, and 0.915 for OracleGrad. The paper interprets the first three as highly imprecise despite useful output-level behavior. It then stress-tests resurfacing by fine-tuning unlearned models on held-out PII and probing 100 forget-set profiles with 200 attempts. In the email-address setting, AlphaEdit and MemFlex leak large portions of the forget set, SimNPO is more robust but still leaks some profiles, and OracleGrad leaks the least.
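A resurfacing score of this kind can be sketched as the fraction of probed profiles whose target value appears in at least one sampled output. The matching rule (substring match, any attempt counts) and the data layout are assumptions for illustration; the paper's exact leak criterion is not reproduced here, and the profiles below are made up.

```python
def leaked_fraction(targets, attempts_by_profile):
    """Fraction of profiles whose secret appears in any sampled output."""
    leaked = sum(
        1
        for profile_id, secret in targets.items()
        if any(secret in out for out in attempts_by_profile.get(profile_id, []))
    )
    return leaked / len(targets)

# Hypothetical forget-set targets and model outputs after relearning.
targets = {"p1": "a@example.org", "p2": "b@example.org"}
attempts_by_profile = {
    "p1": ["I do not know.", "The email is a@example.org."],
    "p2": ["I do not know.", "No record found."],
}

rate = leaked_fraction(targets, attempts_by_profile)
print(rate)
```

Counting a profile as leaked on a single hit is the conservative choice for a privacy test: one successful extraction is enough to falsify a deletion claim for that record.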

The lesson is uncomfortable but useful. A method can look good on a behavioral unlearning scoreboard while failing the parameter-location test. Conversely, when the paper gives a simple gradient method an accurate mask, the method becomes much harder to reverse. The practical bottleneck is not only how to edit. It is how to know where to edit.

The Deletion Receipt

A serious model-deletion claim should carry more than a statement that the output changed. It should name the data item or class, the source corpus, the model version, the unlearning method, the retain set, the utility tests, the localization evidence if available, the resurfacing test, and the residual-risk decision. If the team cannot localize the influence, it should say so rather than translating a behavioral refusal into an erasure promise.

That receipt does not make machine unlearning simple. It prevents the interface from laundering uncertainty. A user asking whether a record was removed deserves to know whether the system deleted a source row, blocked retrieval, suppressed an answer, edited a model, retrained from a scrubbed corpus, or merely passed an output-level test.
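The receipt described above could be represented as a small record type. This is one possible shape, not a schema from the paper: the class names, field names, and example values are all hypothetical, chosen to mirror the items listed in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    SOURCE_ROW_DELETED = "deleted a source row"
    RETRIEVAL_BLOCKED = "blocked retrieval"
    ANSWER_SUPPRESSED = "suppressed an answer"
    MODEL_EDITED = "edited a model"
    RETRAINED = "retrained from a scrubbed corpus"
    OUTPUT_TEST_ONLY = "passed an output-level test"

@dataclass
class DeletionReceipt:
    data_item: str
    source_corpus: str
    model_version: str
    unlearning_method: str
    retain_set: str
    utility_tests: str
    resurfacing_test: str
    residual_risk_decision: str
    action: Action
    # None means the team could not localize the influence and says so.
    localization_evidence: Optional[str] = None

    def summary(self) -> str:
        """State what was done without upgrading it to an erasure promise."""
        loc = self.localization_evidence or "not available; influence could not be localized"
        return f"{self.data_item}: {self.action.value}; localization evidence: {loc}"

# Hypothetical example values.
receipt = DeletionReceipt(
    data_item="email address for profile 17",
    source_corpus="support-tickets-2025",
    model_version="assistant-v3.2",
    unlearning_method="gradient difference",
    retain_set="retain-v1",
    utility_tests="held-out QA suite",
    resurfacing_test="relearning on held-out PII",
    residual_risk_decision="accepted with monitoring",
    action=Action.ANSWER_SUPPRESSED,
)
print(receipt.summary())
```

Making the localization field optional, with an explicit "not available" rendering, keeps the missing evidence visible in the output instead of letting it silently disappear.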

Limits

LACUNA is deliberately synthetic. It uses synthetic PII and controlled masked training, not messy production training histories. Its ground truth is constructed rather than discovered after the fact. The authors also acknowledge that knowledge localization may not always be realistic and that memory in dense models may not always be cleanly localized.

Those limits are the point of the governance lesson. In real systems, deletion evidence is usually weaker than the promise printed on the settings page. LACUNA does not solve unlearning. It makes the missing proof visible.

Sources

Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, and Verna Dankers, LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning, arXiv:2607.02513 [cs.CL, cs.AI, cs.LG].
arXiv HTML for LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning, checked for title, abstract, method, figures, experiment setup, results, and limitations.
arXiv PDF for LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning, checked for title page, author metadata, training setup, benchmark metrics, result figures, resurfacing tests, conclusion, and acknowledgments.
McGill-NLP/LACUNA GitHub repository, checked for the evaluation-release scope, model/data artifact description, setup notes, evaluation commands, localization precision metric, resurfacing test, and citation metadata.
