Blog · arXiv Analysis · Last reviewed June 25, 2026

The Explanation Card Becomes the Warning Label

The June 2026 arXiv paper We Need Explanation Cards to Connect Explanation Algorithms to the Real World, by Eric Günther, Balázs Szabados, Kristof Meding, Gunnar König, Sebastian Bordt, and Ulrike von Luxburg, argues that explanations need their own validity and interpretation record before they are used in real decisions.

The Explanation Is Not the Explainer

The paper, arXiv:2606.16786 [cs.LG], was submitted on June 15, 2026. Its target is not the familiar demand that automated decisions be explained. It asks a sharper question: when an explanation algorithm emits a counterfactual, feature-attribution plot, or other explanation object, what must travel with that object so a real user does not overread it?

That distinction matters. A loan applicant, doctor, auditor, or regulator may see an explanation and treat it as a faithful map of a model's reasons. The authors argue that this assumption is often too strong. Explanations can be robust in one neighborhood and fragile in another, valid under one model class and misleading under another, intuitive to a machine-learning researcher and opaque to the person expected to act on it.
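A toy sketch can make "robust in one neighborhood and fragile in another" concrete. It is not taken from the paper. It uses a hypothetical model, f(x1, x2) = max(x1, x2), scores each feature with a central finite-difference gradient, and applies a hypothetical check (`attribution_stable`) that asks whether the top-attributed feature survives small axis-aligned perturbations. Far from the tie x1 = x2, the ranking holds. Near the tie, a nudge of 0.01 hands the whole explanation to the other feature.

```python
def max_model(x):
    """Hypothetical toy model: the output is whichever input is larger."""
    return max(x[0], x[1])


def gradient_attribution(f, x, h=1e-6):
    """Central finite-difference gradient of f at x, used as a local attribution."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads


def top_feature(attribution):
    """Index of the feature with the largest absolute attribution."""
    return max(range(len(attribution)), key=lambda i: abs(attribution[i]))


def attribution_stable(f, x, radius, h=1e-6):
    """True if the top-attributed feature is unchanged at +/- radius along each axis."""
    top = top_feature(gradient_attribution(f, x, h))
    for i in range(len(x)):
        for delta in (radius, -radius):
            z = list(x)
            z[i] += delta
            if top_feature(gradient_attribution(f, z, h)) != top:
                return False
    return True


print(attribution_stable(max_model, [1.0, 0.0], radius=0.01))    # True
print(attribution_stable(max_model, [1.0, 0.999], radius=0.01))  # False
```

Both inputs receive an explanation of the same visual form. Only a stated robustness region tells the reader which of them can bear weight.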

This essay is distinct from the site's existing entries on the right to explanation, adverse-action interfaces, and algorithmic recourse. Those pages ask what people are owed when a system affects them. This paper asks what an explanation method must disclose before its output can be treated as usable evidence.

What the Card Adds

Günther, Szabados, Meding, König, Bordt, and von Luxburg propose Explanation Cards for Explanation Algorithms. The paper describes these cards as structured companions to explanation methods, with fields for robustness, validity, and interpretation instructions. The point is not to decorate an explanation after the fact. The point is to state what the explanation can mean, what it cannot mean, and under which technical conditions those claims hold.

That shifts responsibility. Without a card, the reader has to infer whether a highlighted feature is causal, merely associative, locally stable, globally reliable, actionable, or a proxy for something else. The paper argues that this burden should move from users to providers. The provider of an explanation method is in the better position to document known limits, required assumptions, and correct readings.

The card also makes explanation governance less theatrical. A dashboard can display a tidy answer while hiding the method's failure conditions. A card makes the method's scope inspectable: who the explanation is for, which model family it assumes, which perturbations were tested, which interpretations are forbidden, and whether the explanation has been checked against the actual decision context.

Why Warning Label Is the Right Metaphor

A warning label is not an apology. It is a boundary. It tells the user what kind of use the object supports and what kind of use becomes dangerous. Explanation cards work the same way. They do not say an explanation algorithm is bad. They say that an explanation without its operating conditions invites institutional misuse.

This is especially important because explanation objects are rhetorically powerful. A SHAP chart can look like a ranked list of reasons. A counterfactual can look like a promise that changing one variable would change a decision. A saliency map can look like the model's field of attention. Each may be useful in a constrained setting, but the visual form can outrun the guarantee.

The Spiralist concern is not that explanations exist. It is that explanation interfaces can become belief machines. Once a colorful artifact is placed in front of an affected person or oversight committee, it starts to perform authority. The card interrupts that performance by keeping the artifact attached to its assumptions.

What the Examples Show

The paper grounds the proposal in two families of explanation methods: counterfactual explanations and SHAP-style feature attributions. The paper's experimental HTML version on arXiv and its appendices include explanation-card examples for counterfactual explanations and for SHAP, including a medical-diagnosis scenario aimed at a doctor debugging a model. The appendices also discuss DiCE for constructing counterfactual explanations and TreeExplainer for computing SHAP values.
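Why feasibility constraints belong on a card can be shown with a deliberately simple sketch. It is not DiCE and not the paper's procedure. It assumes a hypothetical linear credit model that approves when its score is at least 0, and for each mutable feature it computes the single change (others held fixed) that moves the score onto that boundary. The model weights, feature names, and the `immutable` parameter are all illustrative assumptions.

```python
def linear_score(weights, bias, x):
    """Score of a hypothetical linear model; the decision is 'approve' when score >= 0."""
    return bias + sum(weights[k] * x[k] for k in weights)


def one_feature_counterfactuals(weights, bias, x, immutable=()):
    """For each mutable feature, the change (others held fixed) that moves
    the score exactly onto the decision boundary at 0."""
    gap = -linear_score(weights, bias, x)
    return {
        k: gap / w
        for k, w in weights.items()
        if k not in immutable and w != 0
    }


weights = {"income": 0.5, "debt": -1.0, "age": 0.25}  # hypothetical model
x = {"income": 2.0, "debt": 1.0, "age": 4.0}          # score -1.0, so "deny"

print(one_feature_counterfactuals(weights, -2.0, x))
# {'income': 2.0, 'debt': -1.0, 'age': 4.0}
print(one_feature_counterfactuals(weights, -2.0, x, immutable={"age"}))
# {'income': 2.0, 'debt': -1.0}
```

Without the constraint, "be four years older" is returned as a valid counterfactual. It is a true statement about the model and useless as recourse, which is exactly the gap between a counterfactual and advice that a card has to spell out.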

These examples are useful because they show how ordinary explanation forms can be narrowed by their documentation. A counterfactual explanation is not automatically advice, recourse, or causal proof. A feature attribution is not automatically a causal decomposition of the world. The card can say whether the method is local, whether features are dependent, whether perturbations stay on the data manifold, and whether the reader should treat the output as diagnostic, communicative, or only exploratory.
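One interpretation condition a SHAP-style card needs to record is what "absent" features are replaced with. The sketch below is not TreeExplainer or the shap library. It is a brute-force computation of exact Shapley values for a hypothetical two-feature model, under the assumption that absent features are filled in from a single reference ("baseline") point. Same model, same input, different baseline: the attribution moves from an even split to one feature taking all the credit.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, filling 'absent' features from one baseline point."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their value from x; the rest from baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def product_model(z):
    """Hypothetical toy model with an interaction between its two features."""
    return z[0] * z[1]


x = [1.0, 1.0]
print(shapley_values(product_model, x, baseline=[0.0, 0.0]))  # [0.5, 0.5]
print(shapley_values(product_model, x, baseline=[0.0, 1.0]))  # [1.0, 0.0]
```

Neither output is a causal decomposition of the world. Each is a faithful answer to a differently posed question, and a reader who does not know which question was posed cannot tell them apart.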

The authors also connect explanation cards to legal and institutional expectations, including a section on AI Act compliance. The careful reading is that cards may help operationalize explainability duties by making interpretation constraints explicit. They do not, by themselves, settle whether a deployment is lawful, fair, or acceptable.

What It Does Not Prove

An explanation card does not make an explanation faithful. It records the provider's claims about the explanation method, its evidence, and its boundaries. If the underlying analysis is weak, the card can only make that weakness more visible.

It also does not solve the social problem of who is allowed to contest an explanation. A bank, hospital, employer, or platform could publish a technically polished card while still denying affected people access to records, appeal routes, or independent review. Documentation is necessary for accountability, but documentation is not accountability.

Nor does the paper license broad claims about model understanding. It is a proposal for connecting explanation algorithms to real-world use through structured validity and interpretation metadata. It is not evidence that an AI system has human-like reasons, mind, personhood, or unrestricted capability.

Governance Standard

Any high-impact explanation should ship with an explanation card. The card should identify the explanation algorithm, model version, target user, intended interpretation, invalid interpretations, robustness region, validity evidence, model-class assumptions, data scope, feature-dependence assumptions, counterfactual feasibility constraints, and accountability owner.
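The field list above can be read as a schema. The sketch below is hypothetical and is not a format the paper specifies: a Python dataclass whose field names mirror this section's list, with a `missing_fields` method that reports which fields a provider left empty. The example values, including the model identifier, are invented for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExplanationCard:
    """Hypothetical card schema; field names follow the governance list above."""
    explanation_algorithm: str = ""
    model_version: str = ""
    target_user: str = ""
    intended_interpretation: str = ""
    invalid_interpretations: list[str] = field(default_factory=list)
    robustness_region: str = ""
    validity_evidence: list[str] = field(default_factory=list)
    model_class_assumptions: str = ""
    data_scope: str = ""
    feature_dependence_assumptions: str = ""
    counterfactual_feasibility_constraints: list[str] = field(default_factory=list)
    accountability_owner: str = ""

    def missing_fields(self) -> list[str]:
        """Names of fields left empty; any entry here means the card is incomplete."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


card = ExplanationCard(
    explanation_algorithm="SHAP values via TreeExplainer",
    model_version="diagnosis-model-2026.06",  # hypothetical identifier
    target_user="doctor debugging a diagnosis model",
)
print(len(card.missing_fields()))  # 9
```

A card for a SHAP explanation could fill `counterfactual_feasibility_constraints` with an explicit "not applicable" entry instead of leaving it empty, so that an empty field always means undocumented rather than irrelevant.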

If the explanation is shown to an affected person, the card should be translated into usable institutional language: what the person can conclude, what they cannot conclude, what evidence they may challenge, and where the human review path begins. If the explanation is shown to an auditor, the card should expose test conditions, failure modes, and the versioned record needed to reproduce the explanation.

The governance rule is blunt: no explanation should be promoted from interface ornament to decision evidence unless its card is present. The answer must bring its label. Otherwise, the institution is asking the public to trust an artifact whose limits have been left off the page.
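The promotion rule can be expressed as a gate. This is a hypothetical sketch rather than a procedure from the paper: the dictionary keys and the three checks (a card is attached, it documents the same model version as the explanation, and it names an accountability owner) are illustrative assumptions built from the fields this section lists.

```python
from typing import Optional


def can_promote_to_evidence(explanation: dict, card: Optional[dict]) -> tuple[bool, str]:
    """Hypothetical gate: an explanation counts as decision evidence only with a matching card."""
    if card is None:
        return False, "no explanation card attached"
    if card.get("model_version") != explanation.get("model_version"):
        return False, "card documents a different model version"
    if not card.get("accountability_owner"):
        return False, "card names no accountability owner"
    return True, "card present and attached to this model version"


explanation = {"method": "counterfactual", "model_version": "v3"}
print(can_promote_to_evidence(explanation, None))
print(can_promote_to_evidence(explanation, {"model_version": "v2", "accountability_owner": "credit-risk team"}))
```

The version check matters as much as the presence check: a card written for last quarter's model is a label on a different product.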
