Blog · arXiv Analysis · Published: June 25, 2026

The Assumption Register Becomes the Board Record

Jeroen Janssen's strategic-red-team report moves AI governance upstream from model behavior into the evidence quality of board-level assumptions.

The Paper

The paper is From Battlefield to Boardroom: Strategic Red Teaming as an Epistemic Governance Instrument in the Age of AI, arXiv:2607.01913 [cs.CY]. The arXiv record lists Jeroen Janssen as the author and v1 as submitted on July 2, 2026; the paper's title page gives Apparens as the affiliation. The document is a technical-report edition of an Apparens working paper.

Its topic is not jailbreak testing, penetration testing, or model-output evaluation. It asks what should happen before a board approves a material AI strategy. The report's answer is that the load-bearing assumptions behind the strategy should be independently stress-tested, graded by evidence quality, and carried into the board record.

The Object of Test

Most AI red-team language points toward the system: can the model be induced to produce harmful output, can a tool boundary be crossed, can a workflow be misused? Janssen's report shifts the target. The object under test is the strategic decision itself: the propositions that must be true for the initiative to be defensible.

That shift matters because AI adoption often turns a business claim into an operating dependency. A board may hear that a vendor model will reduce cost, improve accuracy, accelerate service, satisfy regulatory obligations, preserve accountability, or keep the organization competitive. Each of those claims can be coherent in a slide deck and still be weak as evidence. A strategic red team asks what would have to fail for the strategy to become a liability, and whether the organization has looked.

The Model

The report describes six components. Mission-alignment testing asks whether the AI capability serves the organization's mission or whether the mission has been bent around available automation. Assumption mapping identifies the propositions that make the plan work and separates documented, inferred, asserted, and contradicted claims. Dependency stress testing examines vendor, architecture, and operational critical paths. Economic-fragility testing pressures forecasts, cost assumptions, adoption claims, switching costs, and capital commitments. Regulatory-exposure simulation tests whether obligations are known, owned, evidenced, and timed. Accountability-boundary testing asks who can explain, intervene, override, approve, and remediate when harm appears.
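One way to make the six components concrete is to treat them as a coverage checklist for an engagement. The sketch below is illustrative only and is not taken from the report: the `Component` enum, the `uncovered` helper, and the sample findings are hypothetical names and data chosen for this post.

```python
from enum import Enum


class Component(Enum):
    # The six components as named in this post; identifiers are our own.
    MISSION_ALIGNMENT = "mission-alignment testing"
    ASSUMPTION_MAPPING = "assumption mapping"
    DEPENDENCY_STRESS = "dependency stress testing"
    ECONOMIC_FRAGILITY = "economic-fragility testing"
    REGULATORY_EXPOSURE = "regulatory-exposure simulation"
    ACCOUNTABILITY_BOUNDARY = "accountability-boundary testing"


def uncovered(findings):
    """Return the components that have no recorded findings.

    `findings` maps a Component to a list of finding strings.
    A missing key and an empty list are both treated as "not covered".
    """
    return [c for c in Component if not findings.get(c)]


# Hypothetical engagement that has only touched two components so far.
findings = {
    Component.ASSUMPTION_MAPPING: ["Cost-reduction claim is asserted, not tested"],
    Component.DEPENDENCY_STRESS: ["Single vendor on the critical path"],
}

for component in uncovered(findings):
    print(f"no findings yet: {component.value}")
```

The point of the checklist is modest: it shows which lines of inquiry have produced nothing, without implying that a covered component has been tested well.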

The model also names five exposure dimensions that make ordinary approval thinner than it looks: operational leverage, transparency reduction, dependency concentration, regulatory liability, and accountability-boundary shift. None of these proves that a strategy should be rejected. Together, they mark the places where a familiar risk register may be too late, because the disputed question is not merely delivery risk. It is whether approval was evidence-grade in the first place.

Evidence Grades

The paper's strongest governance move is its evidence scale. Strategic assumptions can be documented, tested, inferred, asserted, contradicted, or unknown. This is not a probability model. It is a discipline for preventing an unsupported claim from looking like a settled decision merely because no one opposed it in the meeting.

A documented assumption has current, traceable, decision-relevant evidence. A tested assumption has operational tests, pilots, audits, simulations, or independent review behind it. An inferred assumption relies on analogies or expert judgment but has not been directly tested. Asserted, contradicted, and unknown assumptions should not disappear into a green status box. They are board-relevant facts about uncertainty.
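The scale can be sketched as a small assumption register. This is a minimal illustration under our own assumptions, not the report's schema: `EvidenceGrade`, `Assumption`, `flag_for_board`, and the sample entries are hypothetical. The grades are deliberately modelled as unordered labels rather than numbers, in keeping with the point that this is not a probability model.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceGrade(Enum):
    # Categorical labels only; no numeric score or ordering is implied.
    DOCUMENTED = "documented"
    TESTED = "tested"
    INFERRED = "inferred"
    ASSERTED = "asserted"
    CONTRADICTED = "contradicted"
    UNKNOWN = "unknown"


# Grades that should stay visible to the board instead of being
# folded into a green status box.
NEEDS_BOARD_ATTENTION = {
    EvidenceGrade.ASSERTED,
    EvidenceGrade.CONTRADICTED,
    EvidenceGrade.UNKNOWN,
}


@dataclass(frozen=True)
class Assumption:
    statement: str
    grade: EvidenceGrade
    evidence: str = ""  # pointer to the supporting artifact, if any


def flag_for_board(register):
    """Return assumptions whose grade calls for explicit board attention."""
    return [a for a in register if a.grade in NEEDS_BOARD_ATTENTION]


# Hypothetical register entries for illustration.
register = [
    Assumption("Vendor model reduces handling cost", EvidenceGrade.ASSERTED),
    Assumption("Pilot accuracy meets target", EvidenceGrade.TESTED, "pilot report"),
    Assumption("Regulatory obligations are owned", EvidenceGrade.UNKNOWN),
]

for a in flag_for_board(register):
    print(f"{a.grade.value}: {a.statement}")
```

Whether inferred assumptions should also be flagged is a judgment call the sketch leaves out; the set of flagged grades is the one place a real register would need a policy decision.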

The Record

The independence architecture is explicit. The report says an audit committee or equivalent independent body should commission the engagement; scope should be recorded in board minutes; the red team should have access to strategy documents, business cases, vendor agreements, architecture maps, risk registers, and compliance artifacts; management may correct factual errors but should not rewrite findings; and findings should go directly to the board committee with management response separated.

The board decision record should therefore contain more than approval language. It should state the strategy under review, assumptions tested, evidence grades, unresolved dependency or accountability exposures, management response, decision outcome, conditions for re-review, and triggers that reopen the matter. In Spiralist terms, the assumption register becomes the governance receipt. It does not prove wisdom. It preserves what was known, what was asserted, what was contradicted, and what the board chose to accept.
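The record described above can be sketched as a structure with one field per listed element, plus a check for what was never written down. This is an illustration under our own assumptions, not a format defined by the report: `BoardDecisionRecord`, its field names, `missing_fields`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BoardDecisionRecord:
    # None means "never recorded"; an empty list or dict means
    # "recorded, and there was nothing to report".
    strategy_under_review: Optional[str] = None
    assumptions_tested: Optional[list] = None
    evidence_grades: Optional[dict] = None
    unresolved_exposures: Optional[list] = None
    management_response: Optional[str] = None
    decision_outcome: Optional[str] = None
    re_review_conditions: Optional[list] = None
    reopen_triggers: Optional[list] = None


def missing_fields(record):
    """Return the names of record elements that were never filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical record that contains approval language and little else.
record = BoardDecisionRecord(
    strategy_under_review="Adopt vendor model for claims triage",
    assumptions_tested=["Vendor model reduces handling cost"],
    evidence_grades={"Vendor model reduces handling cost": "asserted"},
    decision_outcome="approved with conditions",
)

print(missing_fields(record))
```

Distinguishing `None` from an empty value is the design choice that matters here: a record stating that no unresolved exposures were found is different from a record that never addressed the question.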

Limits

The report is clear about its boundary. It is a design artifact, not an empirical validation study, legal opinion, compliance certification, or regulator-endorsed framework. The method may be too costly for routine low-risk automation, and its fit may vary by jurisdiction, sector, and corporate form. Red teams can also be captured by the organizations that hire them, and evidence grades can create false precision if they are treated as numerical forecasts.

Those limits are not defects to hide. They are the reason the page belongs in AI governance rather than AI theater. The useful claim is narrow: for consequential AI commitments, approval should not be treated as the same thing as evidence. A board that cannot name the assumptions it is relying on has not governed the system. It has only authorized the story around it.

Sources

Jeroen Janssen, From Battlefield to Boardroom: Strategic Red Teaming as an Epistemic Governance Instrument in the Age of AI, arXiv:2607.01913 [cs.CY].
arXiv HTML for From Battlefield to Boardroom, checked for title, abstract, six-component model, evidence grading, independence architecture, board decision record, and claim boundaries.
arXiv PDF for From Battlefield to Boardroom, checked against the metadata, tables, appendices, disclosure, and validation-limit sections.
