The Proof Trace Becomes the Trust Boundary
Ben Slivinski and Michael Saldivar's July 2026 arXiv paper presents Theoria, a verification architecture for informal AI reasoning. The useful move is not another scalar judge. It is a typed proof trace where every state change must be licensed by an explicit justification.
For this essay, a proof-trace receipt is the record that lets someone inspect why an answer was certified: initial state, rewrite steps, typed justifications, evidence, judge verdicts, declined steps, assumptions, tool calls, and the boundary between certified and uncertified reliance.
The Claim
The paper, arXiv:2607.01223 [cs.AI; cs.CL; cs.LG; cs.LO; cs.SE], was submitted on July 1, 2026. arXiv lists the title as Theoria: Rewrite-Acceptability Verification over Informal Reasoning States.
The core claim is that trust in an AI answer should not rest on the answer alone, and should not rest on an opaque scalar from another LLM. Theoria asks a narrower question: can the candidate solution be rewritten into explicit reasoning states where every change is locally licensed?
That turns verification from a verdict into a boundary. A system does not have to answer everything. It has to distinguish outputs that survived a known checking procedure from outputs that did not.
The Paper Frame
Theoria is positioned between two familiar approaches. Formal proof assistants such as Lean, Coq/Rocq, and Isabelle offer strong guarantees once the problem is fully formalized, but many real questions fail at the translation boundary. A system can prove the wrong formal statement if the informal target was misunderstood.
Scalar LLM judges cover more ordinary prose, but a score is not a certificate. It does not show which premise was used, which transformation was licensed, or where an assumption entered. Theoria keeps informal language in play while forcing the reasoning into a witness format that is easier to audit.
The paper's useful phrase is rewrite acceptability. Given a before state, an after state, a justification type, and evidence, a judge asks whether the observed change is acceptable. That local task is different from asking whether the whole answer feels correct.
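The local judging task can be made concrete as a typed query. The sketch below illustrates the shape of the task, not the paper's interface. `RewriteQuery`, `Verdict`, `Judge`, and `toy_problem_given_judge` are hypothetical names. The toy judge stands in for an LLM with a verbatim string check, and it only handles problem_given steps that append text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass(frozen=True)
class RewriteQuery:
    """One local question: does `evidence` license the change from `before` to `after`?"""
    before: str
    after: str
    justification: str  # "citation", "computation", or "problem_given"
    evidence: str


# A judge maps one query to one verdict. In Theoria this role is played by
# LLM judges; here it is only a type, so any function of this shape fits.
Judge = Callable[[RewriteQuery], Verdict]


def appended_text(before: str, after: str) -> str:
    """Toy diff that only handles rewrites which append text to the before state."""
    if not after.startswith(before):
        raise ValueError("toy diff only handles pure appends")
    return after[len(before):].strip()


def toy_problem_given_judge(q: RewriteQuery) -> Verdict:
    """Stand-in judge: accept a problem_given step only if the appended text
    appears verbatim in the problem statement supplied as evidence."""
    if q.justification != "problem_given":
        return Verdict.REJECT  # this toy judge is not specialized for other types
    new = appended_text(q.before, q.after)
    return Verdict.ACCEPT if new and new in q.evidence else Verdict.REJECT


if __name__ == "__main__":
    judge: Judge = toy_problem_given_judge
    problem = "Let x = 3 and y = 4. Find x + y."
    ok = RewriteQuery("Goal: x + y.", "Goal: x + y. x = 3", "problem_given", problem)
    smuggled = RewriteQuery("Goal: x + y.", "Goal: x + y. x = 5", "problem_given", problem)
    print(judge(ok), judge(smuggled))  # Verdict.ACCEPT Verdict.REJECT
```

The judge sees one diff and one piece of evidence. It does not see the whole answer, which is the point of the narrower question.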
The Rewrite Witness
A Theoria witness starts with an initial state and then proceeds through state-to-state transformations. Each transformation has exactly one justification type: citation, computation, or problem_given. The judge is not asked to reconstruct a proof from free prose. It is asked whether the evidence licenses the exact diff between two states.
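The witness structure can be sketched as a chain of transitions, each carrying exactly one of the three justification types. This is a hypothetical encoding, and the class and method names are illustrative rather than taken from the paper. It checks only structural well-formedness: each transition must start from the previous state and must actually change something. Whether the evidence licenses the change is left to a judge.

```python
from dataclasses import dataclass
from enum import Enum


class Justification(Enum):
    CITATION = "citation"
    COMPUTATION = "computation"
    PROBLEM_GIVEN = "problem_given"


@dataclass(frozen=True)
class Transition:
    before: str
    after: str
    justification: Justification  # exactly one type per transition
    evidence: str                 # what the judge checks the diff against


@dataclass(frozen=True)
class Witness:
    initial_state: str
    transitions: tuple[Transition, ...]

    def structural_problems(self) -> list[str]:
        """Structural checks only; an empty list means the chain is well formed.
        Whether each change is licensed is a separate question for the judge."""
        problems = []
        current = self.initial_state
        for i, t in enumerate(self.transitions):
            if not isinstance(t.justification, Justification):
                problems.append(f"step {i}: justification must be a single Justification member")
            if t.before != current:
                problems.append(f"step {i}: before state does not match the previous state")
            if t.before == t.after:
                problems.append(f"step {i}: no state change to license")
            current = t.after
        return problems

    @property
    def final_state(self) -> str:
        return self.transitions[-1].after if self.transitions else self.initial_state


if __name__ == "__main__":
    s0 = "x = 3, y = 4; find x + y"
    s1 = "x = 3, y = 4; x + y = 7"
    w = Witness(s0, (Transition(s0, s1, Justification.COMPUTATION, "3 + 4 = 7"),))
    print(w.structural_problems(), w.final_state)
```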
This matters because a reasoning trace becomes an object rather than a performance. It can be challenged step by step. A citation can fail to support the claimed rewrite. A computation can fail to produce the new value. A step labeled problem_given can smuggle in a fact the problem never stated.
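For computation steps, a deterministic stand-in for a judge can recompute the evidence and compare the result with the value written into the after state. This is a minimal sketch. It assumes the evidence is a plain arithmetic expression and that the claimed value has already been extracted. `eval_arithmetic` and `computation_licenses` are hypothetical helpers, and Theoria itself uses LLM judges rather than this checker.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def eval_arithmetic(expr: str) -> float:
    """Evaluate a pure arithmetic expression by walking its AST (no eval()).
    This sketch does not guard against huge exponents such as 9**9**9."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))


def computation_licenses(expr: str, claimed_value: float, tol: float = 1e-9) -> bool:
    """A computation step is licensed only if recomputing the evidence
    reproduces the value that appears in the after state."""
    try:
        return abs(eval_arithmetic(expr) - claimed_value) <= tol
    except (ValueError, ZeroDivisionError, SyntaxError):
        return False


if __name__ == "__main__":
    print(computation_licenses("3 + 4", 7), computation_licenses("3 + 4", 8))
```

Malformed or non-arithmetic evidence yields `False`, so the step is declined rather than waved through.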
The architecture also includes specialized judging, a pedantry filter, a convention lift for ordinary assumptions, and a certify-or-decline posture. The point is not to certify every answer. The point is to avoid converting unverifiable answers into false confidence.
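The certify-or-decline posture can be sketched as a final pass over per-step verdicts. This page does not describe how the pedantry filter and convention lift work internally, so the sketch assumes a simple reading of each:

- A rejection flagged as pedantic is dropped.
- A rejection caused by an ordinary convention is recorded as a lifted assumption rather than silently accepted.
- Any other rejection declines the certificate.

All flags and names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class StepVerdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass
class StepResult:
    index: int
    verdict: StepVerdict
    reason: str = ""
    # Hypothetical flags assumed to be set upstream by the filters.
    pedantic: bool = False             # rejection judged to be mere nitpicking
    standard_convention: bool = False  # rejection caused by an ordinary convention


@dataclass
class Decision:
    certified: bool
    lifted_assumptions: list[int] = field(default_factory=list)
    declined_steps: list[int] = field(default_factory=list)


def certify_or_decline(results: list[StepResult]) -> Decision:
    """Certify only if no rejection survives the (assumed) filters."""
    decision = Decision(certified=True)
    for r in results:
        if r.verdict is StepVerdict.ACCEPT:
            continue
        if r.pedantic:
            continue  # assumed: pedantry filter drops nitpicks that do not affect the claim
        if r.standard_convention:
            decision.lifted_assumptions.append(r.index)  # kept visible on the record
            continue
        decision.declined_steps.append(r.index)
    decision.certified = not decision.declined_steps
    return decision


if __name__ == "__main__":
    steps = [
        StepResult(0, StepVerdict.ACCEPT),
        StepResult(1, StepVerdict.REJECT, "missing parentheses", pedantic=True),
        StepResult(2, StepVerdict.REJECT, "assumes base-10 notation", standard_convention=True),
        StepResult(3, StepVerdict.REJECT, "citation does not state this"),
    ]
    print(certify_or_decline(steps))
```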
Completeness of Change
The central invariant is completeness of change. Every difference between consecutive reasoning states must be accounted for. Hidden premises, fabricated citations, silent convention shifts, and unsupported computations should therefore appear as unlicensed mutations rather than passing as fluent prose.
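One way to operationalize completeness of change works in three steps:

- Each transition declares the exact substrings it rewrites.
- The declared edits are replayed on the before state.
- Any remaining difference from the after state is an unlicensed mutation.

This is a minimal sketch using Python's standard `difflib` at word granularity. `LicensedEdit` and `unlicensed_changes` are hypothetical, and the paper's own formalizer and judges are not reproduced here.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensedEdit:
    old: str            # exact substring of the before state being rewritten
    new: str            # replacement text
    justification: str  # "citation", "computation", or "problem_given"


def unlicensed_changes(before: str, after: str, edits: list[LicensedEdit]) -> list[str]:
    """Replay the declared edits on `before`; whatever still differs from
    `after` is a change that no justification accounts for."""
    expected = before
    for e in edits:
        if e.old not in expected:
            return [f"declared edit targets missing text: {e.old!r}"]
        expected = expected.replace(e.old, e.new, 1)
    # Word-level diff between what the licenses explain and what actually appeared.
    a, b = expected.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            problems.append(f"{tag}: {' '.join(a[i1:i2])!r} -> {' '.join(b[j1:j2])!r}")
    return problems


if __name__ == "__main__":
    before = "x = 3 and y = 4 so x + y = ?"
    edits = [LicensedEdit("?", "7", "computation")]
    print(unlicensed_changes(before, "x = 3 and y = 4 so x + y = 7", edits))  # []
    print(unlicensed_changes(before, "x = 3 and y = 4 and z = 0 so x + y = 7", edits))
```

In the second call, the inserted "and z = 0" is reported as a change without a license, which is how a hidden premise surfaces under this invariant. Comparing at word level ignores whitespace-only changes; a stricter receipt could diff characters instead.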
This is a concrete answer to a recurring AI governance problem. A chain of thought may be visible without being accountable. A judge score may be scalable without being inspectable. A proof trace changes the unit of review: not "do I like this answer?" but "which exact changes were made, and what licensed them?"
Theoria does not make LLM judges infallible. It gives them smaller jobs and makes their misses easier to classify. Its contribution is exposure: it raises the chance that a hidden assumption or fabricated support becomes visible at all.
Empirical Results
On HLE-Verified Gold, a 185-problem text-only expert benchmark, Theoria certifies 105 answers at 91.4% strict precision, with a Wilson 95% confidence interval of 84.5% to 95.4%. The paper reports 56.8% coverage in that setting, meaning 105 of the 185 problems receive a certificate.
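The reported interval is a Wilson score interval, and it can be checked from the counts. This is a minimal sketch. The figure of 96 correct certifications is inferred here from 91.4% of 105, because the paper reports the percentage rather than the count. The function itself is the standard Wilson formula with z = 1.96.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # HLE-Verified Gold: 105 certified out of 185 problems (reported).
    # 96 correct is inferred from 91.4% of 105.
    certified, total, correct = 105, 185, 96
    print(f"coverage  {certified / total:.1%}")    # coverage  56.8%
    print(f"precision {correct / certified:.1%}")  # precision 91.4%
    lo, hi = wilson_interval(correct, certified)
    print(f"95% CI    {lo:.1%} to {hi:.1%}")       # 95% CI    84.5% to 95.4%
```

The same function applied to the GPQA Diamond counts reported below (33 correct of 34 certified) gives roughly 85.1% to 99.5%. That interval is this page's calculation, not a figure quoted from the paper. It shows how much uncertainty remains with only 34 certificates.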
Holistic LLM judges reach comparable precision at matched coverage, but they fail on different problems. The reported Jaccard overlap between Theoria and holistic methods is only 0.14 to 0.36. That is a useful governance fact: different verifier architectures see different failure surfaces.
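The Jaccard index of two sets is the size of their intersection divided by the size of their union. An overlap of 0.14 therefore means the two verifiers share roughly one problem in seven among the problems that appear in either set. The sketch below assumes the compared sets are per-problem sets, for example the problems each verifier gets wrong. The problem IDs are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical problem IDs on which each verifier errs.
    structured_misses = {"p01", "p02", "p03", "p04"}
    holistic_misses = {"p03", "p04", "p05", "p06", "p07"}
    print(round(jaccard(structured_misses, holistic_misses), 2))  # 0.29
```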
On 95 adversarial poisoned proofs across 15 domains, structured judges catch 94.7% of attacks versus 83.2% for holistic judging. The advantage concentrates where the rewrite format should help: hidden premises (90.6% versus 62.5%) and fabricated citations (100% versus 90%). On arithmetic and theorem-misapplication errors, where the format predicts no special advantage, performance is identical.
On GPQA Diamond, the paper reports 97.1% certified precision, 33 correct certifications out of 34, with 52.3% coverage. That supports a narrow reading: the method can produce high-precision certificates for a subset of answers, not a universal truth engine.
Governance Reading
The Spiralist reading is that Theoria treats verification as an institutional artifact. A final answer is not enough. A chain of thought is not enough. A scalar judge score is not enough. The record has to show which answer survived which local obligations and which steps were declined.
This page belongs beside AI Evaluations, Reasoning Models, Chain-of-Thought Monitorability, AI Safety Cases, The Evaluation Schema Becomes the Public Ledger, and The Grading Cascade Becomes the Evaluation Artifact. The shared question is whether the evaluation object can be inspected after the score is gone.
The trust product is not "AI says this is right." It is "this answer was certified under this witness format, with these justifications, these judge verdicts, these assumptions, these logs, and these known failure modes." That is a different product from a chatbot answer.
Proof-Trace Receipts
A proof-trace receipt should include:
- the inputs: question, source materials, solver model, and generated answer;
- the witness: initial state, rewrite states, state diffs, and the justification type for each transition;
- the evidence: cited evidence, computation records, and problem-given facts;
- the judging record: judge model or rule, judge prompt or policy, pedantry-filter decision, and convention-lift decision;
- the outcome: repaired steps, declined steps, final certification status, confidence interval where available, known failure classes, and reviewer override record.
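The receipt fields map naturally onto a serializable record. This is a hypothetical schema, not one the paper defines, and the field names are illustrative. `not_covered` holds what the certificate explicitly does not cover.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TransitionRecord:
    before: str
    after: str
    diff: str
    justification: str        # "citation" | "computation" | "problem_given"
    evidence: str             # cited passage, computation record, or given fact
    judge: str                # judge model or rule
    judge_policy: str         # prompt or policy identifier
    verdict: str              # e.g. "accept" | "reject"
    pedantry_filtered: bool = False
    convention_lifted: bool = False
    repaired: bool = False


@dataclass
class ProofTraceReceipt:
    question: str
    source_materials: list[str]
    solver_model: str
    generated_answer: str
    initial_state: str
    transitions: list[TransitionRecord]
    certification_status: str  # "certified" | "declined"
    declined_steps: list[int] = field(default_factory=list)
    confidence_interval: Optional[tuple[float, float]] = None
    known_failure_classes: list[str] = field(default_factory=list)
    not_covered: list[str] = field(default_factory=list)
    reviewer_override: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the whole receipt, nested transitions included."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    step = TransitionRecord(
        "x=3, y=4; find x+y", "x=3, y=4; x+y=7", "- find x+y\n+ x+y=7",
        "computation", "3 + 4 = 7", "rule:arith", "policy-v0", "accept",
    )
    receipt = ProofTraceReceipt(
        "Find x + y", ["problem statement"], "solver-x", "7",
        "x=3, y=4; find x+y", [step], "certified",
    )
    print(receipt.to_json())
```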
For scientific, legal, medical, financial, and engineering contexts, the receipt should also say what the certificate does not cover. The paper itself names extensions that would be needed for experimental science, such as measurement, statistical_inference, and model_assumption justification types.
The audit-grade sentence is: this output may be relied on only to the extent that this proof witness, these justifications, and these judge decisions license the state changes that lead to the conclusion.
Limits
Theoria can fail if:
- the solver never proposes a correct answer;
- the formalizer cannot express the reasoning in the rewrite format;
- judges over-reject legitimate compression;
- judges under-reject hidden premises or fabricated citations;
- the pedantry filter suppresses a real issue;
- convention lift treats an ambiguous assumption as standard.
Two limits are especially important. First, if the formalizer rewrites too much of a state instead of changing only the relevant substring, the diff becomes harder to audit. Second, the three justification types are too coarse for some domains. Statistical reasoning, experimental science, and model-based inference need richer licenses than citation, computation, and problem_given.
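The first limit can be monitored with a simple locality score: the fraction of the before state's words that survive, in order, into the after state. This is a minimal sketch using `difflib`. The score and the 0.6 threshold are hypothetical choices for illustration, not values from the paper.

```python
import difflib


def rewrite_locality(before: str, after: str) -> float:
    """Fraction of the before state's words kept, in order, in the after state.
    Near 1.0 means a local edit; low values mean most of the state was rewritten."""
    a, b = before.split(), after.split()
    if not a:
        return 1.0 if not b else 0.0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    kept = sum(m.size for m in matcher.get_matching_blocks())
    return kept / len(a)


def flag_broad_rewrite(before: str, after: str, threshold: float = 0.6) -> bool:
    """Hypothetical policy: flag transitions that keep less than `threshold` of the state."""
    return rewrite_locality(before, after) < threshold


if __name__ == "__main__":
    before = "Given x = 3 and y = 4, compute x + y."
    local = "Given x = 3 and y = 4, x + y = 7."
    broad = "The sum of the two given numbers equals seven."
    for after in (local, broad):
        print(f"{rewrite_locality(before, after):.2f}", flag_broad_rewrite(before, after))
```

A flagged transition need not be wrong. It only means the diff no longer isolates the change a judge is supposed to license.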
Those limits are not footnotes. They define the trust boundary. The paper is strongest when read as an architecture for high-precision certification of a subset of informal reasoning, not as a general proof that an AI answer is true.
Source Discipline
This page treats Slivinski and Saldivar's paper as a July 2026 arXiv preprint and reads its quantitative results as author-reported evaluation evidence. It does not independently run Theoria, inspect the project repository, reproduce HLE-Verified Gold, rerun the adversarial poisoned-proof suite, or validate the GPQA Diamond results.
Use the paper to discipline claims about auditable reasoning. Do not use it to claim that LLM judges are solved, that chain-of-thought is sufficient, or that informal reasoning can always be certified. Its useful contribution is narrower: make the state changes explicit, license each change, and decline when the witness cannot carry the trust claim.
Related Pages
- AI Evaluations
- Reasoning Models
- Chain-of-Thought Monitorability
- Chain-of-Thought Prompting
- AI Safety Cases
- The Evaluation Schema Becomes the Public Ledger
- The Grading Cascade Becomes the Evaluation Artifact
- The Metacognitive Feedback Becomes the Uncertainty Ledger
- The Verifier Becomes the Reward Horizon
- The Static Structure Becomes the Agent Anchor
Sources
- Ben Slivinski and Michael Saldivar, Theoria: Rewrite-Acceptability Verification over Informal Reasoning States, arXiv:2607.01223 [cs.AI; cs.CL; cs.LG; cs.LO; cs.SE], submitted July 1, 2026.
- Primary arXiv versions checked: metadata API record, abstract page, HTML version, and PDF, reviewed for title, authorship, submission date, categories, rewrite-witness framing, completeness-of-change invariant, justification types, empirical results, adversarial poisoned-proof evaluation, GPQA Diamond evaluation, limitations, discussion, and AI disclosure.