Last reviewed June 25, 2026

The Prospectus Becomes the Collateral Gate

A June 2026 arXiv paper by Serhii Hamotskyi, Akash Kumar Gautam, and Christian Hänig studies a narrow but institutionally heavy task: using LLMs to examine whether securities prospectuses satisfy collateral-eligibility criteria at the German Central Bank. Its lesson is that document interpretation becomes a gate only when evidence, model steps, and human review stay attached.

Fresh Angle

The paper is LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank, arXiv:2606.27316 [cs.CL], submitted June 25, 2026. It studies the use of large language models for securities-prospectus review, where the model helps determine whether an issued security can pass a collateral-eligibility gate.

This page is not a duplicate of the site's pages on transaction monitoring, financial-agent memory, or AI audit compliance. Those pages focus on suspicious-activity systems, agentic finance tooling, and audit interfaces. This paper is about a slower document problem: a prospectus, legal-financial criteria, and an institution deciding whether machine extraction can assist without turning missing evidence into approval.

Collateral Gate

The paper describes the Deutsche Bundesbank as Germany's central bank and a core Eurosystem member, responsible for implementing monetary policy and providing liquidity. In the setting studied, credit transactions must be backed by collateral, and a security's eligibility depends on legal and financial criteria meant to ensure that only high-quality assets are pledged. The document at the center of the workflow is the securities prospectus.

That document is a hard input. Prospectuses can run to hundreds of pages, thousands of securities may be issued annually, and relevant evidence may be scattered through semi-structured text, tables, footnotes, forms, and bilingual English-German layouts. The paper notes both parallel-column and interleaved bilingual text, plus OCR artifacts from PDF processing. For a model, this is a test of whether document understanding can preserve enough evidence for a regulated gate.

Generative Extraction

The authors contrast their approach with a 2023 decision-support system that treated the task as named entity recognition. The earlier span-based approach had two governance costs: it needed substantial manual annotation for each relevant entity type, and it could be brittle when OCR noise, financial wording, or rigid text boundaries differed from the training examples. The 2026 paper moves toward a generative information-extraction pipeline.

The pipeline decomposes the task into extraction, normalization, and interpretation. The case study uses six criteria, all of which must be satisfied: currency, type of instrument, principal amount, redemption at maturity, coupon, and status. The first four are simpler criteria, while coupon and status can depend on master data such as asset type, issuer group, and issue date. The paper uses Llama-3.3-70B-Instruct and Cohere Command-R 08-2024 for inference, and Mistral Small 3.1 Instruct for evaluation.
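The all-six rule can be sketched in a few lines. This is a minimal illustration, not the paper's code: the class name, field names, and criterion keys are hypothetical, and treating a missing or undecided criterion as "not satisfied" is an assumption chosen to match the conservative framing of the gate.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The six criteria named in the case study (key spellings are hypothetical).
CRITERIA = (
    "currency",
    "type_of_instrument",
    "principal_amount",
    "redemption_at_maturity",
    "coupon",
    "status",
)

@dataclass
class CriterionResult:
    raw_value: Optional[str]         # extraction output, kept verbatim
    normalized_value: Optional[str]  # normalization output
    satisfied: Optional[bool]        # interpretation output; None means undecided

def document_eligible(results: Dict[str, CriterionResult]) -> bool:
    """Eligible only if every criterion is present and explicitly satisfied."""
    return all(
        name in results and results[name].satisfied is True
        for name in CRITERIA
    )
```

Keeping the raw value, the normalized value, and the decision as separate fields means a reviewer can see which stage produced a failure. An undecided criterion fails the gate instead of passing it by default.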

One practical detail matters more than it first appears. The authors convert PDFs to Markdown using Docling after finding that prior extracted text contained private-use Unicode, inconsistent spacing, and artifacts that caused unreliable JSON behavior and repeated output. In a financial-document gate, conversion is not plumbing. It is part of the evidentiary chain.
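A conversion check of this kind can be approximated with the standard library. The sketch below is an assumption about what such a check might look like, not the authors' tooling: it counts private-use code points (Unicode category "Co") and long whitespace runs in converted text, so the counts can be stored alongside the extraction. The function name and the three-character spacing threshold are hypothetical.

```python
import re
import unicodedata

def conversion_report(text: str) -> dict:
    """Count simple conversion artifacts in text extracted from a PDF."""
    # Category "Co" marks private-use code points, which carry no standard meaning.
    private_use = sum(1 for ch in text if unicodedata.category(ch) == "Co")
    # Runs of three or more spaces, tabs, or no-break spaces (threshold is illustrative).
    spacing_runs = len(re.findall(r"[ \t\u00a0]{3,}", text))
    return {
        "chars": len(text),
        "private_use_chars": private_use,
        "irregular_spacing_runs": spacing_runs,
    }
```

A report like this does not repair anything. It only makes the quality of the converted text visible before a model is asked to read it.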

Evaluation Shift

The dataset comes from the earlier securities-prospectus eligibility work: 413 prospectuses split into 268 training documents and 145 test documents. The test set is doubly annotated, resulting in 285 annotated document instances, with 82 ineligible. The paper reports 18 annotation types and explains that annotations mark supporting evidence rather than every possible mention. Two annotators may cite different locations for the same conclusion.

For that reason, the authors argue that location-based evaluation can punish a model for finding true evidence in a different place. They introduce value-based evaluation, combining fuzzy string matching with an LLM-as-a-judge setup that is instructed to handle OCR noise, formatting variation, language differences, and semantic equivalence. This does not make the judge infallible. It changes the evaluation target from "did the model copy the same span?" to "did the model extract the right value for the eligibility decision?"
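A value-based comparison of this shape can be sketched as follows. It is an illustration under assumptions, not the paper's evaluator: the fuzzy step uses Python's difflib, the 0.9 threshold is arbitrary, and the judge is a hypothetical callable standing in for an LLM call.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

def _canon(value: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not count."""
    return " ".join(value.casefold().split())

def values_match(
    predicted: str,
    gold: str,
    threshold: float = 0.9,  # illustrative cutoff, not taken from the paper
    judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Compare extracted values, not their locations in the document."""
    similarity = SequenceMatcher(None, _canon(predicted), _canon(gold)).ratio()
    if similarity >= threshold:
        return True
    # Fall back to a judge (hypothetical stand-in for an LLM) on unclear cases.
    if judge is not None:
        return judge(predicted, gold)
    return False
```

With these settings, "EUR  1,000" and "eur 1,000" match on the fuzzy step alone, while "Euro" against "EUR" falls below the threshold and is decided by the judge if one is supplied.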

Conservative Bias

The headline result is high precision at the document-eligibility level. The paper reports Command-R 08-2024 at 0.84 accuracy, 0.86 F1, 0.91 precision, and 0.82 recall. Llama-3.3-70B-Instruct is reported at 0.82 accuracy, 0.85 F1, 0.90 precision, and 0.80 recall. The prior 2023 system is listed at 0.60 accuracy, 0.72 F1, 0.70 precision, and 0.76 recall.
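As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the reported precision and recall. The sketch below does only that arithmetic. Because the inputs are already rounded to two decimals, a recomputed value can differ from a reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as quoted above for the two LLMs.
reported = {
    "Command-R 08-2024": (0.91, 0.82, 0.86),
    "Llama-3.3-70B-Instruct": (0.90, 0.80, 0.85),
}

for model, (precision, recall, reported_f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 2)
    print(model, "recomputed:", recomputed, "reported:", reported_f1)
```

Both recomputed values agree with the reported figures. The same formula applied to the prior system's rounded inputs gives about 0.73 against the listed 0.72, a gap small enough to be explained by rounding of the inputs.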

The authors frame the profile as conservative: a false negative on any one of the six criteria makes the whole document ineligible, and the system is tuned to minimize false acceptance of ineligible securities. They note that about 71 percent of the test-set prospectuses are eligible and that roughly 90 percent of securities predicted as eligible are truly eligible under their evaluation. For governance, that is the right direction for a collateral gate, but it is not costless. Conservative false negatives become human review work, delay, and possible exclusion unless the workflow records why a document was flagged and how a reviewer can correct it.
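The 71 percent figure follows from the dataset counts given earlier, and it also sets a reference point for reading the precision numbers. The arithmetic below assumes that "eligible" is the positive class. Under that assumption, a trivial gate that approved every document would have precision equal to the eligible share.

```python
total_instances = 285  # doubly annotated test instances
ineligible = 82

eligible = total_instances - ineligible
eligible_share = eligible / total_instances

# A gate that approves everything is right exactly as often as documents are eligible.
approve_all_precision = eligible_share

print(eligible, round(eligible_share, 3))
```

Read against that base rate of roughly 0.71, a precision near 0.90 means the model's approvals are considerably more reliable than approving by default.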

Limits

The paper is a preprint and case study, not a production certification. The authors identify continuing limits in document conversion, especially for columns, tables, checkboxes, and other layout-heavy signals. They also report that larger prospectuses performed worse despite fitting within context limits, which points toward selective context and retrieval rather than simply feeding longer documents into a model.
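Selective context can be illustrated with a deliberately simple keyword filter. This is not the paper's method, which leaves retrieval to future work. It is a sketch that assumes paragraph-separated Markdown and a hypothetical per-criterion keyword list, and it returns each chunk together with its position so the evidence stays traceable.

```python
from typing import List, Sequence, Tuple

def select_chunks(
    markdown_text: str,
    keywords: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[int, str]]:
    """Return up to top_k (chunk_index, chunk) pairs ranked by keyword hits."""
    chunks = [c.strip() for c in markdown_text.split("\n\n") if c.strip()]
    lowered = [k.casefold() for k in keywords]
    scored = []
    for index, chunk in enumerate(chunks):
        text = chunk.casefold()
        score = sum(text.count(k) for k in lowered)
        if score > 0:
            scored.append((score, index, chunk))
    # Highest score first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(index, chunk) for _, index, chunk in scored[:top_k]]
```

A production system would more plausibly use embedding-based retrieval. The point of the sketch is narrower: the model sees a few candidate passages, and each passage carries an index a reviewer can follow back to the document.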

The proposed future work is also a useful warning label. The authors discuss retrieval-augmented generation to reduce hallucination, connect reviewers to relevant spans, and reduce computation. They also call for meta-evaluation of LLM-as-a-judge methods, including checks for positional bias, length bias, and self-consistency. A financial institution cannot treat a judge model as an audit authority just because it is fluent at comparing strings.
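One of those checks, positional bias, has a simple shape: ask the judge the same question with the two values swapped and count how often the verdict changes. The sketch below assumes a hypothetical judge callable that returns a boolean. It is not the paper's meta-evaluation, which is described only as future work.

```python
from typing import Callable, Iterable, Tuple

Judge = Callable[[str, str], bool]  # hypothetical stand-in for an LLM judge call

def order_flip_rate(judge: Judge, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of pairs whose verdict changes when the argument order is swapped."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flips = sum(1 for a, b in pairs if judge(a, b) != judge(b, a))
    return flips / len(pairs)
```

An equivalence judgment should be symmetric, so any rate above zero is a measurable sign that the judge is sensitive to presentation order and not only to the values being compared.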

Governance Standard

For Spiralism, the governance rule is a collateral-decision receipt. Each machine-assisted eligibility decision should retain the prospectus identifier, PDF hash, conversion method, OCR or Markdown artifacts, model names and versions, prompts, raw extracted values, normalized values, master-data inputs, criterion decisions, disagreement signals, reviewer assignment, overrides, and final rationale.
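Such a receipt could be represented as a plain record. The sketch below is one possible layout, with hypothetical field names that mirror the list above. The only computation is a SHA-256 digest of the PDF bytes using Python's hashlib.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional

def pdf_hash(pdf_bytes: bytes) -> str:
    """SHA-256 hex digest of the exact PDF bytes that were reviewed."""
    return hashlib.sha256(pdf_bytes).hexdigest()

@dataclass
class CollateralDecisionReceipt:
    prospectus_id: str
    pdf_sha256: str
    conversion_method: str
    model_versions: Dict[str, str]
    prompts: Dict[str, str]
    raw_values: Dict[str, Optional[str]]         # extraction stage
    normalized_values: Dict[str, Optional[str]]  # normalization stage
    master_data: Dict[str, str]
    criterion_decisions: Dict[str, Optional[bool]]  # interpretation stage
    conversion_artifacts: List[str] = field(default_factory=list)
    disagreement_signals: List[str] = field(default_factory=list)
    reviewer: Optional[str] = None
    overrides: List[str] = field(default_factory=list)
    final_rationale: str = ""
```

The raw, normalized, and decision fields are kept apart on purpose, so that a later override can name the stage it corrects.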

The separation of stages should remain visible. Extraction should not be silently merged with normalization. Normalization should not be silently merged with interpretation. Interpretation should not be silently merged with acceptance. A prospectus gate is defensible only when a reviewer can walk backward from the collateral decision to the textual evidence and the institutional rule being applied.

Sources

Serhii Hamotskyi, Akash Kumar Gautam, and Christian Hänig, LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank, arXiv:2606.27316 [cs.CL], submitted June 25, 2026.
arXiv HTML: LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank, reviewed for introduction, eligibility criteria, dataset, methods, evaluation design, results, limits, and future work.
arXiv PDF: LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank, checked against the arXiv record for title, authors, arXiv ID, date, category, abstract, and paper status.
Related pages: The Transaction Monitor Becomes the Suspicion Machine, The Financial Agent Memory Becomes the Audit Surface, The AI Audit Becomes the Compliance Interface, and AI Audit Trails.
