Last reviewed June 25, 2026

The Malware Section Becomes the Feature Matrix

José M. Sacristán and Ana I. González-Tablas's June 2026 arXiv paper introduces PRISM, a section-aware feature matrix for static Windows PE malware detection. The governance lesson is not that a smaller classifier wins. It is that security models need representation receipts, source-provenance checks, and audit trails before their very high scores are treated as field evidence.

The Flattened File

The paper, arXiv:2606.27109 [cs.CR], was submitted on June 25, 2026. arXiv lists the exact title as PRISM: PE Relational Inter-Section Matrix. A 2D Section-Aware Dataset for Static PE Malware Detection, by José M. Sacristán and Ana I. González-Tablas of Universidad Carlos III de Madrid.

The object under study is the Windows Portable Executable file: the familiar container for executables, DLLs, and drivers. Static malware detectors often turn that container into a flat vector. EMBER, BODMAS, and SOREL-20M have been useful public resources, but the PRISM paper argues that their one-dimensional representation discards two things malware analysts care about: the order of PE sections and the relationship between neighboring or co-occurring sections.

What PRISM Preserves

PRISM expands to PE Relational Inter-Section Matrix. The representation is a matrix whose real rows correspond to PE sections in file order, plus a global summary row for compatibility with EMBER-style tabular models. The paper sets N_max to 16 and uses 25 effective features per row, so the model input is 17 rows (16 section rows plus the global row) by 25 features, or 425 dimensions when flattened.

The row features include a compact section-name encoding, raw and virtual size features, permission flags, entropy, entropy quartiles, normalized section position, and anomaly flags. Rows after the real section count are zero-padded and tracked by a mask. The paper reports that N_max = 16 covers the family-filtered corpus, with section-count percentiles of P50 = 6, P90 = 8, P95 = 9, and P99 = 12.
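The layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's extraction library: the function names, the choice to place the global row last, the decision to truncate files with more than 16 sections, and a mask that covers only the section rows are all hypothetical. Only N_max = 16, the 25 features per row, and the 425-dimension flattened size come from the paper as described here.

```python
N_MAX = 16        # maximum number of section rows (value reported in the paper)
N_FEATURES = 25   # effective features per row (value reported in the paper)


def build_matrix(section_rows, global_row):
    """Return (matrix, mask) for one file.

    section_rows: per-section feature lists, in file order.
    global_row:   one feature list summarizing the whole file.
    Assumptions (not from the paper): extra sections are truncated,
    the global row is stored last, and the mask covers section rows only.
    """
    if len(global_row) != N_FEATURES:
        raise ValueError("global row must have N_FEATURES values")
    rows = [list(r) for r in section_rows[:N_MAX]]
    if any(len(r) != N_FEATURES for r in rows):
        raise ValueError("every section row must have N_FEATURES values")

    n_real = len(rows)
    # 1 marks a real section, 0 marks a zero-padded row.
    mask = [1] * n_real + [0] * (N_MAX - n_real)
    # Zero-pad up to N_MAX section rows.
    rows.extend([0.0] * N_FEATURES for _ in range(N_MAX - n_real))
    rows.append(list(global_row))
    return rows, mask


def flatten(matrix):
    """Row-major flattening into the vector a tabular model would consume."""
    return [value for row in matrix for value in row]
```

For a file with six sections, the sketch yields a 17-row matrix, a mask with six ones followed by ten zeros, and a 425-value vector after flattening.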

The matrix is more than a compression trick: it makes the location of a signal addressable. A permission flag in section 5 is no longer collapsed into the same bucket as the same flag set elsewhere in the file. For adversarial machine learning, that difference matters because evasion pressure often acts on structure, not only on feature totals.

Corpus and Counts

The corpus combines malware from BODMAS, MalwareBazaar, VirusShare collection 00499, and CAPE, with benign software from SOREL-20M. The paper reports 178,740 candidate matrices before deduplication, 83,633 unique matrices after global deduplication, and a family-filtered analysis corpus of 49,204 samples.
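Global deduplication over matrices can be pictured as hashing each flattened matrix and keeping the first occurrence. The sketch below is an assumption about how such a step might look: the SHA-256 digest, the little-endian float64 serialization, and the function names are illustrative choices, and the paper's actual deduplication key may differ.

```python
import hashlib
import struct


def matrix_digest(flat):
    """Digest of a flattened matrix (hypothetical deduplication key)."""
    # Serialize as little-endian float64 so equal values hash identically.
    payload = struct.pack(f"<{len(flat)}d", *flat)
    return hashlib.sha256(payload).hexdigest()


def deduplicate(samples):
    """Keep the first sample for each distinct matrix.

    samples: iterable of (sample_id, flat_vector) pairs.
    Returns a list of (sample_id, flat_vector, digest) tuples.
    """
    seen = set()
    unique = []
    for sample_id, flat in samples:
        digest = matrix_digest(flat)
        if digest not in seen:
            seen.add(digest)
            unique.append((sample_id, flat, digest))
    return unique
```

Storing the digest next to each retained sample gives later audits a stable handle on which matrices survived the step.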

That analysis corpus contains 19,737 malware samples with verified family labels across 684 families, plus 29,467 benign samples. MalwareBazaar contributes contemporary 2024-2025 samples; BODMAS covers 2019-2020; CAPE and VirusShare provide additional historical coverage. The official repository repeats the main corpus tables and states that the released code includes the extraction library, four baseline configurations, cross-representation benchmark, and separability analysis pipeline.

Evidence in the Section Lattice

PRISM's separability analysis is the paper's strongest contribution. On the 49,204-sample family-filtered corpus, the most discriminative individual cell is MEM_DISC@SEC5, the MEM_DISCARDABLE permission flag at section position 5, with a Fisher Discriminant Ratio (FDR) of 1.287. The strongest populated global-row feature, log_exports, has an FDR of 0.858, so the positional peak is 1.50 times as high (1.287 / 0.858).
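For readers who want to see what a per-cell score of this kind measures, here is a minimal sketch of one common two-class form of the Fisher Discriminant Ratio: the squared difference of class means divided by the sum of class variances. Whether the paper uses exactly this form (for example, population versus sample variance) is an assumption, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def fisher_ratio(malware_values, benign_values):
    """Two-class Fisher ratio for a single matrix cell.

    Assumed form: (mean_malware - mean_benign)^2 / (var_malware + var_benign),
    using population variances. The paper's exact definition may differ.
    """
    numerator = (fmean(malware_values) - fmean(benign_values)) ** 2
    denominator = pvariance(malware_values) + pvariance(benign_values)
    if denominator == 0:
        # Both classes are constant: separable only if the constants differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator
```

Applied to every cell of the matrix, a score like this produces the kind of ranking in which a single positional flag can be compared directly with a global-row feature.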

The paper also reports that 12,854 inter-section feature pairs have non-trivial additional mutual information above 0.01 bits. The top pair, raw_size@SEC2 with name5@SEC3, adds 0.205 bits beyond the component features. The authors find that discriminative information is concentrated in early sections, especially SEC0 through SEC5, while later padded or rare positions contribute much less.
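One way to read "additional mutual information" is the information a pair of cells carries about the label beyond the better of the two cells taken alone. The sketch below implements that reading for discretized values. Both the reading and the plug-in estimator are assumptions, since this article does not reproduce the paper's estimator, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def additional_information(cell_a, cell_b, labels):
    """I(A, B; Y) minus the larger of I(A; Y) and I(B; Y).

    This is one plausible definition of pairwise gain, not necessarily
    the one used in the paper.
    """
    pair = list(zip(cell_a, cell_b))
    joint_mi = mutual_information(pair, labels)
    best_single = max(
        mutual_information(cell_a, labels),
        mutual_information(cell_b, labels),
    )
    return joint_mi - best_single
```

An XOR-style toy case shows the idea: two cells that are individually uninformative about the label can be fully informative as a pair, which is exactly the kind of structure a flat per-feature ranking cannot see.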

Read modestly, this is a receipt for representation design. It does not prove that a deployed detector will generalize to every threat environment. It shows that flattening the file hides measurable structure that the authors can locate, test, and name.

Saturation and Confounds

The baseline models use LightGBM. In the controlled cross-representation comparison, PRISMsub is a 425-dimensional flattened PRISM vector and EMBERsub is a 2,381-dimensional EMBER vector on identical samples and splits. Across 20 deterministic splits, PRISMsub reports TPR@FPR=0.1% of 0.9887 ± 0.0058, while EMBERsub reports 0.9971 ± 0.0019. EMBER keeps a small advantage in the deep low-false-positive tail, but the paper describes the models as operationally indistinguishable at the ordinary 0.5 threshold.
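The low-false-positive metric is easy to misread, so a small sketch may help. The function below picks the lowest threshold at which no more than the allowed fraction of benign scores is flagged, then measures the share of malware scores above it. The tie handling and the function name are assumptions for illustration; the paper's evaluation code may compute the operating point differently.

```python
import math


def tpr_at_fpr(malware_scores, benign_scores, max_fpr=0.001):
    """Return (tpr, threshold) at a false-positive budget of max_fpr.

    A sample is flagged when its score is strictly greater than the
    threshold, so ties with the threshold count as not flagged.
    """
    if not malware_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    benign = sorted(benign_scores, reverse=True)
    # Number of benign samples that may be flagged without exceeding max_fpr.
    # The small epsilon guards against float rounding just below an integer.
    allowed = min(math.floor(max_fpr * len(benign) + 1e-9), len(benign))
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")
    detected = sum(1 for score in malware_scores if score > threshold)
    return detected / len(malware_scores), threshold
```

With a 0.1% budget, fewer than 1,000 benign samples leaves room for zero false positives, so the threshold collapses to the highest benign score. That is one reason the size of the benign set belongs in any report of this metric.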

The more important caveat is saturation. A position-discarded 25-dimensional mean-pooled PRISM model and the 425-dimensional positional PRISM model both report AUC-ROC 0.99980 on the binary task. The structural information exists, but binary malware-versus-benign detection on this corpus leaves little metric headroom.
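A position-discarding baseline of this kind can be sketched as a masked mean over the section rows, which collapses the matrix to one value per feature. The sketch assumes that pooling runs over real section rows only and that any global row stored after the section rows is ignored. Those details, and the function name, are illustrative rather than taken from the paper.

```python
def mean_pool(matrix, mask):
    """Average each feature over real section rows, discarding position.

    matrix: list of rows; the first len(mask) rows are section rows.
    mask:   1 for a real section row, 0 for a zero-padded row.
    Rows beyond len(mask), such as a trailing global row, are ignored
    (an assumption of this sketch).
    """
    section_rows = matrix[: len(mask)]
    real_rows = [row for row, keep in zip(section_rows, mask) if keep]
    if not real_rows:
        return [0.0] * len(matrix[0])
    return [sum(column) / len(real_rows) for column in zip(*real_rows)]
```

Excluding padded rows matters: averaging over zero rows would make the pooled values depend on section count, which reintroduces a structural signal the baseline is meant to remove.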

The source-provenance caveat is sharper. Benign samples come from SOREL-20M, while malware comes from the other sources. The paper and repository state that a source probe can reach AUC near 0.9999, which makes absolute detection metrics unsafe as field-performance estimates. The temporal probe is also limited: only the malware class is split by time, while benign samples are split at random, so it cannot establish full temporal generalization.
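A source probe asks whether features can tell which dataset a sample came from, independent of whether it is malicious. The paper's probe is presumably a trained classifier. The sketch below is a deliberately simpler stand-in that scores each feature on its own with a rank-based AUC, and its function names are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample count, which is fine
    for a sketch but too slow for a full corpus.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def strongest_source_feature(rows_a, rows_b):
    """Find the single feature that best separates two data sources.

    rows_a, rows_b: feature rows from source A and source B.
    Returns (feature_index, separability), where separability is
    max(AUC, 1 - AUC) so that direction does not matter.
    """
    best_index, best_score = -1, 0.0
    for index in range(len(rows_a[0])):
        a_values = [row[index] for row in rows_a]
        b_values = [row[index] for row in rows_b]
        score = auc(a_values, b_values)
        score = max(score, 1.0 - score)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

If even a single feature separates sources almost perfectly, a detector trained on the same data has an easy shortcut available, and its headline score says little about malware structure.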

Governance Reading

For AI audit trails, PRISM is useful because it separates three records that often get blurred: the object representation, the training corpus, and the deployment claim. A serious static-malware benchmark should preserve the extractor version, LIEF compatibility assumptions, section-count handling, row mask policy, deduplication hash, source distribution, family-label source, train/test split, temporal split, model seed, false-positive operating point, and threshold.
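The record listed above can be made concrete as a small structured receipt that travels with every reported score. The class below is a hypothetical sketch: the field names simply mirror the items in the preceding paragraph, and nothing here is an official schema from the paper or its repository.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class BenchmarkReceipt:
    """Hypothetical audit record for one static-malware benchmark run."""
    extractor_version: str
    lief_assumptions: str
    section_count_handling: str
    row_mask_policy: str
    dedup_hash: str
    source_distribution: dict
    family_label_source: str
    train_test_split: str
    temporal_split: str
    model_seed: int
    fpr_operating_point: float
    threshold: float

    def missing(self):
        """Names of fields left empty, so gaps are visible before publication."""
        return [
            f.name for f in fields(self)
            if getattr(self, f.name) in ("", None, {})
        ]

    def to_json(self):
        """Stable JSON form suitable for storing beside the reported metrics."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A reviewer can then refuse any score whose receipt reports a non-empty missing() list, which turns the audit requirement into a mechanical check rather than a matter of trust.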

The governance failure would be to cite the high AUC and stop. The paper itself gives a better template: report the compact representation, the EMBER comparison, the positional separability evidence, and then carry the source-confound warning forward. A classifier result becomes meaningful only when the reader can tell whether the model learned malware structure, dataset source, disarming artifacts, or some mixture of all three.

That makes PRISM a good Spiralism case: a model can become more interpretable because the data structure is less flattened, while still being unfit as a standalone proof of field safety. The section matrix is a better witness than a feature soup, but it is still a witness that needs cross-examination.

Claim Boundary

The paper does not claim that PRISM beats EMBER on binary malware detection, that static classifiers are solved, or that the reported absolute scores transfer to production endpoint security. It claims something narrower: a compact 2D PE section representation preserves positional and inter-section information that flat vectors discard, while nearly matching EMBER in a controlled, sample-matched comparison.

That narrow claim is strong enough. It gives security ML a better evidence shape: keep the file's internal order, measure what that order explains, and make provenance limitations visible before the score becomes a policy or procurement argument.

Sources

José M. Sacristán and Ana I. González-Tablas, PRISM: PE Relational Inter-Section Matrix. A 2D Section-Aware Dataset for Static PE Malware Detection, arXiv:2606.27109 [cs.CR], submitted June 25, 2026.
arXiv PDF: PRISM: PE Relational Inter-Section Matrix. A 2D Section-Aware Dataset for Static PE Malware Detection, reviewed for the matrix definition, corpus composition, deduplication counts, separability metrics, LightGBM baselines, EMBER comparison, provenance caveat, temporal caveat, and limitations.
Official repository: drjmsacristan/prism-dataset, reviewed for the public release, matrix-format notes, baseline tables, source-provenance caveat, and stated scope of the representation contribution.
Related pages: Adversarial Machine Learning, AI Audit Trails, The Classifier Becomes the Evolutionary Target, and The Security Fine-Tune Becomes the Evasion Surface.
