Published: June 25, 2026

The AI Label Becomes the Public Record

When a public agency says it uses AI, the label is too broad to be evidence. The system type, affordance, vendor boundary, and action surface have to be part of the record.

The Paper

The paper is "A Technical Typology of AI Systems in Public Administration" by Jonathan Rystrøm, Chris Schmitz, Nathan Davies, Gerhard Hammerschmid, Albert Meijer, and Chris Russell, arXiv:2606.31755 [cs.CY, cs.AI]. The arXiv record lists version 1 as submitted on June 30, 2026, with the comment "Under Review." The PDF is 30 pages, and the title page lists affiliations with the University of Oxford, Hertie School, Utrecht University, and Harvard University.

The paper starts from a problem that public agencies already face. "AI" can mean a fixed eligibility rule, a transparent statistical model, an opaque predictive model, a general-purpose chatbot, or an agent connected to tools and records. Those systems do not create the same problems for accountability, procedural justice, non-discrimination, privacy, or service delivery. Treating them as one thing makes public administration research and procurement less precise than the systems require.

What the Typology Separates

The authors propose five categories: hand-coded, glass-box, black-box, general-purpose, and agentic systems. A hand-coded system encodes rules directly. A glass-box system learns from data but remains inspectable by experts. A black-box system learns logic that resists meaningful inspection. A general-purpose system is pre-trained for broad capabilities and adapted downstream. An agentic system scaffolds a general-purpose system so it can act over time, often through tools or APIs.
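The five definitions above can be read as a short decision sequence. The following is a minimal sketch in Python, not the authors' coding procedure: the trait names (`learned_from_data`, `expert_inspectable`, `pretrained_general`, `acts_over_time`) and the order of the checks are assumptions made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SystemType(Enum):
    """The five categories named in the typology."""
    HAND_CODED = "hand-coded"
    GLASS_BOX = "glass-box"
    BLACK_BOX = "black-box"
    GENERAL_PURPOSE = "general-purpose"
    AGENTIC = "agentic"


@dataclass(frozen=True)
class SystemTraits:
    # Hypothetical attribute names, chosen for this sketch only.
    learned_from_data: bool    # logic learned rather than written as rules
    expert_inspectable: bool   # experts can meaningfully inspect the logic
    pretrained_general: bool   # pre-trained for broad capabilities, adapted downstream
    acts_over_time: bool       # scaffolded to take steps, e.g. via tools or APIs


def classify(traits: SystemTraits) -> SystemType:
    """Map observable traits to a category (assumed ordering of checks)."""
    if traits.pretrained_general:
        # An agentic system is a general-purpose system scaffolded to act.
        if traits.acts_over_time:
            return SystemType.AGENTIC
        return SystemType.GENERAL_PURPOSE
    if not traits.learned_from_data:
        return SystemType.HAND_CODED
    if traits.expert_inspectable:
        return SystemType.GLASS_BOX
    return SystemType.BLACK_BOX
```

Under these assumptions, a fixed eligibility rule (`SystemTraits(False, True, False, False)`) classifies as hand-coded, while a chatbot built on an external pre-trained model that can call tools (`SystemTraits(True, False, True, True)`) classifies as agentic. Real systems may sit between categories, so a function like this is best treated as a prompt for the questions to ask rather than a verdict.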

The value of the typology is not taxonomic neatness. It is scope control. A black-box risk model raises different contestation problems than a hand-coded benefits calculator. A general-purpose model introduces provider dependence, cloud infrastructure, training-data opacity, and transfer-risk questions that a task-specific model may not. An agentic system moves the governance question from a single output to a process that may query records, take steps, and require runtime oversight.

The Literature Audit

The paper tests the typology against public-administration research. The authors use OpenAlex to identify influential public administration and digital government papers on AI, published between 2019 and 2025. After screening, the final corpus contains 91 papers. The coding unit is a "strand": a compact claim about what a paper studies, how it motivates the work, or how it states conclusions. Gemini 3.1 Flash-Lite Preview is used to extract preliminary candidate strands, but the paper says all classification and outcome judgments are made by human coders.

The headline results are not subtle. Of 91 coded papers, 50, or 55 percent, do not provide enough information to classify the studied system. The paper reports that 31 percent of papers mischaracterise AI systems by motivating the work with a different system type than the one studied, and that 41 percent overgeneralise by drawing claims broader than the empirical system supports. Among fully specified empirical papers, black-box systems are the most common category. Only 11 papers explicitly analyse general-purpose systems, and no paper in the corpus is classified as empirically studying contemporary agentic systems.

Why Government Should Care

This matters because government use gives ordinary system labels public consequences. A citizen denied a service does not need the phrase "AI-powered." They need to know what kind of system affected the file, what role a public official retained, what data the system used, whether its logic can be inspected, which vendor or model supplied the capability, and whether the system merely scored a case or acted across records.

The paper also gives a useful correction to both hype and resistance. More technical detail is not always better. The authors argue for detail at the level where affordances change. Public administrators usually do not need to know every neural-network architecture choice to assess a service chatbot. They do need to know whether it is a hand-coded conversation tree, a task-specific model, a general-purpose model with an external provider, or an agent allowed to call tools.

The System-Type Receipt

A system-type receipt should include the system category, the task, the input and output types, the model or vendor if there is one, whether rules are authored or learned, whether expert inspection is possible, whether the system is general-purpose, whether it can act over time, what tools or records it can access, how performance was locally tested, what human review remains, and what uncertainty remains about the technical setup.
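One way to make the receipt concrete is as a record with one field per item listed above, plus a check for what has not yet been documented. This is a hedged sketch only: the class name `SystemTypeReceipt`, the field names, and the helper `missing_fields` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SystemTypeReceipt:
    # Hypothetical field names mirroring the items listed in the text.
    # None (or an empty string) means "not yet recorded".
    system_category: Optional[str] = None            # e.g. "hand-coded", "agentic"
    task: Optional[str] = None
    input_types: Optional[List[str]] = None
    output_types: Optional[List[str]] = None
    model_or_vendor: Optional[str] = None            # record "none" explicitly if there is no vendor
    rules_authored_or_learned: Optional[str] = None  # "authored" or "learned"
    expert_inspection_possible: Optional[bool] = None
    is_general_purpose: Optional[bool] = None
    acts_over_time: Optional[bool] = None
    tools_or_records_accessible: Optional[List[str]] = None  # [] records "no access"
    local_testing: Optional[str] = None
    human_review: Optional[str] = None
    open_uncertainties: Optional[str] = None


def missing_fields(receipt: SystemTypeReceipt) -> List[str]:
    """Return the names of receipt fields that are still undocumented."""
    missing = []
    for f in fields(receipt):
        value = getattr(receipt, f.name)
        # An empty list is an explicit answer; None or "" is a gap.
        if value is None or value == "":
            missing.append(f.name)
    return missing
```

For example, a receipt created with only `system_category="hand-coded"` and `task="benefit calculation"` would report the other eleven fields as missing. Distinguishing "not recorded" from "recorded as none" is a deliberate choice in this sketch: it keeps a blank vendor field from being read as a statement that no vendor is involved.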

That receipt protects research claims, procurement decisions, and citizen remedies. It prevents a study of an old rule-based calculator from being generalized to agentic service delivery. It prevents a vendor benchmark for a general-purpose model from standing in for local administrative testing. It gives auditors a way to ask whether the public record describes the system citizens actually encountered.

Limits

The authors are explicit about limits. Their sample is the most highly cited public administration and digital government scholarship on AI from 2019 to 2025, not the entire field. Citation-weighted sampling may over-represent conceptual and review work and under-represent recent applied work, which may partly explain the low coverage of general-purpose and agentic systems. The coding procedure is conservative but still fallible. The typology also has to be updated as systems change, especially around degrees of agenticness, continual learning, and embodiment.

Those limits make the paper more useful, not less. Its most durable contribution is a governance habit: never let the word "AI" do the work of a system description.
