The Solver Route Becomes the Decision Receipt
Chuanhao Li, Xiaoan Xu, Dirk Bergemann, Ethan X. Fang, Yehua Wei, and Zhuoran Yang's June 2026 arXiv paper on COOPA turns optimization agents into a governance question: when an AI system translates a messy operational problem into a mathematical model, the route to the solver has to be reviewable.
The Paper
The paper is COOPA: A Modular LLM Agent Architecture for Operations Research Problems, arXiv:2606.27611 [cs.LG]. arXiv lists it as submitted on June 25, 2026. The authors frame operations research as a high-stakes decision-support discipline used in supply chains, transportation, energy, healthcare, finance, engineering, and public policy.
The site already has pages on causal caution in decision support, reward weights as governance levers, policy review engines, and agent workflow gates. COOPA adds a narrower object: the optimization formulation itself. Before an answer can be right, the system has to choose the right variables, parameters, objective, constraints, and solver family.
Formulation Error
The paper's strongest governance point is that executable code can still encode the wrong problem. A generated optimization script may run cleanly, produce an objective value, and look professional while misreading demand, capacity, fairness, timing, safety, or cost. In operations research, that error is not cosmetic. It changes the decision space.
COOPA is built around this bottleneck. The authors argue that many LLM optimization systems emphasize code generation or downstream execution feedback, while the harder failure can happen earlier, when natural language is translated into a model. Solver success then becomes a false comfort: the solver solved the model it received, not necessarily the problem the institution meant to pose.
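A small sketch can make the false comfort concrete. The toy problem, its numbers, and the function names below are hypothetical and do not come from the paper; a brute-force search stands in for a real solver so the example needs no external packages. Both formulations run cleanly and return an objective value, but only one encodes the problem as posed.

```python
from itertools import product

def solve(constraints, bound=20):
    """Brute-force a tiny integer program: maximize 3x + 2y for x, y in 0..bound."""
    best = None
    for x, y in product(range(bound + 1), repeat=2):
        if all(ok(x, y) for ok in constraints):
            value = 3 * x + 2 * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best  # (objective value, x, y), or None if nothing is feasible

def machine_hours(x, y):
    return x + y <= 10

def labor_limit(x, y):
    return x <= 4  # stated in the hypothetical problem text, easy to drop

intended = solve([machine_hours, labor_limit])  # formulation as posed
misread = solve([machine_hours])                # labor limit silently omitted

print(intended)  # (24, 4, 6)
print(misread)   # (30, 10, 0)
```

The misread formulation reports the higher objective value, so a reviewer who sees only the answer has no signal that a constraint went missing.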
That makes source traceability more than interpretability theater. If a constraint appears in the formulation, a reviewer should be able to see which sentence or data field supported it. If an objective was chosen, the record should show why that objective was selected rather than a neighboring one. A warehouse, clinic, grid operator, or public agency needs to inspect the abstraction, not only the answer.
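One way to picture that kind of traceability is a constraint record that carries character offsets into the original problem text. This is a minimal sketch under assumed names (`TracedConstraint`, `span_of`, `unsupported`) and an invented problem statement; it is not the data structure COOPA itself uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TracedConstraint:
    expression: str  # constraint as written in the formulation
    start: int       # character offsets into the problem text
    end: int

    def quote(self, text):
        """Return the source sentence fragment that supports this constraint."""
        return text[self.start:self.end]

def span_of(phrase, text):
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return start, start + len(phrase)

def unsupported(constraints, text):
    """List constraints whose source span is empty, for reviewer attention."""
    return [c.expression for c in constraints if not c.quote(text).strip()]

problem_text = "Each truck carries at most 12 pallets. Deliveries must finish by 6 pm."

constraints = [
    TracedConstraint("load[t] <= 12", *span_of("at most 12 pallets", problem_text)),
    TracedConstraint("finish[t] <= 18", *span_of("finish by 6 pm", problem_text)),
    TracedConstraint("trucks <= 5", 0, 0),  # no supporting text was recorded
]

print(constraints[0].quote(problem_text))  # at most 12 pallets
print(unsupported(constraints, problem_text))  # ['trucks <= 5']
```

A constraint with no supporting span is not necessarily wrong, but it is exactly the kind of item a reviewer should be shown first.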
Candidate Loop
COOPA generates multiple candidate formulations, evaluates each across modeling dimensions, and selects using a max-min confidence rule: the chosen candidate is the one whose weakest modeling dimension is strongest. The paper's default setting uses three candidates. This is a practical response to a familiar agent failure. A single fluent formulation can hide a missing constraint; comparison gives the system a chance to expose its own alternatives before code generation.
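The max-min rule described above can be sketched in a few lines. The dimension names and scores here are illustrative assumptions, not values or labels taken from the paper; the mean-based selector is included only as a contrast.

```python
def weakest_score(candidate):
    """Confidence of the candidate's least certain modeling dimension."""
    return min(candidate["scores"].values())

def select_max_min(candidates):
    """Pick the candidate whose weakest dimension is strongest."""
    return max(candidates, key=weakest_score)

def select_by_mean(candidates):
    """Contrast rule: pick the candidate with the highest average score."""
    return max(candidates, key=lambda c: sum(c["scores"].values()) / len(c["scores"]))

# Three hypothetical candidates, matching the default candidate count.
candidates = [
    {"name": "A", "scores": {"variables": 0.9, "objective": 0.9, "constraints": 0.4}},
    {"name": "B", "scores": {"variables": 0.7, "objective": 0.7, "constraints": 0.7}},
    {"name": "C", "scores": {"variables": 0.8, "objective": 0.6, "constraints": 0.75}},
]

chosen = select_max_min(candidates)
rejected = [c["name"] for c in candidates if c is not chosen]

print(chosen["name"])                      # B
print(select_by_mean(candidates)["name"])  # A
print(rejected)                            # ['A', 'C']
```

Candidate A wins on average but carries a low-confidence constraint set, which is the pattern a max-min rule is designed to penalize.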
Across three benchmarks, eight LLM backbones, and four baselines, the paper reports that COOPA achieved the best macro-average accuracy on six of eight backbones. The strongest reported macro-average scores were 70.6 percent with GPT-5.2, 69.4 percent with GPT-5, and 68.4 percent with Gemini-3-Flash. In the authors' ablation, iterative confidence-based modeling raised the cross-model mean from 61.8 percent to 64.8 percent, and improved accuracy on seven of eight backbones.
The useful lesson is not that confidence scores are truth. It is that confidence can be made into a review artifact. The paper reports a positive relationship between confidence gain and accuracy gain, but also treats confidence reliability as a limitation. A governance process should therefore preserve the candidates, scores, explanations, and rejected alternatives rather than exposing only the selected formulation.
Solver Route
COOPA separates the manager agent from optimizer agents. The repository README describes a manager that extracts a structured formulation and delegates to specialized optimizer agents: mathematical or algebraic optimization with Pyomo and GLPK/IPOPT, combinatorial optimization with Google OR-Tools, metaheuristic optimization with pymoo, or general-purpose Python. The paper describes the same routing idea as multi-solver dispatch.
This matters because solver choice is a policy choice in technical clothing. A linear-programming route, constraint-programming route, metaheuristic route, or general Python route can imply different assumptions about convexity, optimality, runtime, explainability, feasible regions, and tradeoff surfaces. In one appendix case, the paper uses multi-objective engineering design to show why a Pareto-front route can be more appropriate than forcing the task into a single mathematical-programming framing.
The solver route should therefore be visible in any serious decision-support record. It tells a reviewer whether the system treated the problem as a linear program, mixed-integer program, combinatorial search, multi-objective optimization, simulation, or ad hoc numerical task.
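A route record of that kind might look like the sketch below. The four optimizer families and their toolchains follow the README as described above; the problem-class labels, the mapping from class to family, and the `dispatch` function are assumptions made for illustration and should not be read as COOPA's actual routing logic.

```python
from enum import Enum

class Route(Enum):
    MATHEMATICAL = "Pyomo with GLPK/IPOPT"
    COMBINATORIAL = "Google OR-Tools"
    METAHEURISTIC = "pymoo"
    GENERAL = "general-purpose Python"

# Hypothetical mapping from a problem class to an optimizer family.
ROUTES = {
    "linear_program": Route.MATHEMATICAL,
    "mixed_integer_program": Route.MATHEMATICAL,
    "combinatorial_search": Route.COMBINATORIAL,
    "multi_objective": Route.METAHEURISTIC,
}

def dispatch(problem_class, reason):
    """Return a reviewable record of which route was taken and why."""
    route = ROUTES.get(problem_class, Route.GENERAL)
    return {
        "problem_class": problem_class,
        "route": route.name,
        "toolchain": route.value,
        "reason": reason,
        "fallback": problem_class not in ROUTES,  # True when no specific route matched
    }

record = dispatch("multi_objective", "two competing design objectives, Pareto front requested")
print(record["route"], "->", record["toolchain"])  # METAHEURISTIC -> pymoo
print(dispatch("simulation", "no closed-form model")["fallback"])  # True
```

Recording the stated reason and the fallback flag alongside the route lets a reviewer challenge the classification itself, not only the solver output.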
Decision Receipt
A COOPA-style optimization assistant should leave a decision-support receipt. At minimum, it should preserve the original problem text, extracted variables, parameters, objectives, constraints, source spans, candidate formulations, confidence scores, rejected candidates, selected solver route, solver package and version, generated code, execution logs, objective value, feasibility status, timeout status, and human reviewer disposition.
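The field list above translates naturally into a completeness check. The sketch below is one possible shape, with assumed field names and invented example values; nothing here is a schema published by the paper or the repository.

```python
import json

REQUIRED_FIELDS = (
    "problem_text", "variables", "parameters", "objectives", "constraints",
    "source_spans", "candidate_formulations", "confidence_scores",
    "rejected_candidates", "solver_route", "solver_package", "solver_version",
    "generated_code", "execution_logs", "objective_value",
    "feasibility_status", "timeout_status", "reviewer_disposition",
)

def missing_fields(receipt):
    """Names of required fields that are absent or still None."""
    return [name for name in REQUIRED_FIELDS if receipt.get(name) is None]

# Hypothetical receipt for a toy problem; every value is illustrative.
receipt = {
    "problem_text": "Maximize 3x + 2y with x + y <= 10 and x <= 4.",
    "variables": ["x", "y"],
    "parameters": {"machine_hours": 10, "labor_limit": 4},
    "objectives": ["maximize 3x + 2y"],
    "constraints": ["x + y <= 10", "x <= 4"],
    "source_spans": {"x + y <= 10": "x + y <= 10", "x <= 4": "x <= 4"},
    "candidate_formulations": ["A", "B", "C"],
    "confidence_scores": {"A": 0.4, "B": 0.7, "C": 0.6},
    "rejected_candidates": ["A", "C"],
    "solver_route": "mathematical",
    "solver_package": "pyomo",
    "solver_version": None,        # not captured in this run; left unfilled, not guessed
    "generated_code": "# solver script text would be stored here",
    "execution_logs": "optimal solution found",
    "objective_value": 24,
    "feasibility_status": "feasible",
    "timeout_status": False,       # False is a recorded value, not a gap
    "reviewer_disposition": None,  # awaiting human review
}

print(missing_fields(receipt))  # ['solver_version', 'reviewer_disposition']
stored = json.dumps(receipt, sort_keys=True)  # plain JSON keeps the receipt portable
```

Treating `None` as missing while accepting `False` or an empty list keeps the check from confusing "nothing happened" with "nothing was recorded".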
The receipt should also record what the model abstracted away. Many real decisions depend on constraints that are politically or ethically important but poorly specified: service equity, labor burden, environmental exposure, safety margin, redundancy, robustness, and contestability. If those do not enter the formulation, their absence should be visible before the result becomes a schedule, route, staffing plan, budget, or procurement recommendation.
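Making those absences visible can be as simple as comparing a reviewer checklist against what the formulation actually covers. In this sketch the concern list mirrors the paragraph above, while the tagging scheme, the example constraints, and the function name are hypothetical.

```python
REVIEW_CONCERNS = (
    "service equity", "labor burden", "environmental exposure",
    "safety margin", "redundancy", "robustness", "contestability",
)

def omitted_concerns(formulation):
    """Checklist concerns that no constraint in the formulation claims to cover."""
    covered = set()
    for constraint in formulation["constraints"]:
        covered.update(constraint.get("concerns", ()))
    return [concern for concern in REVIEW_CONCERNS if concern not in covered]

# Hypothetical formulation: each constraint is tagged with the concerns it addresses.
formulation = {
    "constraints": [
        {"expression": "shift_hours[w] <= 40", "concerns": ["labor burden"]},
        {"expression": "spare_units >= 2", "concerns": ["safety margin", "redundancy"]},
        {"expression": "sum(cost) <= budget"},  # purely financial, no tag
    ]
}

print(omitted_concerns(formulation))
# ['service equity', 'environmental exposure', 'robustness', 'contestability']
```

The output is a prompt for human judgment rather than a verdict: an omitted concern may be irrelevant to the problem, but the reviewer should be the one to decide that.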
Result Boundary
The paper's results are useful because they are not presented as autonomy. COOPA is explicitly framed as decision support. The benchmark suite includes ComplexLP, IndustryOR, and BWOR; the README says BWOR uses 80 scored problems because two lack ground truth. The project also publishes experiment logs, extracted formulations, and per-problem generated solver code through linked artifacts.
Those are the right instincts for this class of tool. Optimization agents should not be judged only by final objective accuracy. They should be judged by whether a qualified human can inspect the abstraction, reproduce the solver path, detect missing constraints, and understand what tradeoffs the mathematical model made impossible to see.
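The README's note about unscored BWOR problems points to a scoring rule that excludes items without ground truth instead of counting them as failures. The sketch below shows one such rule; the relative tolerance, the record layout, and the sample values are assumptions, and the paper's actual matching criterion may differ.

```python
import math

def score(results, rel_tol=1e-6):
    """Accuracy over problems that have ground truth; the rest are reported as excluded."""
    scored = [r for r in results if r["ground_truth"] is not None]
    correct = sum(
        1 for r in scored
        if r["answer"] is not None
        and math.isclose(r["answer"], r["ground_truth"], rel_tol=rel_tol)
    )
    return {
        "total": len(results),
        "scored": len(scored),
        "excluded": len(results) - len(scored),
        "accuracy": correct / len(scored) if scored else None,
    }

# Hypothetical per-problem results.
results = [
    {"id": "p1", "answer": 24.0, "ground_truth": 24.0},  # correct
    {"id": "p2", "answer": 30.0, "ground_truth": 24.0},  # wrong objective value
    {"id": "p3", "answer": 7.5, "ground_truth": None},   # no ground truth: excluded
    {"id": "p4", "answer": None, "ground_truth": 12.0},  # solver failed: counted as wrong
]

summary = score(results)
print(summary["scored"], summary["excluded"])  # 3 1
```

Reporting the excluded count next to the accuracy keeps the denominator inspectable, which matters when benchmark sizes and scored sizes differ.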
Claim Boundary
The paper names important limits. Multi-solver dispatch is the least empirically validated component because 91.1 percent of benchmark calls went to the mathematical optimizer, reflecting the LP/MILP emphasis of current benchmarks. Source traceability is demonstrated architecturally, but the paper does not yet measure whether it improves human verification speed or error detection. The cost study reports COOPA as more expensive than single-pass methods, including higher token use and more API calls.
That boundary is not a weakness to hide. It is the page's reason for existing. If an optimization agent is used in logistics, energy, manufacturing, healthcare, or public-service planning, the governance object is not the answer. It is the chain from natural-language problem to formulated model to solver route to reviewed recommendation.
Sources
- Chuanhao Li, Xiaoan Xu, Dirk Bergemann, Ethan X. Fang, Yehua Wei, and Zhuoran Yang, COOPA: A Modular LLM Agent Architecture for Operations Research Problems, arXiv:2606.27611 [cs.LG], submitted June 25, 2026, DOI 10.48550/arXiv.2606.27611.
- arXiv PDF and HTML for COOPA: A Modular LLM Agent Architecture for Operations Research Problems, reviewed for the architecture, benchmarks, results, ablations, solver-dispatch statistics, cost study, potential impacts, and limitations.
- Project repository: xxxxxa-hub/COOPA, reviewed for the manager/optimizer workflow, supported optimizer-agent families, benchmark names, experiment artifacts, and run instructions.