The Fuzzy Function Becomes the Neural Binary
Program-as-Weights asks developers to treat a small neural adapter as a program. That changes what has to be versioned, tested, inspected, and rolled back.
The Paper
The paper is Program-as-Weights: A Programming Paradigm for Fuzzy Functions, arXiv:2607.02512 [cs.LG, cs.AI, cs.CL]. The arXiv record lists Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, and Yuntian Deng as authors, with version 1 submitted on July 2, 2026. The 28-page PDF lists affiliations at the University of Waterloo, Cornell University, and Harvard University.
The paper is not just a model-compression paper. It asks what software looks like when a fuzzy task is compiled once into weights and then run locally like a function. That matters because many AI deployments already hide a remote model call behind an ordinary-looking function name.
The Fuzzy Function Problem
The authors use the term fuzzy functions for tasks that are easy to describe but hard to write as crisp rules: alerting on important log lines, repairing malformed JSON, ranking search results by intent, fuzzy matching, command interpretation, and other edge-case-heavy text operations. Developers often handle these tasks by calling a large language model API every time the function runs. The paper argues that this approach brings cost, locality, reproducibility, and self-containment problems, especially when providers can update hosted models outside the codebase.
PAW changes the unit of work. Instead of treating the foundation model as a per-input solver, it treats the larger model as a compiler invoked at function-definition time. After compilation, each call runs through a small local interpreter and a task-specific adapter.
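The shift from per-input solving to definition-time compilation can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: compile_fuzzy_function is a hypothetical stand-in for the expensive compile step, and its "program" is a trivial keyword check rather than an adapter.

```python
from typing import Callable, Dict

# Counts how often the (hypothetical) expensive compile step runs.
compile_calls = 0


def compile_fuzzy_function(spec: str) -> Callable[[str], str]:
    """Toy stand-in for compilation; in PAW a large model would emit an adapter."""
    global compile_calls
    compile_calls += 1
    keyword = spec.split()[-1].lower()  # toy "program": last word of the spec
    return lambda text: "yes" if keyword in text.lower() else "no"


_compiled: Dict[str, Callable[[str], str]] = {}


def fuzzy_function(spec: str) -> Callable[[str], str]:
    """Compile once when the function is defined, then reuse the artifact."""
    if spec not in _compiled:
        _compiled[spec] = compile_fuzzy_function(spec)
    return _compiled[spec]


is_alert = fuzzy_function("flag log lines that mention error")
lines = ["disk ERROR on sda", "heartbeat ok", "error: timeout"]
results = [is_alert(line) for line in lines]  # three calls, one compilation
```

The point of the sketch is the cost profile: the expensive step runs once per specification, while every later call touches only the cached local artifact.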
What PAW Builds
A PAW program has two parts. The discrete part is a pseudo-program: a cleaned-up natural-language restatement of the task with examples. The continuous part is a parameter-efficient module, implemented in the paper mainly as a LoRA adapter. The pseudo compiler is an off-the-shelf Qwen3-4B-Instruct-2507 model. The trained LoRA compiler is another 4B Qwen3 model that reads the specification and pseudo-program, then emits LoRA weights for a frozen interpreter.
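For readers unfamiliar with LoRA, the standard formulation adds a low-rank update to a frozen weight matrix: the output is W x plus (alpha / r) times B (A x), where only A and B vary per task. The sketch below shows that arithmetic in plain Python. The matrices, rank, and alpha value are illustrative assumptions; the article does not state which layers or ranks the paper's adapters use.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Standard LoRA layer output: W x + (alpha / r) * B (A x)."""
    r = len(A)                         # rank = number of rows in A
    base = matvec(W, x)                # frozen interpreter weights
    update = matvec(B, matvec(A, x))   # task-specific low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight (identity, for clarity)
A = [[1.0, 1.0]]               # rank-1 down-projection, shape 1x2
B = [[0.5], [-0.5]]            # up-projection, shape 2x1
x = [2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=1.0)
y_without_adapter = lora_forward(W, A, [[0.0], [0.0]], x, alpha=1.0)
```

With B set to zero the layer reduces to the frozen base model, which is why swapping adapters changes the task without touching the interpreter's own weights.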
The paper trains that compiler on FuzzyBench, a 10-million-example dataset generated from fuzzy-function specifications and input-output pairs. The dataset spans 29 thematic versions, more than 800 subcategories, and seven broad families: core text processing, search and web intelligence, custom classification, code and natural-language commands, safety and verification, agentic tool use, and format repair.
The runtime claim is deliberately software-like. A compiled PAW function can be downloaded, cached, versioned, loaded through a Python or JavaScript API, and executed without a network call after the first download. The public demo page describes the same product idea: define functions in English and run them locally.
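The download-once, run-offline behavior resembles an ordinary content-addressed cache. The following sketch assumes a pinned SHA-256 hash and a caller-supplied fetch function; load_adapter and fake_fetch are hypothetical names for illustration and are not part of the PAW SDK.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def load_adapter(cache_dir: Path, pinned_sha256: str, fetch: Callable[[], bytes]) -> bytes:
    """Return adapter bytes from the local cache, fetching only on a miss."""
    path = cache_dir / f"{pinned_sha256}.adapter"
    if path.exists():
        data = path.read_bytes()       # no network call after first download
    else:
        data = fetch()
    # Verify on every load so a corrupted or swapped file is never executed.
    if hashlib.sha256(data).hexdigest() != pinned_sha256:
        raise ValueError("adapter bytes do not match the pinned hash")
    if not path.exists():
        path.write_bytes(data)
    return data


fake_adapter = b"toy adapter weights"            # placeholder bytes, not real weights
pinned = hashlib.sha256(fake_adapter).hexdigest()
downloads: List[int] = []


def fake_fetch() -> bytes:
    """Hypothetical download; records each call so cache hits are visible."""
    downloads.append(1)
    return fake_adapter


with tempfile.TemporaryDirectory() as tmp:
    first = load_adapter(Path(tmp), pinned, fake_fetch)
    second = load_adapter(Path(tmp), pinned, fake_fetch)  # served from disk
```

Naming the cached file by its hash means a new adapter version never overwrites an old one, which keeps rollback a matter of pointing at a different hash.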
What the Results Show
On the verified FuzzyBench test set, the paper reports that a Qwen3 0.6B interpreter running PAW programs reaches 73.78 percent exact match, compared with 68.70 percent for direct prompting of Qwen3-32B. The paper reports the memory comparison as roughly 1.2 GB at bf16 for the 0.6B interpreter versus roughly 60 GB for Qwen3-32B, or about 50x less inference memory. The same table keeps the ceiling visible: gpt-5.2 reaches 96.09 percent and gpt-5-mini reaches 91.87 percent.
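The memory comparison follows from simple arithmetic: bf16 stores each parameter in two bytes. A back-of-envelope check, assuming nominal parameter counts of 0.6 billion and 32 billion and counting weights only (no activations or KV cache), lands close to the paper's rounded figures.

```python
BYTES_PER_PARAM_BF16 = 2  # bf16 uses 16 bits per parameter


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal sizes are assumptions; exact parameter counts differ slightly.
small_gb = weight_memory_gb(0.6e9)   # about 1.2 GB
large_gb = weight_memory_gb(32e9)    # about 64 GB
ratio = large_gb / small_gb          # about 53x

print(f"0.6B: {small_gb:.1f} GB, 32B: {large_gb:.1f} GB, ratio: {ratio:.0f}x")
```

The estimate gives about 64 GB and a ratio near 53x, consistent with the paper's "roughly 60 GB" and "about 50x" once rounding is taken into account.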
The local-execution section reports quantized GGUF configurations. A Q6_K base plus Q4_0 LoRA is described as indistinguishable from bf16 within noise; a Q4_K_M base plus Q4_0 LoRA loses 1.3 points while cutting total disk to about 507 MB. On a MacBook M3 with Metal acceleration, the Q5_K_M base plus Q4_0 adapter runs at 31.6 tokens per second with a 0.48 second cold load.
The paper also tests image-conditioned fuzzy functions by swapping the compiler to Qwen3-VL-4B while keeping the same small text interpreter. It reports gains over 0.6B to 4B vision-language baselines on three diagram tasks, while also reporting a weakness on long-form image-to-LaTeX where the pseudo-program crowds the small interpreter context.
The Neural-Binary Receipt
If a neural adapter is treated as a program, it needs a software receipt. That receipt should name the original specification, pseudo-program, compiler checkpoint, interpreter checkpoint, PEFT type, adapter hash, quantization format, task family, training-data family, benchmark split, expected input shape, known failure modes, local runtime, and rollback rule.
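One way to make such a receipt concrete is a small immutable record serialized next to the adapter file. The sketch below is an assumption about shape, not a format defined by the paper: the class name, field names, and every example value are illustrative placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class NeuralBinaryReceipt:
    """Hypothetical receipt shipped alongside a compiled adapter."""
    specification: str
    pseudo_program: str
    compiler_checkpoint: str
    interpreter_checkpoint: str
    peft_type: str
    adapter_sha256: str
    quantization_format: str
    task_family: str
    training_data_family: str
    benchmark_split: str
    expected_input_shape: str
    known_failure_modes: Tuple[str, ...]
    local_runtime: str
    rollback_rule: str

    def to_json(self) -> str:
        """Stable serialization so the receipt itself can be diffed and hashed."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


adapter_bytes = b"toy adapter weights"  # placeholder, not real weights
receipt = NeuralBinaryReceipt(
    specification="Flag log lines that need human attention.",
    pseudo_program="Return 'alert' for failures, else 'ignore'.",
    compiler_checkpoint="example-compiler-checkpoint",        # placeholder
    interpreter_checkpoint="example-interpreter-checkpoint",  # placeholder
    peft_type="LoRA",
    adapter_sha256=hashlib.sha256(adapter_bytes).hexdigest(),
    quantization_format="Q4_0",
    task_family="custom classification",
    training_data_family="FuzzyBench",
    benchmark_split="example-split",                          # placeholder
    expected_input_shape="single log line, UTF-8 text",
    known_failure_modes=("multi-line stack traces",),         # placeholder
    local_runtime="example-runtime",                          # placeholder
    rollback_rule="revert to the previous adapter hash",
)
receipt_json = receipt.to_json()
```

Sorting keys in the JSON output keeps the serialized receipt byte-stable, so two receipts with identical fields produce identical files.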
The receipt should also say what remains opaque. A PAW adapter can be versioned like a binary, but its continuous weights are not source code. A review process that only reads the pseudo-program may miss behavior stored in the LoRA. Conversely, a process that only hashes the adapter may miss whether the pseudo-program smuggled in an ambiguous or overbroad task. Both halves matter.
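A review gate that covers both halves might pin the pseudo-program text, pin the adapter bytes, and replay a set of golden input-output cases, since hashes alone say nothing about behavior. The sketch below is one possible shape under those assumptions; review_artifact, toy_run, and the golden cases are hypothetical, and toy_run stands in for a real interpreter-plus-adapter call.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def review_artifact(
    pseudo_program: str,
    adapter_bytes: bytes,
    pinned_pseudo_sha256: str,
    pinned_adapter_sha256: str,
    run: Callable[[str], str],
    golden_cases: Sequence[Tuple[str, str]],
) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems: List[str] = []
    # Discrete half: the human-readable text must be the text that was reviewed.
    if sha256_hex(pseudo_program.encode("utf-8")) != pinned_pseudo_sha256:
        problems.append("pseudo-program differs from the reviewed text")
    # Continuous half: the weights must be the bytes that were tested.
    if sha256_hex(adapter_bytes) != pinned_adapter_sha256:
        problems.append("adapter bytes differ from the reviewed artifact")
    # Behavior: replay known cases, because hashes do not describe behavior.
    for text, expected in golden_cases:
        actual = run(text)
        if actual != expected:
            problems.append(f"behavior changed on {text!r}: expected {expected!r}, got {actual!r}")
    return problems


pseudo = "Return 'alert' for log lines that report a failure, else 'ignore'."
adapter = b"toy adapter weights"  # placeholder, not real weights


def toy_run(line: str) -> str:
    """Hypothetical stand-in for running the compiled function locally."""
    return "alert" if "fail" in line.lower() else "ignore"


golden = [("Backup FAILED at 02:00", "alert"), ("heartbeat ok", "ignore")]
pinned_pseudo = sha256_hex(pseudo.encode("utf-8"))
pinned_adapter = sha256_hex(adapter)

clean = review_artifact(pseudo, adapter, pinned_pseudo, pinned_adapter, toy_run, golden)
tampered = review_artifact(pseudo, adapter + b"!", pinned_pseudo, pinned_adapter, toy_run, golden)
```

Golden cases are only as good as their coverage, so a passing check here narrows the opaque region of the adapter without eliminating it.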
Limits
The paper's limitations section is unusually relevant to governance. A trained PAW system couples a specific compiler to a specific interpreter family; switching interpreters requires retraining the compiler. The continuous PEFT component is opaque, so only the pseudo-program is directly human-inspectable. The evaluations are single-step input-output tasks, not validated multi-step or long-horizon reasoning systems. FuzzyBench is synthetic, generated by gpt-5.2, with broader external validation still in progress.
That makes PAW promising as an audit object, but not a shortcut around audit. The better claim is narrow: some fuzzy functions may be compiled into smaller local artifacts that are easier to cache, ship, and replay than a live cloud prompt. The remaining question is whether institutions will treat those artifacts as accountable software or as uninspectable convenience files.
Sources
- Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, and Yuntian Deng, Program-as-Weights: A Programming Paradigm for Fuzzy Functions, arXiv:2607.02512 [cs.LG, cs.AI, cs.CL].
- arXiv HTML for Program-as-Weights: A Programming Paradigm for Fuzzy Functions, checked for title, abstract, method, dataset, results, local execution, related work, limitations, and broader impacts.
- arXiv PDF for Program-as-Weights: A Programming Paradigm for Fuzzy Functions, checked for title page, author metadata, FuzzyBench description, benchmark tables, quantization table, case studies, and limitations.
- programasweights GitHub organization and ProgramAsWeights public demo page, checked for source availability, Python SDK metadata, license metadata, and the local-function product description.