Blog · arXiv Analysis · Published: June 25, 2026

The Fuzzy Function Becomes the Neural Binary

Program-as-Weights asks developers to treat a small neural adapter as a program. That changes what has to be versioned, tested, inspected, and rolled back.

The Paper

The paper is Program-as-Weights: A Programming Paradigm for Fuzzy Functions, arXiv:2607.02512 [cs.LG, cs.AI, cs.CL]. The arXiv record lists Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, and Yuntian Deng as authors, with version 1 submitted on July 2, 2026. The 28-page PDF lists affiliations at the University of Waterloo, Cornell University, and Harvard University.

The paper is not just a model-compression paper. It asks what software looks like when a fuzzy task is compiled once into weights and then run locally like a function. That matters because many AI deployments already hide a remote model call behind an ordinary-looking function name.

The Fuzzy Function Problem

The authors use the term fuzzy functions for tasks that are easy to describe but hard to write as crisp rules: alerting on important log lines, repairing malformed JSON, ranking search results by intent, fuzzy matching, command interpretation, and other edge-case-heavy text operations. Developers often solve this by calling a large language model API every time the function runs. The paper argues that this brings cost, locality, reproducibility, and self-containment problems, especially when providers can update hosted models outside the codebase.

Program-as-Weights (PAW) changes the unit of work. Instead of treating the foundation model as a per-input solver, it treats the larger model as a compiler invoked at function-definition time. After compilation, each call runs through a small local interpreter and a task-specific adapter.
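To make the shift concrete, here is a minimal Python sketch of the compile-once, call-many pattern. It is not the PAW SDK: `compile_fuzzy` is a hypothetical name, the expensive compiler step is simulated by a cached function, and a toy keyword rule stands in for the small interpreter plus adapter.

```python
import functools
from typing import Callable

COMPILE_CALLS = 0  # counts how often the expensive compile step actually runs


@functools.lru_cache(maxsize=None)
def compile_fuzzy(spec: str) -> Callable[[str], str]:
    """Hypothetical compiler step: runs once per distinct specification."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    # Toy stand-in for "small local interpreter + task-specific adapter".
    keywords = ("error", "fatal", "panic")

    def run(line: str) -> str:
        lowered = line.lower()
        return "alert" if any(k in lowered for k in keywords) else "ignore"

    return run


# Definition time: one compile.
is_important = compile_fuzzy("Alert on log lines that indicate a failure.")

# Call time: many cheap local calls, no further compilation.
results = [is_important(line) for line in ["INFO ok", "ERROR disk full", "panic: nil map"]]
```

Because the compiled function is cached by its specification, redefining it with the same text reuses the existing artifact instead of compiling again, which is the property that makes the result something you can version.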

What PAW Builds

A PAW program has two parts. The discrete part is a pseudo-program: a cleaned-up natural-language restatement of the task with examples. The continuous part is a parameter-efficient module, implemented in the paper mainly as a LoRA adapter. The pseudo compiler is an off-the-shelf Qwen3-4B-Instruct-2507 model. The trained LoRA compiler is another 4B Qwen3 model that reads the specification and pseudo-program, then emits LoRA weights for a frozen interpreter.
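The continuous half can stay small because of the standard LoRA construction: a frozen weight matrix W receives a low-rank update (alpha / r) times B A, where B is d_out x r and A is r x d_in. The plain-Python sketch below counts parameters and applies the update to a toy matrix; the 1024 x 1024 layer and rank 16 are illustrative values, not the configuration reported in the paper.

```python
from typing import List

Matrix = List[List[float]]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def apply_lora(W: Matrix, B: Matrix, A: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * B @ A as a new matrix; the frozen W is untouched."""
    rank = len(A)
    scale = alpha / rank
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


full = 1024 * 1024                          # frozen 1024 x 1024 layer (illustrative)
adapter = lora_param_count(1024, 1024, 16)  # rank-16 adapter for that layer (illustrative)
```

For that illustrative layer the adapter holds 32,768 parameters against 1,048,576 in the frozen matrix, a 32x reduction per adapted layer.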

The paper trains that compiler on FuzzyBench, a 10-million-example dataset generated from fuzzy-function specifications and input-output pairs. The dataset spans 29 thematic versions, more than 800 subcategories, and seven broad families: core text processing, search and web intelligence, custom classification, code and natural-language commands, safety and verification, agentic tool use, and format repair.
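Each FuzzyBench item pairs a specification with input-output examples, and the paper's headline results are reported as exact match. A minimal sketch of that scoring loop follows; the `Example` record layout is hypothetical, and strict string equality is an assumption, since the paper may normalize outputs before comparing.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """Hypothetical layout for one benchmark item."""
    spec: str      # natural-language function specification
    text: str      # function input
    expected: str  # reference output


def exact_match_accuracy(examples: Sequence[Example],
                         predict: Callable[[str, str], str]) -> float:
    """Percentage of examples whose prediction equals the reference exactly."""
    if not examples:
        raise ValueError("no examples to score")
    hits = sum(predict(ex.spec, ex.text) == ex.expected for ex in examples)
    return 100.0 * hits / len(examples)
```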

The runtime claim is deliberately software-like. A compiled PAW function can be downloaded, cached, versioned, loaded through a Python or JavaScript API, and executed without a network call after the first download. The public demo page describes the same product idea: define functions in English and run them locally.
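The download-once, run-locally workflow resembles ordinary artifact caching. The sketch below assumes a content hash is published alongside each adapter version; `load_adapter`, the cache layout, and the injected `fetch` callable are hypothetical and do not reflect the actual programasweights SDK.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_adapter(name: str, version: str, expected_sha256: str,
                 fetch: Callable[[str, str], bytes], cache_dir: Path) -> bytes:
    """Return adapter bytes, fetching at most once per (name, version)."""
    path = cache_dir / name / f"{version}.bin"
    cached = path.exists()
    # Only the first load touches the network; later loads read the local copy.
    data = path.read_bytes() if cached else fetch(name, version)
    # Verify before trusting or caching, so a bad download never lands on disk.
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"hash mismatch for {name}@{version}")
    if not cached:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return data
```

Pinning the expected hash next to the version string is what makes rollback cheap: reverting means pointing at an older (version, hash) pair that is still in the cache.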

What the Results Show

On the verified FuzzyBench test set, the paper reports that a Qwen3 0.6B interpreter running PAW programs reaches 73.78 percent exact match, compared with 68.70 percent for direct prompting of Qwen3-32B. The paper reports the memory comparison as roughly 1.2 GB at bf16 for the 0.6B interpreter versus roughly 60 GB for Qwen3-32B, or about 50x less inference memory. The same table keeps the ceiling visible: gpt-5.2 reaches 96.09 percent and gpt-5-mini reaches 91.87 percent.
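The memory comparison is weight-only arithmetic: bf16 stores each parameter in 2 bytes. The sketch uses nominal parameter counts (0.6B and 32B) and ignores KV cache and activations, so it lands near, not exactly on, the paper's rounded figures.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Weight storage in decimal GB: (params_billions * 1e9 * bytes) / 1e9."""
    return params_billions * BYTES_PER_PARAM[dtype]


small = weight_memory_gb(0.6)   # 0.6B interpreter
large = weight_memory_gb(32)    # 32B direct-prompting baseline
ratio = large / small
gap_points = 73.78 - 68.70      # exact-match advantage reported for PAW
```

Nominal counts give 1.2 GB versus 64 GB, about 53x; dividing the paper's rounded 60 GB by 1.2 GB gives the quoted 50x. The accuracy gap over direct prompting of Qwen3-32B is about 5.1 points.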

The local-execution section reports quantized GGUF configurations. A Q6_K base plus Q4_0 LoRA is described as indistinguishable from bf16 within noise; a Q4_K_M base plus Q4_0 LoRA loses 1.3 points while cutting total disk to about 507 MB. On a MacBook M3 with Metal acceleration, the Q5_K_M base plus Q4_0 adapter runs at 31.6 tokens per second with a 0.48 second cold load.
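A rough end-to-end estimate for the M3 configuration adds the cold load to decode time. This assumes steady decode throughput and ignores prompt processing, so treat it as a back-of-the-envelope figure rather than a measurement; the 64-token output length is hypothetical.

```python
def estimated_latency_s(output_tokens: int, tokens_per_s: float = 31.6,
                        cold_load_s: float = 0.48, warm: bool = False) -> float:
    """Seconds to produce output_tokens, optionally including the cold load."""
    decode = output_tokens / tokens_per_s  # assumes constant decode speed
    return decode if warm else cold_load_s + decode


cold = estimated_latency_s(64)             # first call: load + decode
warm = estimated_latency_s(64, warm=True)  # adapter already resident
```

Under these assumptions a 64-token answer takes about 2.5 seconds on the first call and about 2.0 seconds once the model is loaded.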

The paper also tests image-conditioned fuzzy functions by swapping the compiler to Qwen3-VL-4B while keeping the same small text interpreter. It reports gains over 0.6B to 4B vision-language baselines on three diagram tasks, while also reporting a weakness on long-form image-to-LaTeX where the pseudo-program crowds the small interpreter context.

The Neural-Binary Receipt

If a neural adapter is treated as a program, it needs a software receipt. That receipt should name the original specification, pseudo-program, compiler checkpoint, interpreter checkpoint, PEFT type, adapter hash, quantization format, task family, training-data family, benchmark split, expected input shape, known failure modes, local runtime, and rollback rule.
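The receipt can be a plain, serializable record. The Python sketch below mirrors the fields listed above one for one; the class name, field names, and example values are hypothetical, and `missing_fields` shows how a release gate might refuse an incomplete receipt.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class NeuralBinaryReceipt:
    """Hypothetical release record for one compiled fuzzy function."""
    specification: str
    pseudo_program: str
    compiler_checkpoint: str
    interpreter_checkpoint: str
    peft_type: str
    adapter_sha256: str
    quantization: str
    task_family: str
    training_data_family: str
    benchmark_split: str
    expected_input_shape: str
    known_failure_modes: tuple[str, ...]
    local_runtime: str
    rollback_rule: str

    def missing_fields(self) -> list[str]:
        """Names of fields left empty; a release gate could refuse these."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_json(self) -> str:
        """Stable serialization so the receipt itself can be hashed and diffed."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; placeholders in angle brackets are not real identifiers.
receipt = NeuralBinaryReceipt(
    specification="Alert on log lines that indicate a failure.",
    pseudo_program="Return 'alert' for failure lines, else 'ignore'.",
    compiler_checkpoint="<compiler checkpoint id>",
    interpreter_checkpoint="Qwen3-0.6B",
    peft_type="LoRA",
    adapter_sha256="<sha256 of adapter file>",
    quantization="Q4_K_M base + Q4_0 LoRA",
    task_family="<task family>",
    training_data_family="FuzzyBench (synthetic)",
    benchmark_split="FuzzyBench verified test",
    expected_input_shape="single log line, UTF-8 text",
    known_failure_modes=("multi-line stack traces",),
    local_runtime="GGUF, Metal",
    rollback_rule="",  # left empty on purpose: the gate should flag it
)
```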

The receipt should also say what remains opaque. A PAW adapter can be versioned like a binary, but its continuous weights are not source code. A review process that only reads the pseudo-program may miss behavior stored in the LoRA. Conversely, a process that only hashes the adapter may miss whether the pseudo-program smuggled in an ambiguous or overbroad task. Both halves matter.
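One way to keep both halves in view is to derive a single release identifier from the pseudo-program text and the adapter bytes together, so a change to either produces a new version. This is a hypothetical scheme, not something the paper prescribes, and a matching hash proves only identity, not that the behavior is acceptable.

```python
import hashlib


def release_id(pseudo_program: str, adapter_weights: bytes) -> str:
    """Hash both halves of a compiled function into one version identifier."""
    digest = hashlib.sha256()
    for part in (pseudo_program.encode("utf-8"), adapter_weights):
        # Length-prefix each part so the boundary between the halves is unambiguous.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```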

Limits

The paper's limitations section is unusually relevant to governance. A trained PAW system couples a specific compiler to a specific interpreter family; switching interpreters requires retraining the compiler. The continuous PEFT component is opaque, so only the pseudo-program is directly human-inspectable. The evaluations are single-step input-output tasks, not validated multi-step or long-horizon reasoning systems. FuzzyBench is synthetic, generated by gpt-5.2, with broader external validation still in progress.
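Because an adapter is only meaningful for the interpreter its compiler was trained against, a loader can refuse mismatched pairs outright instead of producing silently wrong output. A minimal sketch; `AdapterBuild`, `check_runtime`, and the checkpoint strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterBuild:
    adapter_sha256: str
    compiler_checkpoint: str
    interpreter_checkpoint: str  # the only interpreter this adapter is valid for


class InterpreterMismatch(RuntimeError):
    """Raised when an adapter is loaded on an interpreter it was not compiled for."""


def check_runtime(build: AdapterBuild, runtime_interpreter: str) -> None:
    if build.interpreter_checkpoint != runtime_interpreter:
        raise InterpreterMismatch(
            f"adapter {build.adapter_sha256[:12]} targets "
            f"{build.interpreter_checkpoint!r}, runtime has {runtime_interpreter!r}; "
            "recompile with a compiler trained for that interpreter"
        )
```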

That makes PAW promising as an audit object, but not a shortcut around audit. The better claim is narrow: some fuzzy functions may be compiled into smaller local artifacts that are easier to cache, ship, and replay than a live cloud prompt. The remaining question is whether institutions will treat those artifacts as accountable software or as uninspectable convenience files.

Sources

Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, and Yuntian Deng, Program-as-Weights: A Programming Paradigm for Fuzzy Functions, arXiv:2607.02512 [cs.LG, cs.AI, cs.CL].
arXiv HTML for Program-as-Weights: A Programming Paradigm for Fuzzy Functions, checked for title, abstract, method, dataset, results, local execution, related work, limitations, and broader impacts.
arXiv PDF for Program-as-Weights: A Programming Paradigm for Fuzzy Functions, checked for title page, author metadata, FuzzyBench description, benchmark tables, quantization table, case studies, and limitations.
programasweights GitHub organization and ProgramAsWeights public demo page, checked for source availability, Python SDK metadata, license metadata, and the local-function product description.
