Blog · arXiv Analysis · Last reviewed June 25, 2026

The Repeated Test Becomes the Learning Debt

Yanru Guan, Naveen Raman, and Fei Fang study a narrow but important failure mode in AI assistance: when an AI keeps recommending the same observations, the human may become better at acting on the shown features while remaining undertrained on the correlated features that were never directly seen.

The Paper

The paper is Human Decision-Making with AI Assistance under Correlated Features, arXiv:2606.20628 [cs.AI; cs.CY; cs.GT], by Yanru Guan, Naveen Raman, and Fei Fang of Carnegie Mellon University. arXiv records version 1 as submitted on May 27, 2026. The paper studies an AI assistant that repeatedly recommends a subset of tests or features, while a human decision-maker learns feature coefficients from direct observation and then uses those learned estimates to make predictions.

The motivating example is medical: an AI may recommend diagnostic tests, and a clinician may learn from the observed results over time. The formal model is broader. At each round, the assistant can show at most k features out of n; the human observes only those features, imputes the missing ones, predicts a label, sees the true label, and improves coefficient estimates only for features that were directly shown.
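
One round of that loop can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the smallest case (n = 2, k = 1), standardized Gaussian features with correlation rho, a noiseless linear label, and a hypothetical learning rule in which the shown feature's estimate moves a fixed fraction toward its true coefficient. The function name and the learn_rate parameter are invented for the example.

```python
import random

def simulate_round(shown, beta_true, beta_hat, rho, rng, learn_rate=0.5):
    """Simulate one round with two features, one of which is shown.

    Assumptions (not from the paper): standardized Gaussian features with
    correlation rho, a noiseless linear label, and a learning rule that
    moves the shown coefficient estimate a fixed fraction toward the truth.
    """
    hidden = 1 - shown
    # Draw two standard normal features with correlation rho.
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    x = [z0, rho * z0 + (1 - rho ** 2) ** 0.5 * z1]
    label = beta_true[0] * x[0] + beta_true[1] * x[1]
    # The human imputes the hidden feature by its conditional mean.
    imputed = rho * x[shown]
    # The prediction uses the human's current estimates for both features.
    prediction = beta_hat[shown] * x[shown] + beta_hat[hidden] * imputed
    loss = (prediction - label) ** 2
    # Only the directly shown feature's estimate improves.
    new_hat = list(beta_hat)
    new_hat[shown] += learn_rate * (beta_true[shown] - beta_hat[shown])
    return loss, new_hat

rng = random.Random(0)
loss, beta_hat = simulate_round(0, [1.0, 2.0], [0.0, 0.0], 0.5, rng)
```

Calling this repeatedly with shown=0 drives the first estimate toward its true value and leaves the second untouched, even though the second estimate still enters every prediction through the imputed value.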

The Learning Debt

The debt appears when a repeated recommendation is locally useful but pedagogically incomplete. If the AI always shows the same tests, the human can keep making decisions with those tests while never directly learning the coefficients for other features. Correlation makes that worse, because an unshown feature can still influence the human's prediction through imputation without being learned from direct exposure.
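
A small worked case makes the mechanism concrete. It is a sketch under assumptions that are not taken from the paper: two standardized Gaussian features with correlation \rho, a noiseless linear label y = \beta_1 x_1 + \beta_2 x_2, and a human who sees only x_1 and imputes x_2 by its conditional mean \rho x_1. Writing \hat\beta for the human's coefficient estimates:

```latex
\begin{aligned}
\hat{y} &= \hat\beta_1 x_1 + \hat\beta_2\,\rho\, x_1, \\
\mathbb{E}\big[(\hat{y} - y)^2\big]
  &= \big[(\beta_1 - \hat\beta_1) + \rho\,(\beta_2 - \hat\beta_2)\big]^2
     + \beta_2^2\,(1 - \rho^2).
\end{aligned}
```

The second term is the unavoidable cost of not seeing x_2. The first term is where the debt shows up: the error in the never-shown coefficient enters the loss scaled by \rho. With independent features (\rho = 0) it drops out entirely, and the more correlated the features are, the more an unlearned coefficient costs.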

The paper contrasts this with prior work in which stationary policies, meaning policies that repeatedly recommend the same test set, are optimal when features are independent. Guan, Raman, and Fang show that independence is not a small technical detail. When the covariance matrix is not diagonal, their Theorem 3.1 proves that stationary policies can perform arbitrarily poorly, with the approximation ratio of the best stationary policy becoming unbounded as the discount factor approaches 1 under the constructed learning functions.

That result is a governance warning. A useful advice pattern can quietly become a training curriculum. The assistant is not merely choosing what the human sees today; it is shaping what the human will be able to understand tomorrow.

Correlation Changes Advice

The key shift is that correlated features carry information about each other. In the paper's pneumonia illustration, if an assistant always recommends chest X-rays and blood cultures, the doctor may never directly observe oxygen saturation readings and may fail to learn the correct coefficient for that feature. Later, the doctor may still account for oxygen saturation implicitly, but with a poor estimate of its importance.

This is not ordinary bandit exploration. The assistant is not exploring to learn the reward for itself. It is exploring to shape the human's learning under partial observability. In the paper's model, exploration is useful because direct exposure changes the human coefficient estimates; after enough exposure, the assistant can stop varying the tests and use the best long-run set.

Explore, Then Commit

The authors prove an eventually stationary structure. Under their continuity and Lipschitz assumptions on the loss, an optimal policy has a dynamic prefix followed by a stationary suffix. In plain terms: show diverse tests early enough to improve human learning, then commit to a stable test set once the relevant learning has happened.
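
The shape of such a policy is easy to write down. The sketch below only illustrates the prefix-then-suffix structure. The function name and the example schedule are hypothetical, and nothing here computes which prefix is optimal.

```python
from itertools import chain, islice, repeat

def eventually_stationary(prefix, committed):
    """Yield the shown feature each round: a varying prefix, then one fixed choice."""
    return chain(prefix, repeat(committed))

# Hypothetical schedule: expose feature 1 twice, feature 0 once, then commit to 0.
schedule = eventually_stationary(prefix=[1, 1, 0], committed=0)
first_six = list(islice(schedule, 6))  # [1, 1, 0, 0, 0, 0]
```

A stationary policy is the special case with an empty prefix, which is why the question in the correlated setting is how long the prefix needs to be rather than whether to have one.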

The paper also proves that the general test-selection problem is NP-hard and, unless P equals NP, has no fully polynomial-time approximation scheme. It then gives a dynamic-programming solution for finite horizons and a truncated-horizon approximation that plans for a shorter prefix and appends a stationary suffix.
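
The idea of planning a prefix and appending a stationary suffix can be sketched for the smallest case. This is not the paper's algorithm. It is a toy dynamic program for n = 2 and k = 1, in which the state is how many times each feature has been shown. It assumes standardized Gaussian features, a noiseless linear label, conditional-mean imputation, and a hypothetical learning function whose error shrinks geometrically with exposures. All constants and function names are invented for illustration.

```python
from functools import lru_cache

BETA = (1.0, 1.0)  # hypothetical true coefficients
RHO = 0.9          # hypothetical feature correlation
GAMMA = 0.99       # discount factor
LEARN = 0.3        # hypothetical per-exposure learning rate

def estimate(i, count):
    # Assumed learning function: the estimate starts at 0 and its error
    # shrinks geometrically with the number of direct exposures.
    return BETA[i] * (1.0 - (1.0 - LEARN) ** count)

def round_loss(shown, counts):
    # Expected squared loss when one feature is shown and the other is
    # imputed by its conditional mean (standardized Gaussian features).
    hidden = 1 - shown
    gap_shown = BETA[shown] - estimate(shown, counts[shown])
    gap_hidden = BETA[hidden] - estimate(hidden, counts[hidden])
    return (gap_shown + RHO * gap_hidden) ** 2 + BETA[hidden] ** 2 * (1 - RHO ** 2)

def bump(counts, shown):
    # Record one more direct exposure of the shown feature.
    return (counts[0] + (shown == 0), counts[1] + (shown == 1))

def stationary_cost(shown, counts, rounds):
    # Discounted loss of showing the same feature for the given number of rounds.
    total, discount = 0.0, 1.0
    for _ in range(rounds):
        total += discount * round_loss(shown, counts)
        counts = bump(counts, shown)
        discount *= GAMMA
    return total

def plan(horizon, prefix_len):
    """Return (cost, prefix, committed): a planned prefix plus a stationary suffix."""
    @lru_cache(maxsize=None)
    def best(t, counts):
        if t == prefix_len:
            # Commit: pick the best single feature for the remaining rounds.
            return min((stationary_cost(i, counts, horizon - t), (), i) for i in (0, 1))
        options = []
        for i in (0, 1):
            cost, tail, committed = best(t + 1, bump(counts, i))
            options.append((round_loss(i, counts) + GAMMA * cost, (i,) + tail, committed))
        return min(options)
    return best(0, (0, 0))

stationary = plan(horizon=40, prefix_len=0)   # best stationary policy
truncated = plan(horizon=40, prefix_len=10)   # short planned prefix, then commit
full = plan(horizon=40, prefix_len=40)        # full finite-horizon dynamic program
```

Because every policy with a shorter prefix is also available to a planner with a longer one, the cost can only fall or stay the same as prefix_len grows. How quickly it levels off in this toy depends entirely on the assumed constants, so it should not be read as reproducing the paper's numbers.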

The synthetic experiments match the theory. With n = 2, k = 1, squared loss, Gaussian covariates, and a discount factor of 0.99, the authors report that increasing feature correlation from 0.5 to 0.99 reduces retained performance for stationary policies from 93 percent to 53 percent in the plotted setting. They also report that higher correlation lengthens the dynamic exploration phase, and that a truncated dynamic program with horizon 120 retains at least 99 percent of optimal performance across the plotted curves while using at most 4 percent of the full horizon-600 runtime.

The Governance Receipt

The receipt for AI assistance should therefore include more than outcome quality. For any advice system that controls what a human observes, the institution should record the feature budget, the feature correlations assumed, which features were shown, which features were only imputed, the human learning model, the exploration schedule, the point of commitment, the evidence that direct exposure was enough, and the costs or risks of showing different features.
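
One way to make such a receipt concrete is a small record type. The sketch below is purely illustrative. The class, its field names, the feature names, and the correlation value are hypothetical choices for the example, not a schema proposed by the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AdviceReceipt:
    """Hypothetical audit record for an assistant that controls what a human sees."""
    feature_budget: int            # k, the most features shown per round
    assumed_correlations: dict     # feature pair -> assumed correlation
    learning_model: str            # description of the assumed human learning model
    exploration_schedule: list     # feature sets planned for the dynamic phase
    commitment_round: int          # round after which the shown set stops changing
    exposure_evidence: str         # why direct exposure was judged sufficient
    feature_costs_or_risks: dict = field(default_factory=dict)
    shown_counts: dict = field(default_factory=dict)
    imputed_only: set = field(default_factory=set)

    def record_round(self, shown, all_features):
        # Count direct exposures, then recompute which features were never shown.
        for name in shown:
            self.shown_counts[name] = self.shown_counts.get(name, 0) + 1
        self.imputed_only = set(all_features) - set(self.shown_counts)

features = ["chest_x_ray", "blood_culture", "oxygen_saturation"]
receipt = AdviceReceipt(
    feature_budget=2,
    assumed_correlations={("chest_x_ray", "oxygen_saturation"): 0.6},  # made-up value
    learning_model="estimates improve only for directly shown features",
    exploration_schedule=[],
    commitment_round=0,
    exposure_evidence="none recorded",
)
receipt.record_round(["chest_x_ray", "blood_culture"], features)
```

After that single round, imputed_only contains only oxygen_saturation. That is the kind of blind spot the receipt is meant to surface before it hardens.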

This belongs beside the site's concern with predictions becoming interventions and human-agent pairs becoming skill ratings. Advice changes the environment in which judgment forms. A system that recommends the same evidence again and again may look stable while it is creating a blind spot in the person who depends on it.

Limits

The paper is theoretical, with synthetic experiments only. Its model uses a uniform cardinality budget for tests and does not model heterogeneous test costs, delays, or risks. It assumes that enough information is available to evaluate how test sets affect future human predictions, and it leaves formal regret and robustness guarantees for future work. The authors discuss extensions to nonlinear models, but note that general nonlinear relationships may require richer information about the conditional distribution of unseen features.

Those limits matter. A hospital, court, or workplace assistant would need real workflow evidence before turning this into policy. The durable lesson is narrower and stronger: when AI assistance shapes human exposure, repeated usefulness is not the same thing as human learning.
