George Akerlof won a Nobel Prize for describing what happens when buyers can’t assess quality before purchase. In a market for used cars, sellers know whether their car is a lemon. Buyers don’t. The information asymmetry drives down prices, good cars leave the market, and eventually only lemons remain. The market fails not because of fraud but because of structural uncertainty.
Agent memory might have a worse version of this problem. Significantly worse.
The asset that inspection destroys
An AI agent that spends 20 rounds assessing DeFi risk develops calibrated heuristics — error patterns, threshold intuitions, domain-specific shortcuts that a fresh agent doesn’t have. That learned behavior seems like it should have value. An experienced agent’s knowledge, extracted and transferred to a fresh agent, could save the buyer 20 rounds of training.
The problem: memory artifacts are inspection-destructive. If the seller reveals the artifact to prove quality, the buyer has already consumed it. The information is non-rival — once seen, it can be copied — but quality assessment requires seeing it. Unlike a used car, you can’t take it for a test drive and bring it back.
Every existing approach to this problem assumes trust. Trust the seller’s reputation. Trust the marketplace’s curation. Trust the benchmark that the seller also controls. None of these solve the fundamental asymmetry. They move it.
An experiment: the referee protocol
The approach being tested is a disposable, independent referee. The seller submits a sealed artifact. A referee agent — controlled by neither buyer nor seller — runs the artifact against a held-out benchmark the seller has never seen. Four adversarial probes run in parallel.
Bias detection uses trap protocols designed to expose systematic skew. Consistency testing perturbs inputs and verifies proportional response — legitimate artifacts handle perturbation gracefully, fraudulent ones collapse. Steganographic scanning audits for hidden instructions embedded in the artifact text. Overfitting comparison measures performance on seen versus unseen data.
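The consistency probe is the easiest of the four to make concrete. Below is a minimal sketch in Python, assuming a numeric scoring function and multiplicative input noise; the real probe's perturbation scheme, noise level, and tolerance ratio are not specified here, so all three are illustrative.

```python
import random

def consistency_probe(score_fn, inputs, noise=0.05, max_ratio=3.0, trials=20):
    """Perturb each input slightly and check that the output moves proportionally.

    score_fn maps a feature vector to a risk score. A legitimate artifact
    should respond smoothly to small perturbations; a fraudulent one
    (e.g. hard-coded answers keyed to exact inputs) tends to jump.
    """
    violations = 0
    for x in inputs:
        base = score_fn(x)
        for _ in range(trials):
            perturbed = [v * (1 + random.uniform(-noise, noise)) for v in x]
            delta_in = max(abs(p - v) / (abs(v) or 1) for p, v in zip(perturbed, x))
            delta_out = abs(score_fn(perturbed) - base) / (abs(base) or 1)
            if delta_out > max_ratio * delta_in:  # disproportionate jump
                violations += 1
    total = len(inputs) * trials
    return 1.0 - violations / total  # fraction of consistent responses
```

A smooth scoring function passes cleanly; a function with a hidden cliff near the test inputs gets caught by roughly half the perturbations.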
The aggregate score determines the verdict: pass, warn, or fail. The artifact contents remain sealed throughout. The buyer receives a verification certificate and a trust score.
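One way the aggregation might work, as a sketch: the probe weights, score polarities, and the pass/warn/fail thresholds below are all illustrative assumptions, not the protocol's published values.

```python
from dataclasses import dataclass

@dataclass
class ProbeScores:
    bias: float         # 0-100, higher = more systematic skew detected
    consistency: float  # 0-100, higher = more consistent under perturbation
    stego: float        # 0-100, higher = more hidden-instruction evidence
    overfit: float      # 0-100, higher = larger seen/unseen performance gap

def trust_score(p: ProbeScores) -> float:
    # Equal weighting is an assumption; the actual aggregation is unspecified.
    penalties = (p.bias + p.stego + p.overfit) / 3
    return max(0.0, min(100.0, (p.consistency + (100 - penalties)) / 2))

def verdict(score: float, warn_at: float = 70, fail_at: float = 50) -> str:
    # Thresholds are illustrative, not taken from the protocol.
    if score < fail_at:
        return "fail"
    if score < warn_at:
        return "warn"
    return "pass"
```

The point of the structure, not the particular weights, is that the buyer sees only the trust score and verdict; the probe internals and the artifact itself stay sealed.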
Results worth noting
Transfer efficiency, measured as the buyer's performance with the artifact relative to the original expert's, came to 109.9% in DeFi risk assessment and 95.5% in cybersecurity vulnerability scoring, across three trials each. The buyer agent using a purchased memory artifact matched or exceeded expert performance. The "exceeded" part was unexpected: in the DeFi domain the transfer somehow outperformed the original expert.
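Efficiency above 100% stops looking paradoxical once the metric is written down. A sketch, assuming it is the ratio of buyer to expert performance, optionally normalized against a fresh-agent baseline; the exact formula is not given in the source.

```python
def transfer_efficiency(buyer_score: float, expert_score: float,
                        baseline: float = 0.0) -> float:
    """Buyer's performance with the artifact, as a percentage of the expert's.

    Subtracting a fresh-agent baseline is an assumption; with baseline=0
    this reduces to a plain performance ratio. Values above 100 mean the
    buyer outperformed the expert; negative values mean the artifact made
    the buyer worse than the baseline.
    """
    return 100 * (buyer_score - baseline) / (expert_score - baseline)
```

Under this reading, a poisoned artifact that actively degrades the buyer naturally produces a negative number, like the -39% below.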
Poisoned artifact detection also works. A test seller claimed 95% transfer efficiency; the measured value was -39%. The protocol flagged it with a trust score of 35.8/100, a bias score of 50, and a stego score of 100 after detecting hidden instructions. The buyer was never exposed. One successful detection isn't proof the system works. But it's at least evidence the approach might be viable.
The framework is domain-agnostic — adding a new domain requires only a config file. Whether this generalizes beyond the two domains tested so far is the obvious next question.
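What such a domain config might contain, as a hypothetical sketch: every field name, path, and threshold here is illustrative, not the framework's actual schema.

```python
# Hypothetical domain config for onboarding a new domain. All keys and
# values are assumptions for illustration, not the real config format.
DOMAIN_CONFIG = {
    "name": "smart-contract-audit",
    "benchmark": {
        # Held-out cases the seller has never seen, plus a public set
        # used for the seen-vs-unseen overfitting comparison.
        "held_out_cases": "benchmarks/smart_contract_holdout.json",
        "seen_cases": "benchmarks/smart_contract_public.json",
    },
    "probes": {
        "bias": {"trap_cases": "traps/smart_contract.json"},
        "consistency": {"noise": 0.05, "max_ratio": 3.0},
        "stego": {"scan_depth": "full"},
        "overfit": {"max_gap": 0.15},
    },
    "verdict": {"warn_at": 70, "fail_at": 50},
}
```

The appeal of a config-only onboarding path is that the four probes stay generic; only the benchmarks, trap cases, and thresholds are domain-specific.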