
RESEARCH · GLOBAL

arXiv: Synthetic data market collapse threatens model fidelity; first unified microeconomic theory

A new arXiv paper (q-fin) introduces the Synthetic Data Contamination Equilibrium (SDCE), described as the first unified economic model of recursive LLM training on synthetic data. It proves that training on model-generated tokens causes irreversible distributional drift (model collapse), with measurable welfare loss.

WHY IT MATTERS

Model collapse risk is now economically formalized. BFSI (banking, financial services, and insurance) vendors and banks that fine-tune on synthetic data (compliance training, fraud patterns, etc.) face diminishing-returns equilibria. The result suggests that both data provenance verification and human-originated datasets will command scarcity rents.

Source: arXiv q-fin · 2026-05-21
