RESEARCH · GLOBAL
arXiv: Synthetic data market collapse threatens model fidelity; paper presents first unified microeconomic theory
New arXiv paper (q-fin) introduces the Synthetic Data Contamination Equilibrium (SDCE), presented as the first unified economic model of recursive LLM training on synthetic data. It proves that training on model-generated tokens causes irreversible distributional drift (model collapse), with measurable welfare loss.
WHY IT MATTERS
Model collapse risk is now economically formalized. BFSI vendors and banks that fine-tune on synthetic data (compliance training, fraud patterns, etc.) face diminishing-return equilibria. The paper suggests that data provenance verification and human-originated datasets will command scarcity rents.
Source: arXiv q-fin · 2026-05-21