RESEARCH · GLOBAL
arXiv: Model collapse in synthetic data markets poses risk to training data supply chains
Researchers formalize 'model collapse', the irreversible loss of data quality that occurs when models train on synthetic data from previous generations, as a microeconomic equilibrium problem. They propose subsidy structures to mitigate contamination.
WHY IT MATTERS
If training-data supply chains become polluted with lower-quality synthetic content, LLM performance could degrade across BFSI use cases (compliance, risk modeling, fraud detection). That risk also affects the long-term viability of cutting costs with synthetic data.
Source: arXiv · 2026-05-21