AI Model Collapse: The Problem with AI
Source: theregister.com
AI Search Problems
I use AI for search because it, and Perplexity in particular, is better than Google. But AI-enabled search is getting worse. When I search for hard data, such as market-share statistics, the results often come from bad sources. Instead of figures from 10-K reports, the annual financial reports that public companies file with the SEC, I get numbers from summaries of business reports, and those numbers aren't quite right. This happens on all the major AI search bots.
Understanding AI Model Collapse
This is known as AI model collapse. AI systems trained on their own outputs gradually lose accuracy and reliability because errors compound from one model generation to the next. The result is distorted data and defects in performance. As a 2024 paper in Nature put it, the model becomes poisoned with its own projection of reality.
Factors Contributing to Model Collapse
Model collapse has three main causes. The first is error accumulation: each model generation inherits and amplifies the flaws of the one before it. The second is loss of tail data: rare events are erased from the training data. The third is feedback loops, which reinforce narrow patterns and produce repetitive text. As Aquant puts it, AI drifts away from reality when trained on its own outputs.
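The loss of tail data can be illustrated with a toy simulation. This is a minimal sketch, not something from the article or the Nature paper: the "model" is just a frequency table over a made-up vocabulary, and the vocabulary size, corpus size, and number of generations are arbitrary assumptions. Each generation samples a synthetic corpus from the previous model and refits by counting, so a rare token that happens to draw zero samples can never come back.

```python
import random
from collections import Counter


def simulate_collapse(vocab_size=50, samples_per_gen=200, generations=20, seed=0):
    """Return how many distinct tokens the toy model can still produce
    at generation 0 and after each round of training on its own samples."""
    rng = random.Random(seed)
    tokens = list(range(vocab_size))

    # Generation 0: a Zipf-like "human" distribution with a long tail
    # (an illustrative assumption, not real data).
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    support_sizes = [sum(1 for w in weights if w > 0)]

    for _ in range(generations):
        # The current model generates a synthetic corpus...
        corpus = rng.choices(tokens, weights=weights, k=samples_per_gen)
        # ...and the next model is fit to that corpus by simple counting.
        counts = Counter(corpus)
        weights = [counts.get(t, 0) for t in tokens]
        # A token with zero count has zero weight from now on.
        support_sizes.append(sum(1 for w in weights if w > 0))

    return support_sizes


if __name__ == "__main__":
    sizes = simulate_collapse()
    print("distinct tokens per generation:", sizes)
```

Because a token with zero weight is never sampled again, the count of distinct tokens can only stay flat or shrink from one generation to the next. Real language models are far more complex, but the same one-way ratchet is what "rare events are erased" refers to.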
Retrieval-Augmented Generation (RAG) Issues
RAG lets an LLM draw on external knowledge stores instead of relying only on its pre-trained knowledge, which reduces AI hallucinations. However, a Bloomberg Research study found that 11 leading LLMs using RAG produced bad results when given harmful prompts. RAG also increases the chance of leaking private client data and creating misleading analyses. Amanda Stent of Bloomberg said this has implications because RAG is used in customer support and question-answering systems, and users interact with it daily. AI practitioners need to use RAG responsibly.
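To make the mechanism concrete, here is a minimal sketch of the retrieve-then-prompt step in RAG. It is an illustration under stated assumptions, not how any production system or the Bloomberg study works: the document store, its contents, and the function names `tokenize`, `retrieve`, and `build_prompt` are all hypothetical, retrieval is plain word overlap rather than vector search, and no real LLM is called.

```python
import re

# Hypothetical knowledge store; the sources and figures are made-up placeholders.
STORE = [
    {"source": "10-K filing", "text": "Annual revenue reported to the SEC was 12 million dollars."},
    {"source": "blog summary", "text": "Revenue was roughly 15 million dollars according to a summary."},
    {"source": "press release", "text": "The company opened a new office."},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, store, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        store,
        key=lambda doc: len(query_words & tokenize(doc["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Prepend the retrieved passages to the question as context for the model."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "What was the annual revenue reported to the SEC?"
    print(build_prompt(question, retrieve(question, STORE)))
```

Note that the retriever ranks passages only by how well they match the query, not by how trustworthy the source is. In this toy store the second-hand summary lands in the prompt next to the primary filing, so the quality of the answer depends on what is in the store.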
The Downfall of AI
People use AI to produce fake work, from high school reports to scientific papers. The Chicago Sun-Times included non-existent novels in its best-of-summer feature. All of this hastens the day when AI becomes worthless. When I asked ChatGPT about the plot of one of those non-existent novels, attributed to Min Jin Lee, it said there was no publicly available information. This is garbage in, garbage out (GIGO).
Some suggest mixing synthetic data with fresh human-generated content to mitigate collapse. But given the choice between good content and AI slop, most will choose the latter. Businesses want to fire employees and increase profits, so quality is not a priority. We'll keep investing in AI until model collapse makes its answers too bad to ignore. OpenAI's CEO, Sam Altman, tweeted in February 2024 that OpenAI generates about 100 billion words per day. At that rate, it won't take long.