AI Navigate

Experimental evidence of progressive ChatGPT models self-convergence

arXiv cs.AI / 3/16/2026


Key Points

  • The paper investigates how recursive training with synthetic data can cause "model self-convergence," lowering the diversity of outputs across newer ChatGPT releases.
  • It uses a text-similarity metric to quantify output diversity and compares multiple ChatGPT versions over time, finding a measurable decline even with the temperature parameter set to 1.
  • The authors attribute the diversity loss to increasing amounts of synthetic data in training sets, potentially due to LLM-generated content permeating the internet.
  • They coin the term "model self-convergence" to describe the rising similarity of outputs across model versions as this longitudinal effect unfolds.

Abstract

Large Language Models (LLMs) that undergo recursive training on synthetically generated data are susceptible to model collapse, a phenomenon marked by the generation of meaningless output. Existing research has examined this issue from either theoretical or empirical perspectives, often focusing on a single model trained recursively on its own outputs. While prior studies have cautioned against the potential degradation of LLM output quality under such conditions, no longitudinal investigation has yet assessed this effect over time. In this study, we employ a text-similarity metric to evaluate different ChatGPT models' capacity to generate diverse textual outputs. Our findings indicate a measurable decline in recent ChatGPT releases' ability to produce varied text, even when explicitly prompted to do so by setting the temperature parameter to one. The observed reduction in output diversity may be attributed to the growing amount of synthetic data incorporated into their training datasets as LLM-generated content infiltrates the internet. We define this phenomenon as model self-convergence, reflecting the gradual increase in similarity of the texts produced by different ChatGPT versions.
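The summary names a text-similarity metric but does not specify which one the authors use. As a stand-in, here is a minimal sketch of the general idea: generate several outputs per model, compute the mean pairwise similarity, and treat a higher score as lower output diversity. The sample texts and the choice of `difflib.SequenceMatcher` are illustrative assumptions, not the paper's actual method or data.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(texts):
    """Average pairwise similarity across outputs; higher => less diverse.
    SequenceMatcher.ratio() is an illustrative stand-in for the paper's metric."""
    pairs = list(combinations(texts, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Hypothetical samples standing in for repeated generations at temperature 1
older_model_outputs = [
    "The cat sat quietly on the mat.",
    "A dog ran through the autumn park.",
    "Rain fell over the sleeping city.",
]
newer_model_outputs = [
    "The sun rises in the east.",
    "The sun rises in the east each day.",
    "Each day the sun rises in the east.",
]

print(mean_pairwise_similarity(older_model_outputs))  # lower score: varied text
print(mean_pairwise_similarity(newer_model_outputs))  # higher score: convergence
```

Under the paper's hypothesis, running such a comparison across successive ChatGPT releases would show the mean similarity score trending upward over time.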