Comparing Natural and Synthetic Structured Data: A Study of the Passive Verb Alternation in French and Italian

arXiv cs.CL / 3/27/2026

Key Points

The paper investigates how natural versus synthetic structured datasets affect large language model learning and evaluation, using passive verb alternation in French and Italian as a test case.
It employs Blackbird Language Matrices (BLMs) with structured templates instantiated either from natural sentences (sourced from Universal Dependencies) or from synthetic sentence generation.
Models trained and evaluated on synthetic datasets reach near “ceiling” performance but fail to reliably generalize to natural sentences.
Conversely, models trained on natural data perform robustly across both natural and synthetic test suites, indicating stronger capture of abstract linguistic patterns.
The authors argue the findings support the value of natural data and structured evaluation setups for probing LLMs’ syntactic and semantic knowledge.

Abstract

This study compares the impact of natural and synthetic data on training and evaluating large language models (LLMs), using the case of passive verb alternation in French and Italian. We use Blackbird Language Matrices (BLMs), structured datasets designed to probe linguistic knowledge of underlying patterns across sentence sets. We compare structured templates instantiated with natural sentences extracted from Universal Dependencies to structured templates of synthetic sentences. Experiments show that while models achieve ceiling performance when trained and tested on synthetic datasets, they do not reliably generalize to natural sentences. In contrast, models trained on natural data exhibit robust performance across both natural and synthetic test suites, demonstrating their superior ability to capture abstract linguistic patterns. These results corroborate the value of natural data and of structured setups in linguistic evaluation for probing LLMs' syntactic and semantic knowledge.
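A BLM instance can be pictured as a multiple-choice puzzle: a sequence of context sentences that follows an underlying pattern (here, the active/passive alternation), plus a set of candidate continuations of which exactly one is correct. The sketch below shows that shape and the train-on-one-suite, test-on-another grid the abstract describes. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `BLMItem`, `predict`, `accuracy`, and `cross_evaluate`, the scorer interface, and the toy French sentences are all hypothetical. In a real setup the scorer would come from a model trained on natural or synthetic BLM data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical scorer interface: given the context sentences and one candidate,
# return a score (higher = more plausible continuation). In practice this would
# be backed by an LLM-based model; here it is an assumption for illustration.
Scorer = Callable[[Sequence[str], str], float]


@dataclass(frozen=True)
class BLMItem:
    """Hypothetical container for one BLM-style puzzle."""
    context: Tuple[str, ...]     # sentences following an underlying pattern
    candidates: Tuple[str, ...]  # possible continuations (one correct, rest distractors)
    answer: int                  # index of the correct candidate


def predict(item: BLMItem, score: Scorer) -> int:
    """Return the index of the highest-scoring candidate (first one wins ties)."""
    scores = [score(item.context, c) for c in item.candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items: Sequence[BLMItem], score: Scorer) -> float:
    """Fraction of items whose predicted candidate matches the gold answer."""
    if not items:
        raise ValueError("empty test suite")
    correct = sum(predict(it, score) == it.answer for it in items)
    return correct / len(items)


def cross_evaluate(
    scorers: Dict[str, Scorer], suites: Dict[str, Sequence[BLMItem]]
) -> Dict[Tuple[str, str], float]:
    """Accuracy for every (training condition, test suite) pair,
    e.g. ("synthetic", "natural")."""
    return {
        (train, test): accuracy(items, scorer)
        for train, scorer in scorers.items()
        for test, items in suites.items()
    }


if __name__ == "__main__":
    # Toy item: the correct continuation is the passive with the agent kept in a "par" phrase.
    item = BLMItem(
        context=("Le chien mord le facteur.",),
        candidates=(
            "Le chien est mordu par le facteur.",  # distractor: roles swapped
            "Le facteur est mordu par le chien.",  # correct passive
        ),
        answer=1,
    )
    # Toy stand-in for a trained model's scorer (purely illustrative).
    def toy(ctx: Sequence[str], cand: str) -> float:
        return float("par le chien" in cand)

    print(cross_evaluate({"toy": toy}, {"natural": [item]}))
```

Keeping the scorer as a plain callable lets the same evaluation loop compare a model trained on synthetic data against one trained on natural data on both test suites, which mirrors the comparison the study reports.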