Don't Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints

arXiv cs.CL · March 30, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper addresses privacy risks and platform-specific biases in real Written Multi-Party Conversation (WMPC) datasets, proposing synthetic WMPC generation as an alternative.
  • It explores generating synthetic WMPCs using instruction-tuned LLMs under deterministic constraints covering dialogue structure and participants’ stances.
  • Two generation strategies are evaluated: having the LLM generate an entire WMPC in one shot versus generating the conversation turn-by-turn as individual parties given the history.
  • The authors introduce an analytical evaluation framework that measures constraint compliance, content quality, and interaction complexity, using both human and LLM-as-a-judge assessments.
  • Results show stark differences across models; turn-by-turn generation achieves better constraint adherence and greater linguistic variability, though both approaches can produce high-quality WMPCs.

Abstract

Written Multi-Party Conversations (WMPCs) are widely studied across disciplines, with social media as a primary data source due to its accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited by rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., one-to-one "reply-to" links). This work explores the feasibility of generating synthetic WMPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as dialogue structure and participants' stances. We investigate two complementary strategies for leveraging LLMs in this context: (i) LLMs as WMPC generators, where we task the LLM with generating a whole WMPC at once, and (ii) LLMs as WMPC parties, where the LLM generates one turn of the conversation at a time (comprising speaker, addressee, and message), given the conversation history. We then introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the quality of the obtained WMPCs via human and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some able to generate high-quality WMPCs. We also find that turn-by-turn generation yields better conformance to constraints and higher linguistic variability than generating WMPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality WMPCs.
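To make the turn-by-turn strategy concrete, here is a minimal sketch of what an "LLMs as WMPC parties" loop might look like. This is not the authors' implementation: the prompt format, the `speaker|addressee|message` output convention, and the `call_llm` stub (which stands in for any instruction-tuned model API) are all illustrative assumptions.

```python
# Hypothetical sketch of turn-by-turn WMPC generation: each turn is
# produced by a separate LLM call conditioned on the deterministic
# constraints and the conversation history accumulated so far.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    addressee: str
    message: str


def call_llm(prompt: str) -> str:
    """Placeholder for a real instruction-tuned LLM call.

    Assumed to return one turn encoded as 'speaker|addressee|message'.
    """
    return "B|A|I see your point, but I still disagree."


def format_prompt(constraints: str, history: list[Turn]) -> str:
    # Serialize the constraints and the history into a single prompt.
    lines = [f"Constraints: {constraints}"]
    lines += [f"{t.speaker} -> {t.addressee}: {t.message}" for t in history]
    lines.append("Next turn (speaker|addressee|message):")
    return "\n".join(lines)


def generate_wmpc(constraints: str, num_turns: int) -> list[Turn]:
    history: list[Turn] = []
    for _ in range(num_turns):
        raw = call_llm(format_prompt(constraints, history))
        speaker, addressee, message = raw.split("|", 2)
        history.append(Turn(speaker, addressee, message))
    return history


conv = generate_wmpc("A supports the motion; B opposes it", num_turns=3)
print(len(conv), conv[0].speaker, conv[0].addressee)
```

The one-shot alternative ("LLMs as WMPC generators") would instead issue a single call asking for the full conversation, trading finer control over each turn for a single pass; the paper's finding is that the incremental loop above conforms to constraints more reliably.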