Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation

arXiv cs.AI / 4/17/2026

Key Points

  • The paper introduces Policy-Guided Hybrid Simulation (PGHS) to diagnose Meituan merchant strategies using group-level user behavior simulation without costly online experiments.
  • It addresses two simulator trust issues: incomplete information that can lead reasoning-based models to over-rationalize, and “mechanism duality” requiring both interpretable preferences and implicit statistical regularities.
  • PGHS uses a shared alignment layer built from transferable decision policies, combining an LLM-based reasoning branch (to reduce over-rationalization) with an ML-based fitting branch (to capture implicit regularities).
  • Predictions from both branches are fused to provide complementary correction, achieving an 8.80% group simulation error on Meituan data (101 merchants, 26,000+ trajectories).
  • PGHS improves on the best reasoning-only and fitting-only baselines by 45.8% and 40.9%, respectively.
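
To give the reported margins a sense of scale, the following sketch back-computes the baseline errors they would imply. It assumes the 45.8% and 40.9% figures are relative reductions in group simulation error, which the summary does not state explicitly; the function name `implied_baseline_error` is hypothetical and the resulting values are derived, not reported.

```python
def implied_baseline_error(pghs_error: float, relative_improvement: float) -> float:
    """Return the baseline error implied by a relative error reduction.

    ASSUMPTION: improvement = (baseline - pghs) / baseline.
    The source does not confirm this definition.
    """
    if not 0.0 <= relative_improvement < 1.0:
        raise ValueError("relative_improvement must be in [0, 1)")
    return pghs_error / (1.0 - relative_improvement)


PGHS_ERROR = 8.80  # percent, as reported in the source

# Derived values under the assumption above, not figures from the paper.
reasoning_baseline = implied_baseline_error(PGHS_ERROR, 0.458)
fitting_baseline = implied_baseline_error(PGHS_ERROR, 0.409)

print(f"implied reasoning-only baseline error: {reasoning_baseline:.2f}%")
print(f"implied fitting-only baseline error:   {fitting_baseline:.2f}%")
```

Under that reading, the baselines would sit at roughly 16.2% and 14.9% error. If the paper defines the improvement differently, these derived numbers do not apply.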

Abstract

Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a trustworthy simulator faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. We propose Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that mines transferable decision policies from behavioral trajectories and uses them as a shared alignment layer. This layer anchors an LLM-based reasoning branch that prevents over-rationalization and an ML-based fitting branch that absorbs implicit regularities. Group-level predictions from both branches are fused for complementary correction. We deploy PGHS on Meituan with 101 merchants and over 26,000 trajectories. PGHS achieves a group simulation error of 8.80%, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9% respectively.
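
The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, the action space, or how group simulation error is computed, so everything here is an assumption: a fixed-weight convex combination stands in for the fusion, mean absolute difference over action shares stands in for the error metric, and the action names and numbers are toy values invented for illustration.

```python
from typing import Dict

Distribution = Dict[str, float]  # action name -> share of the user group


def normalize(dist: Distribution) -> Distribution:
    """Rescale shares so they sum to 1."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution must have positive mass")
    return {action: share / total for action, share in dist.items()}


def fuse(reasoning: Distribution, fitting: Distribution, weight: float = 0.5) -> Distribution:
    """Convex combination of two group-level predictions.

    ASSUMPTION: a fixed scalar weight; the paper's fusion rule may differ.
    """
    actions = set(reasoning) | set(fitting)
    mixed = {
        a: weight * reasoning.get(a, 0.0) + (1.0 - weight) * fitting.get(a, 0.0)
        for a in actions
    }
    return normalize(mixed)


def group_error(predicted: Distribution, observed: Distribution) -> float:
    """Mean absolute difference between predicted and observed shares.

    ASSUMPTION: a stand-in metric, not the paper's definition.
    """
    actions = set(predicted) | set(observed)
    return sum(abs(predicted.get(a, 0.0) - observed.get(a, 0.0)) for a in actions) / len(actions)


# Toy numbers, invented for illustration only.
reasoning_pred = {"order": 0.30, "browse": 0.50, "leave": 0.20}
fitting_pred = {"order": 0.20, "browse": 0.40, "leave": 0.40}
observed = {"order": 0.25, "browse": 0.45, "leave": 0.30}

fused = fuse(reasoning_pred, fitting_pred)
for name, pred in [("reasoning", reasoning_pred), ("fitting", fitting_pred), ("fused", fused)]:
    print(f"{name:>9} error: {group_error(pred, observed):.4f}")
```

In this toy case the two branches err in opposite directions, so averaging cancels their errors. That is the intuition behind "complementary correction", though real predictions would rarely offset each other this cleanly.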