Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN
arXiv cs.CL · March 18, 2026
Key Points
- The paper reproduces experiments from DYNAMICQA and MULAN and compares their conclusions about how external context affects temporal facts in LLMs.
- It standardizes both datasets and uses synthetic natural-language contexts to enable direct cross-benchmark comparisons.
- The findings show strong dataset dependence: MULAN's conclusions hold under both evaluation frameworks, whereas applying MULAN's framework to DYNAMICQA yields mixed results.
- It extends the replication to LLMs larger than 7B parameters, showing that model scale affects how temporal facts are encoded and updated.
- The work emphasizes how dataset design, evaluation metrics, and model scale shape how LLMs resolve temporal knowledge conflicts, informing future benchmark design.




