KoCo: Conditioning Language Model Pre-training on Knowledge Coordinates

arXiv cs.CL / 4/15/2026


Key Points

  • The paper proposes KoCo (Knowledge Coordinate Conditioning), which converts documents into a 3D semantic “knowledge coordinate” and prepends it as a text prefix during LLM pre-training to preserve real-world context.
  • Experiments report improved performance on 10 downstream tasks and roughly 30% faster pre-training convergence compared with standard flattened token-sequence pre-training.
  • The method is argued to help models separate stable facts from noise, reducing hallucination by explicitly modeling knowledge structure.
  • The approach is positioned as a relatively simple modification to pre-training pipelines rather than a fundamentally new architecture.
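
The paper does not specify how the 3D coordinates are derived, so the following is a minimal illustrative sketch: it uses PCA (via SVD) on stand-in document embeddings to get a 3D "coordinate" per document, then serializes it as a textual prefix in the spirit of KoCo. The `<coord …>` prefix format and the PCA mapping are assumptions, not the paper's actual method.

```python
import numpy as np

def knowledge_coordinates(doc_embeddings: np.ndarray) -> np.ndarray:
    """Project document embeddings to 3D via PCA (SVD).

    The paper leaves the coordinate mapping unspecified; PCA is an
    illustrative stand-in for whatever semantic projection KoCo uses.
    """
    centered = doc_embeddings - doc_embeddings.mean(axis=0)
    # Top-3 right singular vectors give the 3 principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T  # shape: (num_docs, 3)

def prepend_coordinate(text: str, coord: np.ndarray) -> str:
    """Serialize a 3D coordinate as a text prefix (hypothetical format)."""
    x, y, z = (round(float(c), 2) for c in coord)
    return f"<coord x={x} y={y} z={z}> {text}"

# Toy corpus: random vectors standing in for a real encoder's embeddings.
rng = np.random.default_rng(0)
docs = ["doc about physics", "doc about cooking", "doc about law"]
embs = rng.normal(size=(3, 16))
coords = knowledge_coordinates(embs)
conditioned = [prepend_coordinate(d, c) for d, c in zip(docs, coords)]
print(conditioned[0])
```

In an actual pre-training pipeline, the `conditioned` strings would simply replace the raw documents before tokenization, which is why the authors can position KoCo as a data-side change rather than an architectural one.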

Abstract

Standard Large Language Model (LLM) pre-training typically treats corpora as flattened token sequences, overlooking the real-world context that humans naturally rely on to contextualize information. To bridge this gap, we introduce Knowledge Coordinate Conditioning (KoCo), a simple method that maps every document into a three-dimensional semantic coordinate. By prepending these coordinates as textual prefixes during pre-training, we aim to equip the model with explicit contextual awareness so that it learns documents within a real-world knowledge structure. Experimental results demonstrate that KoCo significantly enhances performance across 10 downstream tasks and accelerates pre-training convergence by approximately 30%. Furthermore, our analysis indicates that explicitly modeling knowledge coordinates helps the model distinguish stable facts from noise, effectively mitigating hallucination in generated outputs.