DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation

arXiv cs.AI / 4/20/2026


Key Points

  • The article introduces DALM (Domain-Algebraic Language Model), which aims to prevent cross-domain knowledge interference common in conventional LLM token generation by using structured generation constrained by a domain algebra.
  • DALM uses a three-phase generation process—resolving domain uncertainty, then relation uncertainty, and finally concept uncertainty—so each stage is guided by explicit algebraic constraints.
  • The approach requires three key components: a domain lattice with computable meet/join/implication operations, a relation typing function for controlled inheritance across domains, and a fiber partition that localizes knowledge within domain-specific subsets.
  • The authors describe a three-phase encoder-decoder architecture in which generation is confined to a domain fiber, structurally preventing cross-domain contamination in closed-vocabulary mode and auditably bounding it in open-vocabulary mode.
  • They instantiate DALM using the CDC knowledge representation system and propose training/evaluation with validated domain-annotated crystal libraries to test the domain-indexed multi-perspective answer capability.
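
The three ingredients listed above can be made concrete with a small sketch. Everything below is illustrative and not taken from the paper: domains are modeled as frozensets of primitive tags forming a powerset lattice (meet = intersection, join = union, implication = the Boolean residual), and the relation names, typing table, and fiber contents are invented for the example.

```python
# Hedged sketch (not from the paper): the three DALM ingredients on a toy
# powerset lattice, where a domain is a frozenset of primitive tags.

TOP = frozenset({"physics", "chemistry", "biology"})

def meet(a, b):
    return a & b  # greatest lower bound of two domains

def join(a, b):
    return a | b  # least upper bound of two domains

def implies(a, b):
    # Boolean/Heyting residual: the largest x with meet(a, x) <= b.
    return (TOP - a) | b

# Relation typing: a relation inherits into domain d only when d lies below
# the domain the relation is typed at (names here are purely illustrative).
RELATION_TYPE = {
    "binds_to": frozenset({"chemistry", "biology"}),
    "orbits": frozenset({"physics"}),
}

def inherits(relation, domain):
    return domain <= RELATION_TYPE[relation]

# Fiber partition: each concept belongs to exactly one domain fiber, so a
# lookup localizes knowledge to a domain-specific subset.
FIBERS = {
    frozenset({"physics"}): {"electron", "photon"},
    frozenset({"chemistry", "biology"}): {"enzyme", "ligand"},
}

def fiber(domain):
    return FIBERS.get(domain, set())
```

In a powerset lattice all three operations are computable in linear time, which is the property the framework asks for; the paper's actual lattice (via the CDC system) may of course be richer than a powerset.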

Abstract

Large language models compress heterogeneous knowledge into a single parameter space, allowing facts from different domains to interfere during generation. We propose DALM, a Domain-Algebraic Language Model that replaces unconstrained token generation with structured denoising over a domain lattice. DALM follows a three-phase generation path: it first resolves domain uncertainty, then relation uncertainty, and finally concept uncertainty, so each stage operates under explicit algebraic constraints. The framework requires only three ingredients: a lattice of domains with computable meet, join, and implication; a typing function over relations that controls inheritance across domains; and a fiber partition that localizes knowledge to domain-specific subsets. Given these ingredients, DALM yields a three-phase encoder-decoder architecture in which generation is confined to a domain fiber, cross-domain contamination is structurally prevented in closed-vocabulary mode and auditably bounded in open-vocabulary mode, and a single query can produce a domain-indexed multi-perspective answer space. We instantiate the framework with the CDC knowledge representation system and outline training and evaluation on validated domain-annotated crystal libraries. DALM reframes language generation as algebraically constrained structured denoising rather than unconstrained decoding over a flat token space.
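
The three-phase generation path in the abstract can be sketched as a cascade of masked selections, where each phase restricts the candidate set of the next. This is a minimal toy, not DALM's architecture: the `score` function stands in for the model's learned distributions, and the lattice, typing, and fiber structures are the same illustrative kind as above.

```python
# Hedged sketch of the three-phase path: resolve domain uncertainty first,
# then relation uncertainty, then concept uncertainty, with each phase
# constrained by the choices fixed before it.

def generate(query, domains, relations, relation_type, fibers, score):
    # Phase 1: resolve domain uncertainty over all candidate domains.
    domain = max(domains, key=lambda d: score("domain", query, d))
    # Phase 2: only relations whose typing domain lies above the chosen
    # domain are legal, enforcing controlled inheritance.
    legal_relations = [r for r in relations if domain <= relation_type[r]]
    relation = max(legal_relations, key=lambda r: score("relation", query, r))
    # Phase 3: concepts are drawn from the chosen domain's fiber, so in
    # closed-vocabulary mode cross-domain contamination cannot occur.
    concept = max(fibers[domain], key=lambda c: score("concept", query, c))
    return domain, relation, concept
```

Because the phase-3 candidate set is exactly one fiber, any generated concept is guaranteed to belong to the resolved domain; in an open-vocabulary variant one would instead log and bound how far a sampled concept strays from that fiber.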