InCoder-32B-Thinking: Industrial Code World Model for Thinking

arXiv cs.CL / 4/6/2026

Key Points

  • The paper introduces InCoder-32B-Thinking, a code reasoning model for industrial software tasks spanning chip design, GPU optimization, and embedded systems, a setting where expert reasoning traces about hardware constraints and timing semantics are scarce.
  • It is trained on reasoning chains generated by the Error-driven Chain-of-Thought (ECoT) framework, which synthesizes the thinking content from multi-turn dialogue with environmental error feedback, explicitly modeling the error-correction process.
  • The industrial code world model (ICWM) is trained on domain execution traces (e.g., Verilog simulation and GPU profiling) to learn causal dynamics between code changes and hardware behavior.
  • ICWM enables self-verification by predicting execution outcomes before actual compilation, and all synthesized reasoning traces are validated with domain toolchains so that the training data matches the natural reasoning-depth distribution of industrial tasks.
  • Evaluations on 14 general and 9 industrial benchmarks report top-tier open-source results across all domains, including 81.3% on LiveCodeBench v5, 84.0% on CAD-Coder, and 38.0% on KernelBench.
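To make the ECoT idea concrete, here is a minimal Python sketch of an error-driven loop, built on loudly stated assumptions. `propose` stands in for the model that writes code, `check` stands in for a domain toolchain (a hypothetical checker that returns an error message or `None`), and `to_thinking_trace` is one illustrative way to fold the dialogue into a single reasoning trace. None of these names come from the paper; Python's built-in `compile()` plays the role of the toolchain only so the example runs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    code: str
    error: Optional[str]  # None means the checker accepted this attempt


def error_driven_dialogue(
    task: str,
    propose: Callable[[str, list[Turn]], str],
    check: Callable[[str], Optional[str]],
    max_turns: int = 4,
) -> list[Turn]:
    """Hypothetical ECoT-style loop: propose code, check it, feed the error back."""
    turns: list[Turn] = []
    for _ in range(max_turns):
        code = propose(task, turns)  # the proposer sees all earlier errors
        error = check(code)
        turns.append(Turn(code, error))
        if error is None:
            break  # stop at the first attempt the checker accepts
    return turns


def to_thinking_trace(turns: list[Turn]) -> Optional[str]:
    """Fold the dialogue into one trace; keep it only if the last attempt passed."""
    if not turns or turns[-1].error is not None:
        return None  # unvalidated traces are discarded
    lines = []
    for i, turn in enumerate(turns, start=1):
        lines.append(f"Attempt {i}:\n{turn.code}")
        if turn.error is not None:
            lines.append(f"Feedback: {turn.error}. Revise to fix this error.")
    lines.append("Final attempt accepted by the checker.")
    return "\n".join(lines)


def python_syntax_check(code: str) -> Optional[str]:
    """Stand-in toolchain: Python's compile() instead of a Verilog simulator."""
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"


def toy_propose(task: str, history: list[Turn]) -> str:
    """Stand-in model: the first attempt drops a colon, the retry fixes it."""
    if not history:
        return "def add(a, b)\n    return a + b"
    return "def add(a, b):\n    return a + b"


turns = error_driven_dialogue("write add()", toy_propose, python_syntax_check)
print(len(turns))  # 2
print(to_thinking_trace(turns))
```

Keeping only traces whose final attempt passes mirrors, in miniature, the statement that synthesized traces are validated through domain toolchains; the actual pipeline's prompts, stopping rules, and trace format are not described in this summary.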

Abstract

Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. In this work, we propose InCoder-32B-Thinking, trained on the data from the Error-driven Chain-of-Thought (ECoT) synthesis framework with an industrial code world model (ICWM) to generate reasoning traces. Specifically, ECoT generates reasoning chains by synthesizing the thinking content from multi-turn dialogue with environmental error feedback, explicitly modeling the error-correction process. ICWM is trained on domain-specific execution traces from Verilog simulation, GPU profiling, etc., learns the causal dynamics of how code affects hardware behavior, and enables self-verification by predicting execution outcomes before actual compilation. All synthesized reasoning traces are validated through domain toolchains, creating training data matching the natural reasoning depth distribution of industrial tasks. Evaluation on 14 general (81.3% on LiveCodeBench v5) and 9 industrial benchmarks (84.0% on CAD-Coder and 38.0% on KernelBench) shows InCoder-32B-Thinking achieves top-tier open-source results across all domains.
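The abstract says ICWM enables self-verification by predicting execution outcomes before actual compilation. The Python sketch below shows one way such a predictor could gate an expensive toolchain. It is an illustration under assumptions, not the paper's method: `predict_outcome` is a hypothetical stand-in for the learned world model (here a crude parenthesis-balance heuristic), and `compiles`, built on Python's `compile()`, stands in for a real toolchain such as a Verilog simulator or GPU profiler.

```python
from typing import Callable, Optional, Sequence, Tuple


def predict_outcome(code: str) -> str:
    """Hypothetical world-model stand-in: guess 'pass' or 'fail' without compiling."""
    return "pass" if code.count("(") == code.count(")") else "fail"


def compiles(code: str) -> bool:
    """Stand-in toolchain: the ground-truth check the predictor tries to anticipate."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def self_verify(
    candidates: Sequence[str],
    predict: Callable[[str], str],
    run_toolchain: Callable[[str], bool],
) -> Tuple[Optional[str], int]:
    """Return the first candidate that passes the real toolchain, plus how many
    candidates were actually sent to it (the rest were filtered by prediction)."""
    toolchain_calls = 0
    for code in candidates:
        if predict(code) != "pass":
            continue  # predicted failure: skip the expensive compile
        toolchain_calls += 1
        if run_toolchain(code):
            return code, toolchain_calls
    return None, toolchain_calls


candidates = ["print((1)", "print(1 +)", "print(1)"]
print(self_verify(candidates, predict_outcome, compiles))  # ('print(1)', 2)
```

The second candidate shows why the real toolchain stays in the loop: a predictor can be wrong, so it saves compilations on the candidates it rejects but does not replace final validation.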