Precise Robot Command Understanding Using Grammar-Constrained Large Language Models

arXiv cs.RO / 4/7/2026

Key Points

  • The paper proposes a grammar-constrained hybrid large language model to translate human instructions into deterministic, robot-executable command structures for industrial human-robot collaboration.
  • It uses a two-stage pipeline: a fine-tuned LLM performs contextual reasoning and parameter inference, then a Structured Language Model (SLM) and a grammar-based canonicalizer force the output into standardized symbolic action frames.
  • A validation-and-feedback loop uses a grammar parser to check each generated command against a predefined set of executable actions and automatically re-prompts the LLM to correct invalid outputs.
  • The approach outputs commands in a valid, robot-readable JSON format, aiming to improve both safety and operational reliability compared with unconstrained LLM outputs.
  • Experiments on the HuRIC dataset show the hybrid grammar-constrained model achieves higher command validity than baselines including an API-based fine-tuned LLM and a standalone grammar-driven NLU model.
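
The canonicalize-then-validate step described in the key points can be illustrated with a minimal Python sketch. It is not the paper's implementation: the action inventory `EXECUTABLE_ACTIONS`, the synonym table `SYNONYMS`, and the frame layout (`action` plus `args`) are all hypothetical, chosen only to show how a loosely worded frame might be mapped onto a fixed vocabulary and checked before being serialized as JSON.

```python
import json

# Hypothetical action inventory: action name -> required argument names.
# The paper's actual list of executable actions is not given in this summary.
EXECUTABLE_ACTIONS = {
    "pick": {"object"},
    "place": {"object", "location"},
    "move": {"location"},
}

# Hypothetical synonym table used by the canonicalizer.
SYNONYMS = {"grab": "pick", "take": "pick", "put": "place", "go": "move"}


def canonicalize(frame: dict) -> dict:
    """Map a loosely worded frame onto canonical action and argument names."""
    action = str(frame.get("action", "")).strip().lower()
    action = SYNONYMS.get(action, action)
    args = {str(k).strip().lower(): v for k, v in frame.get("args", {}).items()}
    return {"action": action, "args": args}


def validate(frame: dict) -> list:
    """Return a list of problems; an empty list means the frame is executable."""
    action = frame["action"]
    if action not in EXECUTABLE_ACTIONS:
        return [f"unknown action: {action!r}"]
    required = EXECUTABLE_ACTIONS[action]
    given = set(frame["args"])
    errors = [f"missing argument: {name}" for name in sorted(required - given)]
    errors += [f"unexpected argument: {name}" for name in sorted(given - required)]
    return errors


# Example: a frame as an LLM might emit it, before canonicalization.
raw = {"action": "Grab", "args": {"Object": "wrench"}}
frame = canonicalize(raw)
command_json = json.dumps(frame, sort_keys=True)
```

Returning a list of error strings rather than a boolean is a deliberate choice in this sketch: the same messages could be fed back to the model as a corrective prompt.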

Abstract

Human-robot collaboration in industrial settings requires precise and reliable communication to enhance operational efficiency. While Large Language Models (LLMs) understand general language, they often lack the domain-specific rigidity needed for safe and executable industrial commands. To address this gap, this paper introduces a novel grammar-constrained LLM that integrates a grammar-driven Natural Language Understanding (NLU) system with a fine-tuned LLM, which enables both conversational flexibility and the deterministic precision required in robotics. Our method employs a two-stage process. First, a fine-tuned LLM performs high-level contextual reasoning and parameter inference on natural language inputs. Second, a Structured Language Model (SLM) and a grammar-based canonicalizer constrain the LLM's output, forcing it into a standardized symbolic format composed of valid action frames and command elements. This process guarantees that generated commands are valid and structured in a robot-readable JSON format. A key feature of the proposed model is a validation and feedback loop. A grammar parser validates the output against a predefined list of executable robotic actions. If a command is invalid, the system automatically generates corrective prompts and re-engages the LLM. This iterative self-correction mechanism allows the model to recover from initial interpretation errors to improve system robustness. We evaluate our grammar-constrained hybrid model against two baselines: a fine-tuned API-based LLM and a standalone grammar-driven NLU model. Using the Human Robot Interaction Corpus (HuRIC) dataset, we demonstrate that the hybrid approach achieves superior command validity, which promotes safer and more effective industrial human-robot collaboration.
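
The iterative self-correction mechanism in the abstract can be sketched as a bounded retry loop. The code below is an illustration under stated assumptions, not the authors' system: `ALLOWED_ACTIONS`, the flat JSON layout, the wording of the corrective prompt, and the `max_attempts` limit are hypothetical, and `fake_llm` is a stub standing in for the fine-tuned LLM.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical set of executable actions (assumption, not taken from the paper).
ALLOWED_ACTIONS = {"pick", "place", "move"}


def check(text: str) -> Tuple[Optional[dict], str]:
    """Parse model output and return (command, error); error is '' when valid."""
    try:
        command = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"output is not valid JSON ({exc.msg})"
    if not isinstance(command, dict) or command.get("action") not in ALLOWED_ACTIONS:
        return None, f"action must be one of {sorted(ALLOWED_ACTIONS)}"
    return command, ""


def understand(
    instruction: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Query the model, re-prompting with the validation error until the output is valid."""
    prompt = instruction
    for _ in range(max_attempts):
        command, error = check(generate(prompt))
        if command is not None:
            return command
        # Corrective prompt: restate the task and tell the model what was wrong.
        prompt = (
            f"{instruction}\n"
            f"Your previous output was rejected: {error}. "
            "Reply with corrected JSON only."
        )
    return None  # give up rather than pass an invalid command to the robot


# Stub LLM: returns an invalid action on the first call, a valid one on the second.
replies = iter([
    '{"action": "fetch", "object": "wrench"}',
    '{"action": "pick", "object": "wrench"}',
])
prompts_seen = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


result = understand("Bring me the wrench", fake_llm)
```

Capping the number of attempts and returning `None` on failure keeps the loop from running indefinitely and ensures that an unvalidated command never reaches the robot in this sketch.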