Lazy or Efficient? Towards Accessible Eye-Tracking Event Detection Using LLMs

arXiv cs.AI / 4/16/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper addresses why gaze event detection remains hard to apply outside specialized labs: raw data formats are heterogeneous, and classical detectors like I-VT and I-DT are highly sensitive to preprocessing choices.
  • It proposes a code-free, LLM-driven pipeline that interprets raw eye-tracking files, infers their structure/metadata, and generates executable routines from natural-language prompts.
  • The system applies the generated routines to detect and label fixations and saccades, then returns both results and explanatory reports to the user.
  • Experiments on public benchmarks indicate that the LLM-based approach achieves accuracy comparable to traditional detector workflows while substantially reducing technical overhead.
  • The authors position the framework as an accessibility layer for eye-tracking research, enabling iterative refinement by editing prompts rather than extensive programming changes.
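To make the preprocessing-sensitivity point concrete, here is a minimal sketch of the classical I-VT (velocity-threshold) detector the paper uses as a baseline. This is not the authors' implementation; the function name, the 30 deg/s default, and the assumption that coordinates are in degrees of visual angle are illustrative choices.

```python
import numpy as np

def ivt_detect(x, y, t, velocity_threshold=30.0):
    """Label each sample as fixation (True) or saccade (False) via I-VT.

    x, y : gaze coordinates, assumed in degrees of visual angle
    t    : timestamps in seconds
    velocity_threshold : deg/s; samples moving slower are fixations
    """
    x, y, t = map(np.asarray, (x, y, t))
    # point-to-point velocity between consecutive samples
    v = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    # assign each sample the velocity of the segment leading into it;
    # the first sample inherits the first segment's velocity
    v = np.concatenate(([v[0]], v))
    return v < velocity_threshold
```

The threshold (and any smoothing applied to `v` beforehand) is exactly the kind of parameter whose tuning the paper identifies as a usability barrier.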

Abstract

Gaze event detection is fundamental to vision science, human-computer interaction, and applied analytics. However, current workflows often require specialized programming knowledge and careful handling of heterogeneous raw data formats. Classical detectors such as I-VT and I-DT are effective but highly sensitive to preprocessing and parameterization, limiting their usability outside specialized laboratories. This work introduces a code-free, large language model (LLM)-driven pipeline that converts natural language instructions into an end-to-end analysis. The system (1) inspects raw eye-tracking files to infer structure and metadata, (2) generates executable routines for data cleaning and detector implementation from concise user prompts, (3) applies the generated detector to label fixations and saccades, and (4) returns results and explanatory reports, allowing users to iteratively refine the analysis by editing the prompt. Evaluated on public benchmarks, the approach achieves accuracy comparable to traditional methods while substantially reducing technical overhead. The framework lowers barriers to entry for eye-tracking research, providing a flexible and accessible alternative to code-intensive workflows.
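The four stages in the abstract can be sketched as a short orchestration loop. Everything here is hypothetical scaffolding, not the paper's code: `ask_llm` stands in for whatever model backend the system uses, the generated routine is assumed to define a `detect(rows)` function, and the file format is assumed to be CSV with a header row.

```python
import csv, io

def inspect_file(raw_text):
    """Stage 1: infer structure/metadata from the raw file header."""
    header = next(csv.reader(io.StringIO(raw_text)))
    return {"columns": header}

def run_pipeline(raw_text, user_prompt, ask_llm):
    meta = inspect_file(raw_text)
    # Stage 2: the LLM turns the prompt plus inferred metadata into code
    code = ask_llm(f"Columns: {meta['columns']}. Task: {user_prompt}")
    # Stage 3: execute the generated detector on the parsed data
    scope = {}
    exec(code, scope)  # generated code is assumed to define detect(rows)
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    labels = scope["detect"](rows)
    # Stage 4: return results plus an explanatory report; editing
    # user_prompt and re-running is the iterative-refinement loop
    report = f"Labeled {len(labels)} samples using columns {meta['columns']}"
    return labels, report
```

In this sketch the "code-free" property is just that the user touches only `user_prompt`; a real system would additionally sandbox the `exec` step and validate the generated routine before running it.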