Caracal: Causal Architecture via Spectral Mixing

arXiv cs.AI / 5/4/2026


Key Points

  • Caracal is a new architecture for long-context language modeling that replaces attention with a parameter-efficient Multi-Head Fourier (MHF) module, targeting the quadratic cost of attention and positional encoding limits.
  • The model uses FFT-based sequence mixing to achieve O(L log L) complexity, improving scalability to long sequences.
  • It introduces a frequency-domain causal masking approach (via asymmetric padding and truncation) to preserve autoregressive generation in a Fourier-based setting.
  • Unlike some efficient sequence models that require hardware-specific kernels (e.g., Mamba), Caracal is designed to rely on standard library operators for easier portability.
  • Experiments report competitive performance versus Transformer and SSM baselines, with code made available in the appendix.

Abstract

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log L) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
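The padding-and-truncation idea can be illustrated with a generic FFT-based causal convolution. The sketch below is not the paper's MHF module: the function name `causal_fft_mix`, the single-channel setup, and the choice to zero-pad on the right to length 2L are assumptions made for illustration, and the paper's exact padding scheme may differ. It uses only standard NumPy FFT operators and shows why one-sided padding followed by truncation keeps each output position from depending on later inputs.

```python
import numpy as np

def causal_fft_mix(x, kernel):
    """Mix a sequence with a kernel via FFT so that output[t] depends only on x[0..t].

    Hypothetical helper for illustration; not the authors' implementation.
    x and kernel are real arrays whose last axis has the same length L.
    """
    L = x.shape[-1]
    n = 2 * L  # one-sided zero padding: circular convolution now equals linear convolution
    X = np.fft.rfft(x, n=n, axis=-1)       # rfft zero-pads at the end when n > L
    K = np.fft.rfft(kernel, n=n, axis=-1)
    y = np.fft.irfft(X * K, n=n, axis=-1)  # pointwise product in frequency = convolution in time
    return y[..., :L]                      # truncation: keep positions 0..L-1 only

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k = rng.standard_normal(16)
y = causal_fft_mix(x, k)

# Changing the input at position 10 should leave outputs 0..9 untouched.
x_perturbed = x.copy()
x_perturbed[10] += 5.0
y_perturbed = causal_fft_mix(x_perturbed, k)
print(np.allclose(y[:10], y_perturbed[:10]))
```

Without the padding, the FFT product would compute a circular convolution, in which late positions wrap around and leak into early outputs. Padding to at least 2L - 1 before the transform and discarding the tail afterward avoids that wraparound while keeping the O(L log L) cost of the FFT.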
