Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model

arXiv cs.LG / April 29, 2026


Key Points

  • Nautile-370M is a newly introduced 371M-parameter small language model built to perform efficient reasoning within tight parameter and inference budgets.
  • Its architecture alternates two SeqCond Attention (SCA) layers, which use a linear-time spectral sequence operator, with one standard transformer layer, balancing long-context efficiency and state tracking against attention-style token-to-token routing.
  • The authors report training on limited compute: a single Google TPU v4-64 pod slice via TPU Research Cloud (TRC), followed by a reinforcement learning stage on a single NVIDIA DGX Spark.
  • The paper provides a theoretical result that SCA can exactly retrieve individual tokens from prefix summaries and can emulate softmax attention, arguing that SCA is at least as expressive as full self-attention in the continuous limit.
  • It also outlines a dedicated training data pipeline and proposes a reinforcement learning stage tailored to reasoning, verification, and response quality.
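The 2:1 layer pattern described above can be sketched in a few lines. The paper's SCA operator is spectral and its internals are not given here, so the sketch below substitutes a deliberately simple linear-time stand-in (a causal cumulative mean) purely to show the dataflow: two O(n) prefix-summary layers feeding one standard causal softmax-attention layer. All function names are illustrative, not from the paper.

```python
import numpy as np

def sca_layer(x):
    """Stand-in for a SeqCond Attention (SCA) layer. The real SCA is a
    spectral operator; here a causal cumulative mean is used only to
    illustrate the linear-time prefix-summary dataflow (assumption)."""
    sums = np.cumsum(x, axis=0)                  # O(n) prefix sums
    counts = np.arange(1, x.shape[0] + 1)[:, None]
    return sums / counts                         # position t sees tokens 0..t

def attention_layer(x):
    """Standard single-head causal softmax self-attention."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf                       # causal masking
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def nautile_block(x):
    """One hybrid block in the reported 2:1 pattern:
    two SCA layers followed by one transformer layer."""
    x = sca_layer(x)
    x = sca_layer(x)
    return attention_layer(x)
```

In this arrangement the quadratic-cost attention layer appears only once per three layers, which is presumably how the design keeps inference cost down while retaining some exact token-to-token routing.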

Abstract

We present Nautile-370M, a 371-million-parameter small language model designed for efficient reasoning under strict parameter and inference budgets. Nautile-370M uses a hybrid backbone in which two SeqCond Attention (SCA) layers, a linear-time spectral sequence operator inspired by SeqCondenser, alternate with one transformer layer. This design aims to retain the long-context efficiency and state-tracking benefits of structured sequential models while preserving the expressive token-to-token routing of attention. The model was trained on a single Cloud TPU v4-64 pod slice provided through the Google TPU Research Cloud (TRC) program; the subsequent reinforcement learning stage was carried out on a single NVIDIA DGX Spark. We prove that the SCA readout mechanism can exactly retrieve any individual token from the prefix summary and can reproduce any output of softmax attention as a special case, establishing that SCA is at least as expressive as full self-attention in the continuous limit. We also describe the training data pipeline and outline a reinforcement learning stage specialized for reasoning, verification, and response quality.
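The abstract's exact-retrieval claim has a simple intuition: if each token is written into a prefix summary along an orthonormal per-position key, a linear readout against that key recovers the token exactly. The toy below illustrates only this intuition, not the paper's construction; it uses identity-matrix keys, so the summary width grows with sequence length, whereas the paper's result concerns a fixed readout mechanism in the continuous limit.

```python
import numpy as np

def build_prefix_summary(tokens):
    """Accumulate tokens into one summary matrix S = sum_t x_t e_t^T,
    where the e_t are orthonormal per-position keys (toy assumption:
    identity keys, so exact retrieval holds for n <= key dimension)."""
    n, d = tokens.shape
    keys = np.eye(n)
    S = np.zeros((d, n))
    for t in range(n):
        S += np.outer(tokens[t], keys[t])   # O(1) update per token
    return S, keys

def retrieve(S, keys, t):
    """Exact readout of token t: S @ e_t = x_t, since the keys are
    orthonormal and cross terms vanish."""
    return S @ keys[t]
```

Emulating softmax attention is then a matter of forming attention weights from such retrieved tokens, which is the sense in which a summary-based operator can be at least as expressive as full self-attention.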