Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge
arXiv cs.RO / 5/4/2026
Key Points
- Tempus is a proposed GEMM streaming framework designed for AMD Versal AI Edge SoCs to improve LLM inference efficiency under strict edge constraints on compute, memory, and power.
- Rather than scaling spatially across hundreds of cores, an approach that can fail on edge hardware, the framework scales temporally: a fixed compute block of 16 AIE-ML cores is reused through iterative graph execution, algorithmic data tiling, and replication in programmable logic.
- Tempus uses high-speed cascade streaming and a deadlock-free DATAFLOW protocol to reduce partial sums at a reported initiation interval (II) of 1 and to maximize the overlap between data transfer and computation.
- On the evaluated GEMM workloads, Tempus reports 607 GOPS at 10.677 W of on-chip power (about 56.9 GOPS/W), and a Platform-Aware Utility (PAU) analysis indicates a 211.2× higher prominence factor than ARIES, the spatial state-of-the-art baseline.
- Tempus further claims strong efficiency properties, including 0.00% URAM/DSP utilization, with reported gains in core frugality (22.0×), power frugality (7.1×), and I/O demand reduction (6.3×).
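
The temporal-scaling idea in the key points (a fixed-size compute block reused over many iterations while tiling covers the full problem) can be illustrated with a minimal sketch. This is plain Python, not the Tempus implementation or any AMD/Vitis API: the function name `gemm_fixed_block`, the `tile` parameter, and the launch counter are hypothetical, and the sketch does not model AIE-ML cores, cascade streaming, or programmable-logic replication. It only shows how a larger GEMM changes the number of block invocations rather than the size of the block.

```python
from math import ceil


def gemm_fixed_block(A, B, tile=4):
    """Compute C = A x B by reusing one fixed-size tile kernel over time.

    A is an m x k list of rows, B is a k x n list of rows.
    Returns (C, launches), where launches counts how many times the
    fixed-size block was invoked (an illustrative stand-in for one
    iteration of a fixed compute graph).
    """
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")

    C = [[0] * n for _ in range(m)]
    launches = 0

    # Walk over output tiles; for each one, stream tiles along the k axis.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One invocation of the fixed block: a tile x tile x tile
                # multiply-accumulate, clipped at the matrix edges.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i][j]  # partial sum carried between k-tiles
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
                launches += 1
    return C, launches


def expected_launches(m, k, n, tile=4):
    """Number of fixed-block invocations needed for an m x k x n GEMM."""
    return ceil(m / tile) * ceil(k / tile) * ceil(n / tile)
```

With `tile=4`, a 5×6 by 6×7 product takes 2 × 2 × 2 = 8 invocations of the same block; doubling every dimension raises the invocation count, not the block size, which is the sense in which resource use stays fixed while runtime scales.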