================================================================
ASTHENOSPHERE NPU INFERENCE METRICS
Hardware:
Device: AMD Phoenix XDNA gen1 (AIE2)
Tiles: 12/12 (complete transformer pipeline)
Device ID: /dev/accel/accel0
Status: ACTIVE
Reliability: 100%
Pipeline:
PreScale > Q proj > RoPE > Attention > O proj > Attn ResAdd
PreScale2 > Gate+SiLU+Up > EltMul > Down > FFN ResAdd > Score Head
14 ops, zero CPU/GPU during NPU compute
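The stage order above can be sketched in NumPy with toy sizes and random weights. This is illustrative only: the log does not describe the tile programs, so single-head attention, an RMSNorm-style "PreScale", and K/V projections folded in alongside the listed Q proj are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 4, 8, 16                       # tokens, model dim, FFN hidden dim (toy)

def rms_norm(x, eps=1e-6):               # assumed form of "PreScale"
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def rope(x):                             # rotate channel-half pairs by position
    h = x.shape[-1] // 2
    ang = np.arange(x.shape[0])[:, None] / (10000.0 ** (np.arange(h) / h))
    x1, x2 = x[:, :h], x[:, h:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

def silu(x):
    return x / (1.0 + np.exp(-x))

Wq, Wk, Wv, Wo = rng.standard_normal((4, D, D)) * 0.1
Wg, Wu = rng.standard_normal((2, D, H)) * 0.1
Wd = rng.standard_normal((H, D)) * 0.1
Wscore = rng.standard_normal((D, 1)) * 0.1

x = rng.standard_normal((T, D))

# PreScale > Q proj > RoPE > Attention > O proj > Attn ResAdd
h = rms_norm(x)
q, k, v = rope(h @ Wq), rope(h @ Wk), h @ Wv
a = np.tril(np.exp(q @ k.T / np.sqrt(D)))        # causal mask after exp
a /= a.sum(axis=-1, keepdims=True)               # softmax over visible tokens
x = x + (a @ v) @ Wo                             # Attn ResAdd

# PreScale2 > Gate+SiLU+Up > EltMul > Down > FFN ResAdd > Score Head
h = rms_norm(x)
x = x + (silu(h @ Wg) * (h @ Wu)) @ Wd           # FFN ResAdd
score = x @ Wscore                               # Score Head
print(score.shape)                               # (4, 1)
```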
SESSION AVERAGES (7 messages)
Avg tokens/msg: 64.7
Avg elapsed/msg: 83ms
Avg eff tok/s: 3866
Avg acceptance: 91.8%
Avg cost/msg: 21.3 Motes
ALL-TIME AVERAGES (7 messages)
Avg tokens/msg: 64.7
Avg elapsed/msg: 83ms
Avg eff tok/s: 3866
Avg acceptance: 91.8%
Avg cost/msg: 21.3 Motes
PER-DISPATCH LOG (7 entries)
Time      Tokens  Dispatches  Elapsed  Eff tok/s  Accept%  Motes
16:31:41      65          12    5.4ms      11970      86%      6
16:31:38      65          12    134ms        485      87%     31
16:31:00      65          12  146.4ms        444      88%     33
16:30:48      65          12  147.6ms        440      90%     33
16:30:05      65          12   12.1ms       5356      93%      9
16:29:56      64          12  127.2ms        503     100%     30
16:29:39      64          12    8.1ms       7866     100%      7
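The session averages are plain arithmetic means over the seven rows above, which can be checked directly (the reported 91.8% acceptance average presumably uses unrounded per-row values, since the rounded Accept% column averages to 92.0):

```python
# Per-dispatch rows from the log above, newest first.
tokens  = [65, 65, 65, 65, 65, 64, 64]
elapsed = [5.4, 134, 146.4, 147.6, 12.1, 127.2, 8.1]   # ms
eff     = [11970, 485, 444, 440, 5356, 503, 7866]      # eff tok/s

avg_tokens  = sum(tokens) / len(tokens)    # 64.7
avg_elapsed = sum(elapsed) / len(elapsed)  # ~83 ms
avg_eff     = sum(eff) / len(eff)          # ~3866

print(round(avg_tokens, 1), round(avg_elapsed), round(avg_eff))  # 64.7 83 3866
```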
================================================================
GLOSSARY
NPU Neural Processing Unit — dedicated AI accelerator chip
on AMD Ryzen 7000/8000 series (Phoenix XDNA gen1).
Runs inference with zero CPU/GPU usage.
Tile One AIE2 compute core on the NPU. Each has 32KB SRAM.
This pipeline uses all 12 available tiles.
tok/s Tokens per second — inference throughput. A token is
roughly 3/4 of a word. Higher = faster response.
Eff tok/s Effective tokens/second — accounts for speculative
decoding where multiple candidates are evaluated per
dispatch. Higher than raw tok/s when speculation works.
Acceptance% How often speculative candidate tokens are accepted.
Higher = more tokens per dispatch = faster generation.
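Under the standard speculative-decoding model (independent per-token acceptance with probability a, draft length k, plus one "bonus" token from the verifier), the expected tokens emitted per dispatch follow a geometric series. The log does not describe Asthenosphere's actual scheme, so this is a generic sketch:

```python
# Expected tokens emitted per dispatch for draft length k and per-token
# acceptance probability a, assuming independent acceptances. Illustrative
# only; the actual speculation scheme is not described in this log.
def expected_tokens_per_dispatch(a: float, k: int) -> float:
    if a == 1.0:
        return k + 1.0          # every draft accepted, plus the bonus token
    return (1.0 - a ** (k + 1)) / (1.0 - a)

# Higher acceptance -> more tokens per NPU round-trip -> higher eff tok/s.
print(expected_tokens_per_dispatch(0.918, 4))
```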
Dispatch One round-trip to the NPU: host sends data, NPU
processes all 12 pipeline stages, host reads result.
Motes Asthenosphere's internal compute cost unit. Derived
from inference latency, model size, and token count.
Used for resource accounting across the persona economy.
1 Mote ~ 1 output token on a 3B parameter CPU model.
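The actual Mote formula is not given in this log. A purely hypothetical cost function, invented here to illustrate "derived from latency, model size, and token count" and anchored to the 1-Mote-per-token-at-3B calibration, might look like:

```python
# HYPOTHETICAL Mote-style cost function; structure and coefficients are
# invented for illustration, not Asthenosphere's real accounting.
def motes(tokens: int, elapsed_ms: float, params: float) -> float:
    size_factor = params / 3e9                 # 1 Mote ~ 1 token at 3B (glossary)
    latency_factor = 1.0 + elapsed_ms / 1000.0 # invented latency surcharge
    return tokens * size_factor * latency_factor

print(motes(64, 0.0, 3e9))  # 64.0 -> one Mote per token at the 3B anchor
```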
RoPE Rotary Position Embeddings — encodes token position
information so the model knows word order.
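A minimal RoPE sketch: each (even, odd) channel pair is rotated by an angle proportional to the token's position. Because rotations preserve norms, only position information is injected, not magnitude changes:

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (tokens, channels); rotate each (even, odd) channel pair by an
    # angle that grows with position and shrinks with channel index.
    T, D = x.shape
    inv = 1.0 / (base ** (np.arange(D // 2) / (D // 2)))
    ang = np.arange(T)[:, None] * inv[None, :]
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[:, 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

x = np.random.default_rng(1).standard_normal((5, 8))
y = rope(x)
# Rotation preserves each token's norm:
print(np.allclose(np.linalg.norm(x, axis=1), np.linalg.norm(y, axis=1)))  # True
```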
SwiGLU Gated activation function used in modern transformers.
Combines gate projection + SiLU activation + up projection.
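The combination described reads directly as code, with toy weight shapes standing in for the real dimensions:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, Wg, Wu, Wd):
    # gate projection -> SiLU, elementwise multiply with up projection,
    # then down projection back to the model dimension.
    return (silu(x @ Wg) * (x @ Wu)) @ Wd

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8))
Wg, Wu = rng.standard_normal((2, 8, 16))
Wd = rng.standard_normal((16, 8))
print(swiglu_ffn(x, Wg, Wu, Wd).shape)  # (4, 8)
```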
RMSNorm Root Mean Square Normalization — stabilizes activations
between transformer layers for training/inference quality.
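RMSNorm divides by the root-mean-square of the activations (no mean subtraction, unlike LayerNorm) and applies a learned per-channel gain:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.random.default_rng(3).standard_normal((2, 8))
y = rms_norm(x, gain=np.ones(8))
print(np.sqrt(np.mean(y * y, axis=-1)))  # each row's RMS is ~1.0
```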
XCLBIN Compiled hardware bitstream loaded onto the NPU.
Contains the tile programs, data routing, and DMA config.
================================================================
Generated: 2026-04-03T21:31:57.479Z
Asthenosphere NPU Pipeline — AMD Phoenix XDNA gen1
State: Debugging. The pipeline functions properly; the GUI has visual issues.
Oversight: model information is not included in this log. An updated log format will add it soon.




