I tried to benchmark TurboQuant on Android (Snapdragon 7s Gen 3) — here's what actually happened

Reddit r/LocalLLaMA / 3/31/2026


Key Points

  • A Reddit user attempted to benchmark TurboQuant on an Android phone (Snapdragon 7s Gen 3, CPU-only) using a Termux-native workflow and GitHub Actions cross-compilation, but hit multiple build-system/flag issues.
  • The cross-compile succeeded and the binary ran, yet the resulting build did not include the expected TurboQuant (tq3_0) type registration, indicating the branch wasn’t fully integrated with llama.cpp’s TurboQuant type system.
  • The post concludes TurboQuant for ARM CPU is not ready on mobile because the available community validations are for CUDA/Apple Metal rather than Android ARM, and the closest ARM reference code isn’t merged.
  • It notes open upstream PRs (#21088/#21089) and that once merged, TurboQuant should provide a reported ~4.4× KV compression memory win that could materially expand mobile context lengths by reducing OOM risk.
  • The author points to a public CI workflow that cross-compiles and verifies tq3_0 presence in the produced binary automatically, with plans to publish benchmark results after the upstream merge.

Building a sovereign Android dev stack from a single phone. No PC. Termux-native. When TurboQuant dropped last week I immediately wanted to know: does this work on ARM CPU-only? Nobody had tested it on mobile hardware.

My setup:

Xiaomi Redmi Note 14 Pro+ 5G

Snapdragon 7s Gen 3 (ARMv8-A, 8GB RAM)

Termux native, Android 16

No GPU offload (Adreno 730 rejects Qwen3.5 Hybrid Linear Attention kernels)

What I did:

Built the Aaryan-Kapoor turboquant-tq3_0 branch via GitHub Actions cross-compile (can't build on-device — 8GB RAM, -j2 max). Flags: -march=armv8-a+dotprod+i8mm, CPU-only, no NDK.

5 failed builds. Each one taught me something:

llama-server is not a valid target in this branch

CMAKE_SYSTEM_NAME=Android pulls in NDK clang → POSIX_MADV_WILLNEED undefined

Without CMAKE_SYSTEM_NAME=Linux + CMAKE_SYSTEM_PROCESSOR=aarch64, cmake injects -mavx2 -msse4.2 into an ARM build
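Putting those lessons together, the configure step ends up looking roughly like this (a sketch, not verbatim from my workflow: the cross-compiler names and the GGML_NATIVE/llama-cli bits assume a stock gcc-aarch64-linux-gnu toolchain and current llama.cpp CMake options):

```shell
# Target Linux/aarch64 from an x86_64 runner.
# CMAKE_SYSTEM_NAME=Linux (not Android) avoids NDK clang and the
# POSIX_MADV_WILLNEED failure; CMAKE_SYSTEM_PROCESSOR=aarch64 stops
# cmake from injecting -mavx2 -msse4.2 into the ARM build.
cmake -B build \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
  -DCMAKE_C_FLAGS="-march=armv8-a+dotprod+i8mm" \
  -DCMAKE_CXX_FLAGS="-march=armv8-a+dotprod+i8mm" \
  -DGGML_NATIVE=OFF

# llama-server is not a target in this branch, so build llama-cli instead.
cmake --build build --target llama-cli -j
```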

The result:

Source: turboquant-tq3_0

TQ3_0: false

Target: aarch64 ARMv8-A+dotprod+i8mm

Build succeeded. Binary runs. But strings finds no tq3_0 type name in the binary. The branch compiles cleanly, but the GGML type registration for TurboQuant isn't merged into it as of 2026-03-30.
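If you want to reproduce the check, this is all it is (the binary path is an assumption for a default cmake build tree; adjust as needed):

```shell
# A build with tq3_0 support carries the type name as a string literal,
# so it shows up in the binary's string table; absence means the type
# was never registered at compile time.
BIN=build/bin/llama-cli   # assumed path for a default cmake build tree
if strings "$BIN" 2>/dev/null | grep -q "tq3_0"; then
  echo "TQ3_0: true"
else
  echo "TQ3_0: false"
fi
```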

What this means:

TurboQuant on ARM CPU is not ready. The community implementations (turboquant_plus, TheTom's fork) are validated on Apple Silicon Metal and CUDA. The Aaryan-Kapoor CPU reference implementation is the closest thing to ARM-compatible code, but it's not integrated into llama.cpp's type system yet.

The upstream PRs (#21088/#21089) are open. When they land, the memory win (~4.4x KV compression) would matter enormously for 8GB mobile devices: the difference between 4K and 32K context without OOM.
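To make that concrete, here's a back-of-envelope KV-cache calc. Every model number here is my assumption (a Llama-style 8B: 32 layers, 8 KV heads, head_dim 128, fp16 cache); only the ~4.4x factor comes from the TurboQuant claim:

```shell
# KV bytes per token = 2 (K+V) * n_layers * n_kv_heads * head_dim * bytes/elem
# Assumed model: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
awk 'BEGIN {
  per_tok = 2 * 32 * 8 * 128 * 2            # ~128 KiB per token
  for (ctx = 4096; ctx <= 32768; ctx *= 8) {
    fp16 = per_tok * ctx / 2^30             # raw fp16 cache, GiB
    tq   = fp16 / 4.4                       # reported ~4.4x compression
    printf "ctx %5d: fp16 %.2f GiB -> tq3_0 %.2f GiB\n", ctx, fp16, tq
  }
}'
```

On those assumptions, raw fp16 KV at 32K context is about 4 GiB, which an 8GB phone running Android plus the model weights simply doesn't have; at 4.4x it drops under 1 GiB.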

The CI workflow is public: github.com/weissmann93/neobildOS — .github/workflows/build-llama-tq3.yml. It cross-compiles llama.cpp for ARM64 from any machine and checks for TQ3_0 presence in the binary. When the upstream PRs merge, re-run it and the check goes green automatically.

Will post benchmark numbers (q8_0 baseline vs TQ3_0 when it lands) as a follow-up.

submitted by /u/NeoLogic_Dev