How Vulnerable Are Edge LLMs?

arXiv cs.CL, March 26, 2026


Key Points

  • The paper examines how well query-based knowledge extraction attacks can recover behavior from quantized LLMs running on edge devices under realistic query budgets.
  • It finds that quantization adds noise but does not eliminate the semantic knowledge, enabling substantial behavioral recovery with carefully designed queries.
  • The authors propose CLIQ (Clustered Instruction Querying), a structured query construction method aimed at improving semantic coverage while reducing redundant queries.
  • Experiments on quantized Qwen models (INT8/INT4) show CLIQ outperforms the original, unclustered queries across multiple text-similarity and overlap metrics (BERTScore, BLEU, ROUGE) and extracts more efficiently under limited query budgets.
  • Overall, the results suggest quantization alone is not an effective security measure against this class of extraction risk in edge-deployed LLMs.
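The core idea behind CLIQ, as summarized above, is to spend a limited query budget on semantically diverse instructions rather than near-duplicates. The paper's exact algorithm is not reproduced here; the sketch below is a hypothetical, simplified version that groups lexically similar instructions and keeps one representative per group, using plain token-set Jaccard similarity in place of whatever embedding/clustering method the authors actually use.

```python
# Hypothetical sketch of clustered instruction querying (NOT the paper's
# exact CLIQ algorithm): skip instructions that are too similar to ones
# already selected, so each query covers a distinct semantic region.

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two instructions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def select_queries(pool, budget, max_sim=0.5):
    """Greedily keep instructions dissimilar to those already kept."""
    picked = []
    for inst in pool:
        if len(picked) == budget:
            break
        if all(jaccard(inst, p) < max_sim for p in picked):
            picked.append(inst)
    return picked

pool = [
    "Summarize this news article in two sentences.",
    "Summarize this news article briefly.",          # near-duplicate, skipped
    "Translate the following sentence into French.",
    "Write a Python function that sorts a list.",
]
# Three diverse instructions survive; the redundant second prompt does not.
print(select_queries(pool, budget=3))
```

A real attack would embed instructions with a sentence encoder and cluster in embedding space; the greedy lexical filter above only illustrates the coverage-versus-redundancy trade-off the paper targets.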

Abstract

Large language models (LLMs) are increasingly deployed on edge devices under strict computation and quantization constraints, yet their security implications remain unclear. We study query-based knowledge extraction from quantized edge-deployed LLMs under realistic query budgets and show that, although quantization introduces noise, it does not remove the underlying semantic knowledge, allowing substantial behavioral recovery through carefully designed queries. To systematically analyze this risk, we propose **CLIQ** (**Cl**ustered **I**nstruction **Q**uerying), a structured query construction framework that improves semantic coverage while reducing redundancy. Experiments on quantized Qwen models (INT8/INT4) demonstrate that CLIQ consistently outperforms original queries across BERTScore, BLEU, and ROUGE, enabling more efficient extraction under limited budgets. These results indicate that quantization alone does not provide effective protection against query-based extraction, highlighting a previously underexplored security risk in edge-deployed LLMs.
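The abstract measures "behavioral recovery" with text-overlap metrics such as ROUGE. As a concrete illustration of what that kind of score captures, here is a minimal ROUGE-1 F1 between a victim model's output and a surrogate's output; this is a standard textbook formulation, not the paper's evaluation code, and the two example strings are invented.

```python
# Minimal ROUGE-1 F1: unigram overlap between a reference (victim output)
# and a candidate (surrogate output). Illustrative only.
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

victim = "quantization adds noise but keeps the semantic knowledge"
surrogate = "quantization adds noise yet the semantic knowledge remains"
print(round(rouge1_f1(victim, surrogate), 3))
```

Higher overlap between victim and surrogate outputs on held-out prompts indicates more of the victim's behavior has been recovered, which is the sense in which CLIQ's higher BERTScore/BLEU/ROUGE numbers signal a more effective extraction.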