AE-LLM: Adaptive Efficiency Optimization for Large Language Models

arXiv cs.LG / 2026/3/24



Key Points

The paper proposes AE-LLM, a unified framework that automatically selects and combines LLM efficiency techniques (e.g., efficient attention, MoE, parameter-efficient fine-tuning, and quantization) based on the specific deployment scenario.
AE-LLM uses multi-objective optimization to jointly balance accuracy, latency, memory usage, and energy consumption while respecting hardware constraints and task requirements.
It introduces an efficient search algorithm to explore the combinatorial space of efficiency configurations across architecture, fine-tuning, and inference stages, producing Pareto-optimal trade-offs.
Experiments on 15 models (0.5B–70B) across 10 tasks show an average 2.8× improvement in efficiency metrics while keeping accuracy close to baseline (within 1.2%).
The approach also generalizes to vision-language models, delivering similar efficiency gains and positioning the framework as an automated tool for navigating LLM efficiency trade-offs.
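To make "Pareto-optimal trade-offs" concrete, here is a minimal sketch of a Pareto filter over the four objectives named above. The `Config` type, the configuration names, and every number are hypothetical and chosen for illustration only. This is a plain pairwise dominance check, not the search algorithm from the paper.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One candidate efficiency configuration with its measured objectives."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    memory_gb: float   # lower is better
    energy_j: float    # lower is better


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.latency_ms <= b.latency_ms
        and a.memory_gb <= b.memory_gb
        and a.energy_j <= b.energy_j
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.latency_ms < b.latency_ms
        or a.memory_gb < b.memory_gb
        or a.energy_j < b.energy_j
    )
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Illustrative values only; they are not results from the paper.
candidates = [
    Config("fp16-dense", 0.80, 100.0, 16.0, 50.0),
    Config("int8", 0.79, 60.0, 8.0, 30.0),
    Config("int4", 0.75, 45.0, 4.0, 22.0),
    Config("int8-slow", 0.78, 70.0, 9.0, 35.0),  # worse than "int8" on every axis
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy set, `int8-slow` is dropped because `int8` beats it on all four objectives. The remaining three each win on at least one axis, so choosing among them depends on the deployment scenario.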

Abstract

Large Language Models (LLMs) have achieved remarkable success across diverse applications, yet their deployment remains challenging due to substantial computational costs, memory requirements, and energy consumption. Recent empirical studies have demonstrated that no single efficiency technique is universally optimal; instead, the effectiveness of methods such as efficient attention mechanisms, mixture-of-experts (MoE), parameter-efficient fine-tuning, and quantization varies significantly depending on task characteristics, resource constraints, and model scales. Building upon these insights, we propose AE-LLM, a unified framework that automatically selects and combines optimal efficiency techniques tailored to specific deployment scenarios. Our approach introduces a multi-objective optimization framework that jointly considers accuracy, latency, memory footprint, and energy consumption, while accounting for hardware constraints and task requirements. We develop an efficient search algorithm that explores the combinatorial space of efficiency techniques across architecture, fine-tuning, and inference stages, identifying Pareto-optimal configurations. Extensive experiments across 15 models (0.5B–70B parameters) and 10 diverse tasks demonstrate that AE-LLM achieves an average of 2.8× improvement in efficiency metrics while maintaining competitive accuracy (within 1.2% of baseline), compared to static efficiency configurations. Furthermore, our framework generalizes effectively to vision-language models, achieving similar efficiency gains. Our contributions provide practitioners with an automated tool for navigating the complex trade-off landscape of LLM efficiency optimization.
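The "combinatorial space" of techniques across stages, filtered by a hardware constraint, can be pictured with a small brute-force enumeration. This is only a sketch under stated assumptions: the option names in `SEARCH_SPACE`, the multipliers in `MEMORY_FACTOR`, and the helper `fits_memory` are all hypothetical, and the abstract does not specify the actual search space or cost model. The paper describes an efficient search algorithm, whereas the code below enumerates every combination exhaustively.

```python
from itertools import product

# Hypothetical options per stage; the real search space is not given in the abstract.
SEARCH_SPACE = {
    "attention": ["full", "efficient"],
    "fine_tuning": ["full", "peft"],
    "quantization": ["fp16", "int8", "int4"],
}

# Made-up memory multipliers relative to an fp16 baseline (illustration only).
MEMORY_FACTOR = {"fp16": 1.0, "int8": 0.5, "int4": 0.25}


def enumerate_configs(space):
    """Yield every combination of one option per stage as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def fits_memory(config, baseline_gb, budget_gb):
    """Check a toy memory constraint driven only by the quantization choice."""
    return baseline_gb * MEMORY_FACTOR[config["quantization"]] <= budget_gb


all_configs = list(enumerate_configs(SEARCH_SPACE))
feasible = [c for c in all_configs if fits_memory(c, baseline_gb=16.0, budget_gb=8.0)]
print(len(all_configs), len(feasible))
```

Even this toy space has 2 × 2 × 3 = 12 combinations, and the count grows multiplicatively with each added technique, which is presumably why exhaustive enumeration stops being practical for a realistic space.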

