Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

arXiv cs.LG / 4/23/2026


Key Points

  • The study evaluates speculative decoding using EAGLE3 to accelerate PayPal’s Commerce Agent, leveraging a fine-tuned llama3.1-nemotron-nano-8B-v1 model.
  • On identical 2xH100 hardware, vLLM-based EAGLE3 is benchmarked against NVIDIA NIM across 40 configurations covering speculative token counts, concurrency (1–32), and sampling temperatures.
  • With gamma=3, the approach delivers 22–49% higher throughput and 18–33% lower latency while keeping acceptance rate roughly stable at 35.5% across conditions.
  • Increasing to gamma=5 shows diminishing returns, with acceptance rate dropping to around 25%.
  • Output quality is reported as preserved by an LLM-as-Judge evaluation, and speculative decoding on a single H100 can match or exceed NIM on two H100s, enabling about 50% GPU cost reduction.
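The diminishing returns at gamma=5 can be illustrated with a deliberately simplified back-of-the-envelope model (not the paper's measurement methodology): if the acceptance rate is read as the fraction of drafted tokens accepted, and the target model always contributes one token per verification step, the expected tokens per target forward pass is roughly 1 + rate × gamma. Plugging in the reported rates shows why drafting five tokens barely beats drafting three:

```python
def tokens_per_step(gamma: int, acceptance_rate: float) -> float:
    """Expected tokens emitted per target-model forward pass.

    Simplified linear model: each of the `gamma` drafted tokens is
    accepted with probability `acceptance_rate`, and the target model
    always emits one token of its own per step. The real acceptance
    process is sequential, so this is only a rough approximation.
    """
    return 1.0 + acceptance_rate * gamma

# Acceptance rates reported in the study:
# ~35.5% at gamma=3, ~25% at gamma=5.
g3 = tokens_per_step(3, 0.355)  # about 2.07 tokens per step
g5 = tokens_per_step(5, 0.25)   # about 2.25 tokens per step
print(round(g3, 3), round(g5, 3))
```

Under this toy model, gamma=5 yields only ~0.2 more tokens per step than gamma=3 while drafting two extra tokens every step, consistent with the paper's diminishing-returns finding.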

Abstract

We evaluate speculative decoding with EAGLE3 as an inference-time optimization for PayPal's Commerce Agent, powered by a fine-tuned llama3.1-nemotron-nano-8B-v1 model. Building on prior work (NEMO-4-PAYPAL) that reduced latency and cost through domain-specific fine-tuning, we benchmark EAGLE3 via vLLM against NVIDIA NIM on identical 2xH100 hardware across 40 configurations spanning speculative token counts (gamma=3, gamma=5), concurrency levels (1-32), and sampling temperatures (0, 0.5). Key findings: (1) gamma=3 achieves 22-49% throughput improvement and 18-33% latency reduction at zero additional hardware cost; (2) acceptance rates remain stable at approximately 35.5% for gamma=3 across all conditions; (3) gamma=5 yields diminishing returns (approximately 25% acceptance rate); (4) LLM-as-Judge evaluation confirms fully preserved output quality; and (5) speculative decoding on a single H100 matches or exceeds NIM on two H100s, enabling 50% GPU cost reduction.
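For readers who want to try a comparable setup, recent vLLM releases expose EAGLE3 through their speculative-decoding config, with gamma mapping to `num_speculative_tokens`. The sketch below is illustrative only: the model paths are placeholders, since the paper's fine-tuned Nemotron checkpoint and its trained EAGLE3 draft head are not described as public, and exact flag names may vary by vLLM version.

```shell
# Serve a target model with an EAGLE3 draft head via vLLM (sketch).
# Both model paths are hypothetical placeholders, not the paper's
# actual checkpoints; gamma=3 corresponds to num_speculative_tokens=3.
vllm serve nvidia/Llama-3.1-Nemotron-Nano-8B-v1 \
  --tensor-parallel-size 2 \
  --speculative-config '{
    "method": "eagle3",
    "model": "path/to/eagle3-draft-head",
    "num_speculative_tokens": 3
  }'
```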