"Tokenmaxxing" spreads at Amazon as employees game internal AI leaderboards

THE DECODER / 5/12/2026

Key Points

Amazon employees are reportedly “tokenmaxxing”: gaming internal AI leaderboards by optimizing for whatever the rankings measure, even when that activity may not reflect genuine task value.
The report links this behavior to how internal systems and evaluation methods can be exploited when they reward narrow, easily measured outcomes.
Amazon is introducing automatic prompt optimization for its Bedrock AI service to cut manual prompt-engineering effort and improve performance by up to 22%, depending on the task.
The new Bedrock feature supports multiple models, including Claude 3, Llama 3, Mistral, and Titan Text Premier. Even so, the industry still struggles to reliably assess whether automated prompt optimizations actually help.
The article suggests a broader tension between automated optimization tooling and the difficulty of evaluating optimization quality in practice.
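The name “tokenmaxxing” suggests leaderboards that reward raw AI usage. The report does not describe how Amazon's leaderboards actually score employees, so the scoring rule, employee names, and numbers below are hypothetical. This minimal sketch only illustrates the general mechanism: when a ranking uses an easily measured proxy (tokens consumed), someone who burns tokens on busywork can top the board while delivering the least useful work.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    tokens_used: int        # hypothetical proxy metric: raw AI usage
    useful_tasks_done: int  # hypothetical stand-in for genuine task value


def rank_by(employees, key):
    """Return employee names sorted from highest to lowest score under `key`."""
    return [e.name for e in sorted(employees, key=key, reverse=True)]


# Hypothetical team; "bob" automates unnecessary tasks purely to inflate usage.
team = [
    Employee("alice", tokens_used=40_000, useful_tasks_done=12),
    Employee("bob", tokens_used=250_000, useful_tasks_done=3),
    Employee("carol", tokens_used=90_000, useful_tasks_done=9),
]

# Ranking by the proxy vs. ranking by the outcome the organization cares about.
by_tokens = rank_by(team, key=lambda e: e.tokens_used)
by_value = rank_by(team, key=lambda e: e.useful_tasks_done)

print("leaderboard (tokens):", by_tokens)  # ['bob', 'carol', 'alice']
print("actual value:        ", by_value)   # ['alice', 'carol', 'bob']
```

The two orderings are exact reverses: the proxy-based leaderboard puts the least productive person first. The same gap between a measurable score and real benefit is what makes automated prompt optimization hard to evaluate.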

Amazon employees are automating unnecessary tasks just to climb internal AI leaderboards.
