"Tokenmaxxing" spreads at Amazon as employees game internal AI leaderboards

THE DECODER / 5/12/2026

Opinion | Signals & Early Trends | Tools & Practical Usage | Industry & Market Moves

Key Points

  • Amazon employees are reportedly “tokenmaxxing”: automating unnecessary tasks to climb internal AI leaderboards rather than to deliver genuine task value.
  • The report attributes this behavior to internal systems and evaluation methods that reward easily measured outcomes and can therefore be exploited.
  • Amazon is introducing automatic prompt optimization for its Bedrock AI service to reduce manual prompt engineering effort and improve performance by up to 22% depending on the task.
  • The new Bedrock feature is available across multiple models (including Claude-3, Llama-3, Mistral, and Titan Text Premier), but the industry still struggles to reliably assess whether automated prompt optimizations are truly beneficial.
  • The article suggests a broader tension between automated optimization tooling and the difficulty of evaluating optimization quality in practice.

Amazon is introducing automatic prompt optimization for its Bedrock AI service. The feature is designed to reduce the time-consuming work of manual prompt engineering and to improve performance by up to 22 percent, depending on the task. It is available for several models on the Bedrock platform, including Claude-3, Llama-3, Mistral, and Titan Text Premier. Although competitors such as Anthropic and OpenAI offer comparable tools for automating prompt optimization, the industry as a whole still has difficulty accurately assessing the results of these optimizations.
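One reason such gains are hard to assess is that a figure like "up to 22 percent" can be read as a relative or an absolute change, and the source does not say which is meant. The following is a minimal sketch, assuming a small labeled evaluation set and exact-match scoring. The function names and the data are hypothetical illustrations, not part of Bedrock or of the report.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Return the fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("evaluation set must not be empty")
    return mean(1.0 if p == l else 0.0 for p, l in zip(predictions, labels))


def absolute_gain_points(baseline, optimized):
    """Difference between two scores, in percentage points."""
    return (optimized - baseline) * 100.0


def relative_gain_percent(baseline, optimized):
    """Change in score relative to the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (optimized - baseline) / baseline * 100.0


# Hypothetical evaluation data: the same eight labeled inputs, answered once
# with the original prompt and once with the automatically optimized prompt.
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
baseline_outputs = ["pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos"]
optimized_outputs = ["pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]

baseline_score = accuracy(baseline_outputs, labels)
optimized_score = accuracy(optimized_outputs, labels)

print(f"baseline accuracy:  {baseline_score:.3f}")
print(f"optimized accuracy: {optimized_score:.3f}")
print(f"absolute gain: {absolute_gain_points(baseline_score, optimized_score):.1f} points")
print(f"relative gain: {relative_gain_percent(baseline_score, optimized_score):.1f} %")
```

On this toy data the same single extra correct answer appears as a much larger number when reported relative to the baseline than when reported in percentage points. For that reason a reported gain is easier to interpret when the baseline score and the evaluation set are given alongside it.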

Amazon employees are automating unnecessary tasks just to climb internal AI leaderboards.

The article "Tokenmaxxing" spreads at Amazon as employees game internal AI leaderboards appeared first on The Decoder.