So I have been seeing more of those pelican-on-a-bike SVG tests, and while they work, I feel like (and maybe you guys do too) they are getting kinda benchmaxxed. We should switch things up soon, and this is my idea.
Gemini 3.1 Pro gave me this, and DeepSeek Expert Mode this. Models tested:
- DeepSeek Expert (official website)
- GLM 5.1 (hosted on unofficial cloud)
- MiniMax 2.7 (hosted on unofficial cloud)
- Kimi K2.5 (don't have access to 2.6 / budget was limited, so I used it via the official website)
- Claude Sonnet 4.6 (official website, and yes, probably a quantized version)
- Claude Sonnet 4.6 (normal thinking / official website)
- Qwen 3.6 Plus (official website)
Guys we have to change the pelican test
Reddit r/LocalLLaMA / 4/15/2026
💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage
Key Points
- The post argues that the common “pelican on a bike” SVG test is becoming overused (“benchmaxxed”) and suggests switching to a different test.
- The author proposes using a more varied prompt—generating an SVG of “a horse sitting in an F1 race car”—as a new benchmark-style check.
- Multiple AI models are reported to have generated outputs for the proposed SVG prompt, including Gemini 3.1 Pro, DeepSeek (expert mode), GLM 5.1, MiniMax 2.7, Kimi K2.5, Claude Sonnet 4.6, and Qwen 3.6 Plus.
- The overall thrust is community-driven experimentation to stress-test image/SVG generation capabilities and reduce reliance on a single fixed prompt.
- The post implicitly highlights cross-model variability and the usefulness of prompt diversity when evaluating generative performance.
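One practical way to make these ad-hoc SVG prompts comparable across models is to run each reply through the same automated sanity check before eyeballing the drawings. Below is a minimal sketch of such a check: it verifies that a model's reply parses as XML with an `<svg>` root. The model call itself is out of scope here, and the function name and sample reply are illustrative, not from the post.

```python
import xml.etree.ElementTree as ET

# The proposed replacement prompt from the post.
PROMPT = "Generate an SVG of a horse sitting in an F1 race car"

def is_valid_svg(reply: str) -> bool:
    """Return True if a model's reply parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(reply.strip())
    except ET.ParseError:
        return False
    # ElementTree includes the namespace in the tag,
    # e.g. '{http://www.w3.org/2000/svg}svg', so compare the local name only.
    return root.tag.split("}")[-1] == "svg"

# A trivial hand-written reply that passes the check:
sample_reply = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<circle cx="5" cy="5" r="4"/></svg>'
)
print(is_valid_svg(sample_reply))   # True
print(is_valid_svg("not an svg"))   # False
```

A harness like this only filters out malformed output; judging whether the horse actually sits in the race car still requires rendering the SVG and looking at it.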
Related Articles

Are gamers being used as free labeling labor? The rise of "Simulators" that look like AI training grounds [D]
Reddit r/MachineLearning

I built a trading intelligence MCP server in 2 days — here's how
Dev.to

Voice-Controlled AI Agent Using Whisper and Local LLM
Dev.to