Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

Simon Willison's Blog / 4/17/2026

💬 OpinionSignals & Early TrendsTools & Practical UsageModels & Research

Key Points

  • The post uses Simon Willison’s “pelican riding a bicycle” benchmark to compare two newly released large language models: Alibaba’s Qwen3.6-35B-A3B and Anthropic’s Claude Opus 4.7.
  • Running a quantized Qwen3.6-35B-A3B model locally on a MacBook Pro via LM Studio produced a more accurate bicycle frame and a more coherent scene than the author’s Claude results.
  • The author notes that Claude Opus 4.7 produced an “entirely wrong shape” for the bicycle frame, and that increasing `thinking_level` to `max` did not significantly improve outcomes.
  • The article provides direct transcripts (gists) and example images from both models to document the differences in generated drawings.
  • Overall, it suggests that for this specific visual reasoning/generation task, Qwen3.6-35B-A3B outperformed Opus 4.7 in the author’s hands-on test setup.
Sponsored by: Teleport — Connect agents to your infra in seconds with Teleport Beams. Built-in identity. Zero secrets. Get early access

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

16th April 2026

For anyone who has been (inadvisably) taking my pelican riding a bicycle benchmark seriously as a robust way to test models, here are pelicans from this morning’s two big model releases—Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic.

Here’s the Qwen 3.6 pelican, generated using this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf quantized model by Unsloth, running on my MacBook Pro M5 via LM Studio (and the llm-lmstudio plugin)—transcript here:

The bicycle frame is the correct shape. There are clouds in the sky. The pelican has a dorky looking pouch. A caption on the ground reads Pelican on a Bicycle!

And here’s one I got from Anthropic’s brand new Claude Opus 4.7 (transcript):

The bicycle frame is entirely the wrong shape. No clouds, a yellow sun. The pelican is looking behind itself, and has a less pronounced pouch than I would like.

I’m giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!

I tried Opus a second time passing thinking_level: max. It didn’t do much better (transcript):

The bicycle frame is entirely the wrong shape but in a different way. Lines are more bold. Pelican looks a bit more like a pelican.

I don’t think Qwen are cheating

A lot of people are convinced that the labs train for my stupid benchmark. I don’t think they do, but honestly this result did give me a little glint of suspicion. So I’m burning one of my secret backup tests—here’s what I got from Qwen3.6-35B-A3B and Opus 4.7 for “Generate an SVG of a flamingo riding a unicycle”:

Qwen3.6-35B-A3B
(transcript)
The unicycle spokes are a too long. The pelican has sunglasses, a bowtie and appears to be smoking a cigarette. It has two heart emoji surrounding the caption Flamingo on a Unicycle. It has a lot of charisma.
Opus 4.7
(transcript)
The unicycle has a black wheel. The flamingo is a competent if slightly dull vector illustration of a flamingo. It has no flair.

I’m giving this one to Qwen too, partly for the excellent <!-- Sunglasses on flamingo! --> SVG comment.

What can we learn from this?

The pelican benchmark has always been meant as a joke—it’s mainly a statement on how obtuse and absurd the task of comparing these models is.

The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those first pelicans from October 2024 were junk. The more recent entries have generally been much, much better—to the point that Gemini 3.1 Pro produces illustrations you could actually use somewhere, provided you had a pressing need to illustrate a pelican riding a bicycle.

Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic’s latest proprietary release.

If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!

Posted 16th April 2026 at 5:16 pm · Follow me on Mastodon, Bluesky, Twitter or subscribe to my newsletter

This is Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 by Simon Willison, posted on 16th April 2026.

ai 1963 generative-ai 1742 local-llms 154 llms 1709 anthropic 270 claude 267 qwen 54 pelican-riding-a-bicycle 105 llm-release 191 lm-studio 19

Previous: Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe