I ran the car wash test 360 times (12 models × 6 conversation versions × 5 samples each) and scored each model on whether it caught that the trip has to be made by car (any "it depends" answer counted as negative).
Yes, both the "overweight" and the "tell her/him" parts are worded slightly offensively. And most models focused on that instead of on getting the car washed. Most models were convinced it doesn't make sense to drive 50 meters and focused on engine wear or the positive aspects of walking. Some considered having to carry heavy items (I don't know any car wash where I have to bring the buckets of water myself...), a lack of sidewalks, or time constraints.
Metric Insights:
I excluded Bonsai 8B, Nemotron Nano IQ4, Gemma 4 E2B and Gemma 4 E4B from the graphs because they all scored 0, and Nemotron Nano Q8 because it scored 0.07 (2 out of 30).
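The evaluation described above (12 models × 6 conversation versions × 5 samples, with "it depends" counted as a failure) can be sketched roughly as follows. The model and version names, the `run_model` stub, and the keyword-based `is_positive` classifier are hypothetical placeholders, not the author's actual harness:

```python
MODELS = ["model_a", "model_b"]   # placeholders; the post tested 12 models
VERSIONS = ["v1", "v2"]           # placeholders; the post used 6 conversation variants
SAMPLES = 5                       # samples per (model, version) pair

def run_model(model, version):
    """Hypothetical stub: send the conversation variant to the model, return its reply."""
    raise NotImplementedError

def is_positive(reply):
    """Score 1 only if the reply clearly says to take the car.
    Per the post's rule, any hedged 'it depends' answer counts as negative."""
    text = reply.lower()
    if "it depends" in text:
        return 0
    return int("drive" in text or "take the car" in text)

def score_models(replies):
    """replies: dict mapping (model, version, sample_index) -> reply text.
    Returns each model's pass rate over all versions and samples."""
    scores = {}
    for model in MODELS:
        results = [is_positive(replies[(model, v, i)])
                   for v in VERSIONS for i in range(SAMPLES)]
        scores[model] = sum(results) / len(results)
    return scores
```

With the real 12 × 6 × 5 grid this yields the 360 samples and per-model pass rates reported in the post (e.g. a model with 2 passes out of 30 scores 0.07).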
360 Car Wash Samples, 12 Models, 6 Versions: If your wife is overweight, she has to walk
Reddit r/LocalLLaMA / 4/11/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The article reports a "car wash" prompt test in which a model's ability to recognize that a 50 m trip to the car wash requires taking the car was evaluated across 360 runs using 12 models and 6 conversation variants.
- The results show that many models over-focus on the offensive wording about a partner’s weight (“overweight”) and respond with relationship/behavior guidance rather than directly advising whether to drive or walk.
- When the partner’s needs and autonomy are framed differently (e.g., offering dinner or asking for help), some models shift toward negotiation and reciprocity instead of issuing commands.
- When the prompt explicitly mentions “overweight,” models tend to steer toward moral/relational framing and compliance (e.g., “respect,” “don’t mention appearance”), sometimes recommending walking through autonomy-preserving language.
- Overall, the post suggests that prompt phrasing strongly influences whether LLMs focus on practical logistics versus social/ethical interpretation, and that “it depends” was treated as a negative outcome.