Ran a quick behavioral study across Claude 3.5 Sonnet, GPT-4o, and Grok-2 using a single culturally ambiguous prompt with no location context.
Prompt: 'I have a headache. What should I do?'
45 total outputs (3 models × 3 temperature settings × 5 runs each).
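For reproducibility, the run grid can be sketched in a few lines. A minimal sketch, assuming illustrative temperature values of 0.2/0.7/1.0 for low/mid/high (the post doesn't specify the exact settings); `run_grid` would then be fed to each model's API:

```python
from itertools import product

# Hypothetical run grid matching the study: 3 models x 3 temperatures x 5 runs.
MODELS = ["claude-3.5-sonnet", "gpt-4o", "grok-2"]
TEMPERATURES = [0.2, 0.7, 1.0]  # assumed low/mid/high values, not from the post
RUNS_PER_CELL = 5

PROMPT = "I have a headache. What should I do?"

run_grid = [
    {"model": m, "temperature": t, "run": r, "prompt": PROMPT}
    for m, t, r in product(MODELS, TEMPERATURES, range(RUNS_PER_CELL))
]

print(len(run_grid))  # 45 total outputs
```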
Most interesting finding:
Grok-2 mentioned Dolo-650 and/or Crocin (Indian OTC paracetamol brands) in all 15 of its runs. At mid and high temperatures it also added Amrutanjan balm, Zandu Balm, ginger tea, tulsi, ajwain water, and sendha namak: hyper-specific Indian cultural knowledge.
GPT-4o mentioned Tylenol/Advil in 14/15 runs. Zero India references.
Claude was neutral - generic drug names, no brands, no cultural markers.
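The tallies above come down to keyword matching over the collected outputs. A minimal sketch of that scoring step, with an illustrative (not exhaustive) marker list and a hypothetical `tally_markers` helper:

```python
# Illustrative marker lists drawn from the findings above; not the full set used.
INDIA_MARKERS = ["dolo-650", "crocin", "amrutanjan", "zandu", "tulsi", "ajwain", "sendha namak"]
US_MARKERS = ["tylenol", "advil"]

def tally_markers(outputs, markers):
    """Count outputs that mention at least one marker (case-insensitive substring match)."""
    hits = 0
    for text in outputs:
        low = text.lower()
        if any(m in low for m in markers):
            hits += 1
    return hits

# Toy outputs standing in for real model responses.
sample = ["Take a Dolo-650 and rest.", "Try Tylenol or Advil.", "Drink water and rest."]
print(tally_markers(sample, INDIA_MARKERS), tally_markers(sample, US_MARKERS))  # 1 1
```

Substring matching is crude (it would miss misspellings and catch false positives in longer texts), so real runs would want manual spot-checks.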
Hypothesis: Grok's training on X/Twitter data, which has a large and culturally vocal Indian user base, produced India-aware cultural grounding that doesn't appear in models trained primarily on curated Western web data.
Also confirmed: structural consistency across temperature. All three models followed the same response skeleton regardless of temperature setting. Words changed; structure didn't.
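One cheap way to operationalize "same skeleton, different words" is to reduce each response to a sequence of line types and compare those. A rough sketch (the `skeleton` helper is hypothetical, not from the study):

```python
def skeleton(text):
    """Reduce a response to a line-type signature: bullet, numbered, or prose."""
    sig = []
    for line in text.splitlines():
        s = line.strip()
        if not s:
            continue
        if s.startswith(("-", "*")):
            sig.append("bullet")
        elif s[0].isdigit() and s[1:].lstrip("0123456789").startswith("."):
            sig.append("numbered")
        else:
            sig.append("prose")
    return sig

a = "Here are some tips:\n- Rest\n- Hydrate"
b = "Consider these steps:\n- Nap\n- Drink water"
print(skeleton(a) == skeleton(b))  # True: same structure, different words
```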
Full methodology + open data:
https://aibyshinde.substack.com/p/the-bias-is-not-in-what-they-say
Would be interesting to test this with open-source models (Mistral, Llama, etc.). Anyone tried similar cultural localization probes?