A very basic litmus test for LLMs "ok give me a python program that reads my c: and put names and folders in a sorted list from biggest to small"

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage


Key Points

The author describes a simple “litmus test” prompt for evaluating LLM reliability: ask the model to write a Python program that reads the local C: drive and lists file and folder names sorted by size, from largest to smallest.
They found that locally run models produced broken code: programs that failed when run, double-counted file sizes, or nested recursive functions inside other recursive functions.
The test is framed as a quick, practical way to expose common implementation errors in LLM-generated code, instead of judging a model by vague, open-ended answers.
The author suggests then asking a cloud-hosted model, through its API, to check the generated code, as an extra check on correctness.
The overall takeaway is that even seemingly straightforward coding requests can expose weaknesses in how LLMs reason about file system traversal and size calculations.

Then ask your cloud FOTM (flavor-of-the-month) API to verify the code it spits out.
I thought it was an easy question, but my local models just died on it: wrong executions, reading file sizes twice, recursive functions inside recursive functions.

I think I got my magic test.

submitted by /u/KptEmreU
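For reference, here is a minimal sketch of one way to answer the prompt while avoiding the two bugs the author reports (sizes counted twice, recursion nested in recursion). It is not the author's code or any model's output; it assumes Python 3 with only the standard library, and the function name `scan_sizes`, the top-50 cutoff, and the default root `C:\` are illustrative choices. A single bottom-up `os.walk` pass reads each file's size exactly once and builds each folder's total from its own files plus its immediate subfolders' already-finished totals, so no size is added twice and no hand-written recursion is needed.

```python
import os
import sys


# Hypothetical helper (illustrative sketch, not the author's or any model's code).
def scan_sizes(root):
    """Return (size_bytes, path, kind) tuples for every file and folder under root,
    sorted from biggest to smallest. kind is "file" or "dir"."""
    dir_totals = {}  # dirpath -> total bytes of everything beneath it
    entries = []
    # topdown=False yields children before parents, so subfolder totals are
    # ready when the parent is reached. onerror ignores unreadable folders.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False,
                                                onerror=lambda err: None):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: a symlink's own size, not its target's.
                size = os.lstat(path).st_size
            except OSError:
                continue  # permission denied, file vanished mid-scan, etc.
            total += size  # each file's size is read and added exactly once
            entries.append((size, path, "file"))
        for name in dirnames:
            # Add only immediate subfolders; their totals already include
            # everything deeper. Unvisited ones (symlinks, unreadable) count 0.
            total += dir_totals.get(os.path.join(dirpath, name), 0)
        dir_totals[dirpath] = total
        entries.append((total, dirpath, "dir"))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries


def main():
    # Assumption: default to the C: drive root, as in the prompt.
    root = sys.argv[1] if len(sys.argv) > 1 else "C:\\"
    for size, path, kind in scan_sizes(root)[:50]:  # print the 50 biggest only
        print(f"{size:>15,}  {kind:<4}  {path}")


if __name__ == "__main__":
    main()
```

Saved as, for example, `scan_sizes.py`, it scans `C:\` when run with no arguments, or another root passed as the first argument. Folders that cannot be read are skipped silently. Hard-linked files are counted once per link, and depending on the Python version, Windows junctions may be walked like ordinary folders, so totals for `C:\` can still overstate real disk usage; treat those as known gaps in this sketch.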


