A very basic litmus test for LLMs: "ok give me a python program that reads my c: and put names and folders in a sorted list from biggest to small"

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion / Ideas & Deep Analysis / Tools & Practical Usage

Key Points

  • The author describes a simple “litmus test” prompt for evaluating LLM reliability by asking it to write a Python program that reads a local C: drive and lists folder/file names sorted by size.
  • They found that locally run models produced incorrect behavior, such as failing executions, double-counting file sizes, and generating overly complex recursive function structures.
  • The proposed test is framed as a practical way to quickly detect common implementation errors in LLM-generated code rather than relying on vague answers.
  • The piece suggests using a separate cloud-based API to verify the code the model generates, improving confidence in correctness.
  • The overall takeaway is that even seemingly straightforward coding requests can expose weaknesses in how LLMs reason about file system traversal and size calculations.

Then ask your cloud flavor-of-the-month (FOTM) API to verify the code the local model spat out.
I thought it was an easy question, but my local models just died on it: code that failed to run, file sizes counted twice, recursive functions nested inside recursive functions.
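
For comparison, here is a minimal sketch of what a passing answer might look like. It is not from the post: the function name `sizes_largest_first` is hypothetical, and it assumes that a folder's size means the total of every file beneath it. Each file is stat'ed exactly once and each folder total is built from its children's totals, which is one way to avoid the double counting and nested recursion described above.

```python
import os

def sizes_largest_first(root):
    """Return (size_in_bytes, kind, path) tuples for everything under root, largest first."""
    folder_totals = {}  # folder path -> total bytes of all files beneath it
    entries = []
    # topdown=False yields subfolders before their parents,
    # so a child's total is already known when the parent needs it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished file: skip it
            entries.append((size, "file", path))
            total += size
        for name in dirnames:
            # Subfolders that were not walked (e.g. symlinked ones) count as 0.
            total += folder_totals.get(os.path.join(dirpath, name), 0)
        folder_totals[dirpath] = total
        entries.append((total, "folder", dirpath))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries
```

Something like `sizes_largest_first("C:\\")[:20]` would then give the 20 largest entries. Folders that cannot be listed are silently skipped by `os.walk`, so totals on a real C: drive are a lower bound.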

I think I got my magic test.

submitted by /u/KptEmreU