From Code to Prediction: Fine-Tuning LLMs for Neural Network Performance Classification in NNGPT
arXiv cs.CV / 5/6/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The paper proposes a new LLM fine-tuning task within the NNGPT framework: predicting which of two image classification datasets a neural network architecture will perform better on, rather than evaluating generative artifacts after training.
- It leverages the LEMUR dataset, using standardized PyTorch implementations and reproducible metrics, and tests three prompt strategies from easy (normalized-accuracy baseline) to harder (metadata-only and code-only prompts).
- Fine-tuning DeepSeek-Coder-7B-Instruct with LoRA shows that the code-only prompt performs best, reaching a peak accuracy of 80% over 15 epochs versus 70% for the metadata prompt.
- Per-dataset results indicate metadata helps most when dataset properties are distinctive, while code-only prompts remain more balanced; an additional comparison with DeepSeek-Coder-1.3B suggests that this reasoning ability depends on model capacity.
- Overall, the study finds that fine-tuned LLMs can infer cross-dataset neural-network suitability from architecture source code, implying the code carries more discriminative information than dataset metadata alone.
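The three prompt strategies described above can be sketched as simple templates for the binary dataset-suitability task. This is a minimal illustration only; the field names, wording, and function names are assumptions, not the paper's exact prompt formats:

```python
# Hypothetical prompt builders for the binary task: given a candidate
# architecture, predict which of two image classification datasets it
# will perform better on. Phrasing and field names are illustrative,
# not NNGPT's actual templates.

QUESTION = "On which dataset will this architecture score higher: {a} or {b}?"

def baseline_prompt(norm_acc_a: float, norm_acc_b: float, a: str, b: str) -> str:
    # Easiest setting: normalized accuracies are given directly,
    # so the answer can be read off the numbers.
    return (
        f"Normalized accuracy on {a}: {norm_acc_a:.3f}\n"
        f"Normalized accuracy on {b}: {norm_acc_b:.3f}\n"
        + QUESTION.format(a=a, b=b)
    )

def metadata_prompt(metadata: dict, a: str, b: str) -> str:
    # Harder setting: only dataset metadata (e.g. class count,
    # image resolution) is shown, sorted for a stable prompt layout.
    lines = [f"{key}: {value}" for key, value in sorted(metadata.items())]
    return "\n".join(lines) + "\n" + QUESTION.format(a=a, b=b)

def code_prompt(source_code: str, a: str, b: str) -> str:
    # Hardest setting: the model sees only the PyTorch source code
    # of the architecture, with no dataset statistics at all.
    return f"```python\n{source_code}\n```\n" + QUESTION.format(a=a, b=b)
```

In this sketch, the code-only prompt deliberately withholds everything except the architecture's source, which is the condition the paper reports as most informative after fine-tuning.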