Entropy, Disagreement, and the Limits of Foundation Models in Genomics
arXiv cs.LG / 4/7/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that entropy is a core reason genomic foundation models have had mixed results compared with natural-language models.
- By training ensembles on DNA and on text, the authors show that the high entropy of genomic sequences leads to near-uniform next-token distributions, strong disagreement between ensemble members, and unstable static embeddings (see the first sketch after this list).
- An analysis of empirical Fisher information flow suggests that DNA-trained models concentrate Fisher information in the embedding layers rather than capturing inter-token relationships (see the second sketch below).
- The findings imply that self-supervised pretraining on sequences alone may not transfer well to genomic data, challenging assumptions built into current genomic foundation-model training approaches.
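
To make the first two points concrete, here is a minimal Python sketch, not taken from the paper: the four-token distributions, the logit-noise scale, and the ensemble size are all illustrative assumptions. It shows why a near-uniform next-token distribution, such as one over the four DNA bases, leaves more room for independently perturbed ensemble members to disagree than a peaked, text-like distribution does, measuring disagreement with the Jensen-Shannon divergence.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in nats of a categorical distribution.
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -(nz * np.log(nz)).sum()

def kl(p, q):
    # KL divergence D(p || q); assumes q > 0 wherever p > 0.
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

def jsd(p, q):
    # Jensen-Shannon divergence: a symmetric, bounded disagreement measure.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def perturbed(base, scale, rng):
    # Stand-in for one ensemble member: the base distribution with
    # Gaussian noise added in logit space, then renormalized.
    logits = np.log(base) + rng.normal(0.0, scale, base.shape)
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
dna_like = np.full(4, 0.25)                     # near-uniform over {A,C,G,T}
text_like = np.array([0.85, 0.10, 0.04, 0.01])  # peaked, low entropy

for name, base in [("DNA-like ", dna_like), ("text-like", text_like)]:
    members = [perturbed(base, 0.5, rng) for _ in range(8)]
    pairs = [(p, q) for i, p in enumerate(members) for q in members[i + 1:]]
    mean_jsd = np.mean([jsd(p, q) for p, q in pairs])
    print(f"{name}: H = {entropy(base):.3f} nats, "
          f"mean pairwise JSD = {mean_jsd:.4f}")
```

On a typical run the near-uniform case yields both the maximal entropy (ln 4 ≈ 1.386 nats) and a mean pairwise JSD several times larger than the peaked case, mirroring the qualitative pattern the second key point describes.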
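The Fisher-information claim can be illustrated in the same spirit. The sketch below is not the paper's method; the toy model, its sizes, and the random-token data are assumptions. It accumulates the empirical Fisher information, i.e. the average squared gradient of the next-token log-likelihood, separately for the embedding, the transformer block, and the output head, which is enough to ask where the Fisher mass concentrates.

```python
import torch
import torch.nn as nn

# Toy next-token model: embedding -> one transformer block -> output head.
vocab_size, d_model, seq_len = 8, 16, 12
model = nn.ModuleDict({
    "embed": nn.Embedding(vocab_size, d_model),
    "block": nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                        dropout=0.0, batch_first=True),
    "head": nn.Linear(d_model, vocab_size),
})

def log_likelihood(tokens):
    # Sum of next-token log-probabilities for one sequence.
    h = model["embed"](tokens[:-1].unsqueeze(0))
    h = model["block"](h)
    logp = torch.log_softmax(model["head"](h), dim=-1)
    return logp[0, torch.arange(len(tokens) - 1), tokens[1:]].sum()

# Empirical Fisher: average the squared gradient of the log-likelihood,
# summed over the parameters of each top-level module.
fisher = {name: 0.0 for name in model}
n_samples = 64
for _ in range(n_samples):
    tokens = torch.randint(vocab_size, (seq_len,))  # stand-in for DNA tokens
    model.zero_grad()
    log_likelihood(tokens).backward()
    for name, module in model.items():
        fisher[name] += sum((p.grad ** 2).sum().item()
                            for p in module.parameters()) / n_samples

total = sum(fisher.values())
for name, mass in fisher.items():
    print(f"{name}: {mass / total:.1%} of empirical Fisher mass")
```

With random weights and random tokens this only demonstrates the measurement itself; the paper's reported finding is that on real DNA the mass concentrates in the embedding layers rather than in the layers that relate tokens to one another.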