Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
arXiv cs.LG / 3/26/2026
Key Points
- The paper introduces RAVEN, a recurrence-aware generative pretraining approach for sequential electronic health record (EHR) data that predicts a patient’s next visit by autoregressively generating tokenized clinical events conditioned on history.
- Using data from over one million individuals, the method adds regularization for recurring events and calls out an evaluation pitfall where repeated event tokens can artificially inflate metrics if new onsets are not distinguished from later occurrences.
- The authors study scaling in a data-constrained, compute-saturated regime and find that increasing model size is not effective unless it is paired with increases in data volume.
- In zero-shot disease incidence forecasting, RAVEN rivals fully fine-tuned representation-based Transformer models and outperforms simulation-based next-token approaches.
- Without further parameter updates, RAVEN also demonstrates cross-cohort generalization under lossy clinical code mappings and incomplete feature coverage, suggesting robustness to real-world clinical data variation.
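
The evaluation pitfall in the second point can be illustrated with a small sketch. This is not the paper's evaluation code: the function names, the set-based visit representation, and the example codes are assumptions chosen for illustration. It shows how a trivial baseline that only repeats a patient's past codes can look reasonable on a pooled metric while finding no new onsets.

```python
from typing import List, Set, Tuple

Visit = Set[str]  # assumption: one visit is modeled as a set of event codes


def split_onset_recurrence(history: List[Visit], next_visit: Visit) -> Tuple[Set[str], Set[str]]:
    """Split next-visit codes into first-time onsets and repeated events."""
    seen = set().union(*history)          # every code observed in prior visits
    onsets = next_visit - seen            # codes never seen before
    recurrences = next_visit & seen       # codes that already occurred
    return onsets, recurrences


def recall(predicted: Set[str], actual: Set[str]) -> float:
    """Fraction of actual codes that were predicted (NaN if there are none)."""
    if not actual:
        return float("nan")
    return len(predicted & actual) / len(actual)


def copy_history_baseline(history: List[Visit]) -> Set[str]:
    """Hypothetical trivial predictor: repeat every code seen so far."""
    return set().union(*history)


# Illustrative patient; the codes are example ICD-10 categories, not study data.
history = [{"E11", "I10"}, {"I10"}]
next_visit = {"I10", "E11", "N18"}

predicted = copy_history_baseline(history)
onsets, recurrences = split_onset_recurrence(history, next_visit)

pooled_recall = recall(predicted, next_visit)       # mixes onsets and repeats
onset_recall = recall(predicted, onsets)            # new onsets only
recurrence_recall = recall(predicted, recurrences)  # repeats only

print(f"pooled={pooled_recall:.2f} onset={onset_recall:.2f} recurrence={recurrence_recall:.2f}")
```

In this toy case the pooled score is driven entirely by the recurring codes, which is why reporting onset and recurrence metrics separately gives a clearer picture of forecasting ability.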