When Career Data Runs Out: Structured Feature Engineering and Signal Limits for Founder Success Prediction
arXiv cs.LG / 4/2/2026
Key Points
- The paper studies founder success prediction from limited, weak career-data signals, noting that positive (success) labels are rare (9%) and that successful and failed founders can look highly similar.
- It builds 28 structured, JSON-derived features (e.g., jobs, education, and exits) and combines a deterministic rule layer with XGBoost boosted stumps, reaching Val F0.5 = 0.3030 and outperforming a zero-shot LLM baseline.
- A controlled experiment adds features that an LLM (Claude Haiku) extracts from a prose field, tested at 67% and 100% dataset coverage. These LLM features receive some model importance but add no cross-validation signal (delta = -0.05pp).
- The authors attribute the lack of gain to structural information loss: anonymized prose is a lossy re-encoding of the same JSON fields, so it does not introduce genuinely new signal.
- They conclude that observed performance ceilings (CV ≈ 0.25, Val ≈ 0.30) reflect the dataset’s information content rather than model inadequacy, positioning the work as a benchmark diagnostic for what future, richer datasets must include.
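For readers unfamiliar with the metric quoted above, here is a minimal sketch of how F0.5 is computed. It is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall. The function names `f_beta` and `f_beta_from_counts` are illustrative and not taken from the paper, and the example assumes a positive rate of exactly 9% for the arithmetic.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: score is 0 when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Same score computed directly from confusion-matrix counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical trivial baseline: label every founder a success when
# 9% of founders are positive (assumed rate), so precision = 0.09, recall = 1.0.
always_positive = f_beta(precision=0.09, recall=1.0)
print(f"always-positive F0.5 = {always_positive:.4f}")
```

Under that assumed 9% rate, the always-positive baseline scores about 0.11, which gives some context for the reported ceilings of roughly 0.25 (CV) and 0.30 (Val).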