A Blueprint for Self-Evolving Coding Agents in Vehicle Aerodynamic Drag Prediction

arXiv cs.AI / 3/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that high-fidelity vehicle aerodynamic drag evaluation is often slowed by workflow friction (geometry cleanup, meshing retries, queue contention, reproducibility issues) rather than solver runtime, and proposes a new approach to reduce that friction.
  • It presents a contract-centric, self-evolving coding-agent blueprint that discovers executable surrogate prediction pipelines for drag coefficient (Cd) by treating surrogate discovery as constrained optimization over programs instead of static model instances.
  • The method combines evaluator feedback with population-based “island” evolution using structured mutations (data, model, loss, and split policies) and multi-objective selection that balances ranking quality, stability, and cost.
  • A hard evaluation contract enforces governance requirements—leakage prevention, deterministic replay, multi-seed robustness, and strict resource budgets—before candidates are accepted.
  • Experiments across eight anonymized evolutionary operators report strong performance (Combined Score 0.9335, sign-accuracy 0.9180) and identify adaptive sampling and island migration as the primary drivers of convergence quality, while a "screen-and-escalate" deployment model routes low-confidence cases to high-fidelity CFD for decision-grade reliability.
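
The screen-and-escalate routing in the last point can be sketched as a simple confidence gate. The names below (`Prediction`, `route`, `CONF_THRESHOLD`) and the threshold value are illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch of a screen-and-escalate router: the surrogate
# returns a Cd estimate plus a confidence score, and low-confidence or
# out-of-distribution designs are escalated to high-fidelity CFD.
from dataclasses import dataclass

CONF_THRESHOLD = 0.9  # illustrative cutoff, not from the paper

@dataclass
class Prediction:
    cd: float            # predicted drag coefficient
    confidence: float    # surrogate's self-reported confidence in [0, 1]
    in_distribution: bool

def route(pred: Prediction) -> str:
    """Return 'surrogate' if the estimate is decision-grade,
    otherwise 'cfd' to request a high-fidelity simulation."""
    if pred.in_distribution and pred.confidence >= CONF_THRESHOLD:
        return "surrogate"
    return "cfd"

# A confident in-distribution case stays with the surrogate;
# an out-of-distribution case is escalated.
print(route(Prediction(cd=0.29, confidence=0.95, in_distribution=True)))   # surrogate
print(route(Prediction(cd=0.31, confidence=0.97, in_distribution=False)))  # cfd
```

In a real deployment the confidence score would come from the surrogate pipeline itself (e.g. seed disagreement or an out-of-distribution detector), but the routing decision reduces to this kind of gate.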

Abstract

High-fidelity vehicle drag evaluation is constrained less by solver runtime than by workflow friction: geometry cleanup, meshing retries, queue contention, and reproducibility failures across teams. We present a contract-centric blueprint for self-evolving coding agents that discover executable surrogate pipelines for predicting drag coefficient C_d under industrial constraints. The method formulates surrogate discovery as constrained optimization over programs, not static model instances, and combines Famou-Agent-style evaluator feedback with population-based island evolution, structured mutations (data, model, loss, and split policies), and multi-objective selection balancing ranking quality, stability, and cost. A hard evaluation contract enforces leakage prevention, deterministic replay, multi-seed robustness, and resource budgets before any candidate is admitted. Across eight anonymized evolutionary operators, the best system reaches a Combined Score of 0.9335 with sign-accuracy 0.9180, while trajectory and ablation analyses show that adaptive sampling and island migration are primary drivers of convergence quality. The deployment model is explicitly "screen-and-escalate": surrogates provide high-throughput ranking for design exploration, but low-confidence or out-of-distribution cases are automatically escalated to high-fidelity CFD. The resulting contribution is an auditable, reusable workflow for accelerating aerodynamic design iteration while preserving decision-grade reliability, governance traceability, and safety boundaries.
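
The multi-objective selection the abstract describes can be illustrated with a minimal Pareto filter over candidate pipelines: ranking quality and stability are maximized, cost is minimized, and only nondominated candidates survive a generation. The candidate structure and scores below are invented for illustration; the paper's actual objectives and scoring are not specified here:

```python
# Illustrative multi-objective selection over candidate pipelines.
# Each candidate is scored on (ranking_quality, stability, cost);
# the first two are maximized, cost is minimized.
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    ranking_quality: float  # e.g. rank agreement with CFD-derived Cd
    stability: float        # e.g. score agreement across random seeds
    cost: float             # e.g. training + inference budget

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one."""
    ge = (a.ranking_quality >= b.ranking_quality
          and a.stability >= b.stability
          and a.cost <= b.cost)
    gt = (a.ranking_quality > b.ranking_quality
          or a.stability > b.stability
          or a.cost < b.cost)
    return ge and gt

def pareto_front(pop: list[Candidate]) -> list[Candidate]:
    """Keep candidates not dominated by any other population member."""
    return [c for c in pop
            if not any(dominates(o, c) for o in pop if o is not c)]

pop = [
    Candidate("A", ranking_quality=0.93, stability=0.90, cost=1.0),
    Candidate("B", ranking_quality=0.91, stability=0.88, cost=2.0),  # dominated by A
    Candidate("C", ranking_quality=0.89, stability=0.95, cost=0.5),  # different trade-off
]
print([c.name for c in pareto_front(pop)])  # ['A', 'C']
```

In an island-evolution setting, a filter like this would run within each island after mutation, so that no single objective (e.g. cheap but unstable pipelines) can monopolize the population.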