Fuel Consumption Prediction: A Comparative Analysis of Machine Learning Paradigms

arXiv cs.LG / 2026/3/24


Key Points

  • The study analyzes the factors that drive vehicle fuel consumption using the Motor Trend dataset. It argues that physical design parameters, particularly vehicle weight and engine displacement, primarily govern efficiency.
  • It applies data sanitization, statistical outlier removal, and exploratory data analysis (EDA) to identify and curb multicollinearity among powertrain features before modeling.
  • Among the compared approaches (Multiple Linear Regression, Support Vector Machines, and Logistic Regression), SVM Regression achieves the best continuous-prediction performance (R-squared = 0.889, RMSE = 0.326) and captures non-linear relationships between vehicle mass and engine displacement.
  • Logistic Regression performs best for classification tasks, reaching 90.8% accuracy and very high recall (0.957) for identifying low-efficiency vehicles.
  • The authors argue that, for static physical datasets, well-tuned and interpretable classical models deliver robust performance, which challenges the trend toward black-box deep learning.
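The key points report R-squared and RMSE for the regression task, and accuracy and recall for the classification task. The stdlib-only sketch that follows shows how these standard metrics are usually defined. It is not taken from the study, and the function names and toy inputs are hypothetical. Treating "low efficiency" as the positive class (label 1) is an assumption, based on the recall figure being reported for low-efficiency vehicles.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error, in the units of the target variable."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(labels, preds)) / len(labels)


def recall(labels: Sequence[int], preds: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FN): share of actual positives the classifier finds."""
    tp = sum(t == positive and p == positive for t, p in zip(labels, preds))
    fn = sum(t == positive and p != positive for t, p in zip(labels, preds))
    return tp / (tp + fn)


# Hypothetical toy values, not data from the study.
print(round(r_squared([3.0, 2.0, 4.0, 5.0], [2.5, 2.0, 4.0, 5.5]), 3))  # 0.9
print(round(rmse([3.0, 2.0, 4.0, 5.0], [2.5, 2.0, 4.0, 5.5]), 3))       # 0.354
# Label 1 = "low-efficiency vehicle" (an assumed encoding).
print(accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))                      # 0.6
print(round(recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]), 3))              # 0.667
```

RMSE carries the units of the target, so the reported 0.326 can only be interpreted against the scale of the target variable, which the summary does not state.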

Abstract

The automotive industry is under growing pressure to reduce its environmental impact, which requires accurate predictive modeling to support sustainable engineering design. This study examines the factors that determine vehicle fuel consumption in the seminal Motor Trend dataset, using rigorous quantitative analysis to identify the physical factors that govern efficiency. Methodologically, the research applies data sanitization, statistical outlier elimination, and in-depth Exploratory Data Analysis (EDA) to curb multicollinearity among powertrain features. A comparative analysis of machine learning paradigms, including Multiple Linear Regression, Support Vector Machines (SVM), and Logistic Regression, was carried out to assess predictive efficacy. Findings indicate that SVM Regression is the most accurate model for continuous prediction (R-squared = 0.889, RMSE = 0.326) and effectively captures the non-linear relationships between vehicle mass and engine displacement. In parallel, Logistic Regression proved superior for classification (Accuracy = 90.8%) and showed exceptional recall (0.957) when identifying low-efficiency vehicles. These results challenge the current trend toward black-box deep learning architectures for static physical datasets, showing that interpretable, well-tuned classical models deliver robust performance. The research concludes that intrinsic vehicle efficiency is fundamentally determined by physical design parameters, namely weight and displacement. It offers a data-driven framework suggesting that manufacturers prioritize lightweighting and engine downsizing to meet stringent global sustainability goals.
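The abstract names data sanitization, statistical outlier elimination, and EDA aimed at multicollinearity, but it does not say which specific tests were used. The following sketch assumes three common choices:

* dropping rows with missing predictors,
* applying Tukey's IQR fences with k = 1.5, and
* flagging predictor pairs whose absolute Pearson correlation is at least 0.8.

The column names follow the Motor Trend (mtcars) convention (`wt`, `disp`, `qsec`, `mpg`), but the rows are invented for illustration. All function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def sanitize(rows, columns):
    """Keep only rows where every listed column holds a numeric value."""
    return [r for r in rows
            if all(isinstance(r.get(c), (int, float)) for c in columns)]


def drop_iqr_outliers(rows, column, k=1.5):
    """Remove rows outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if lo <= r[column] <= hi]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def collinear_pairs(rows, columns, threshold=0.8):
    """Predictor pairs with |r| >= threshold (an assumed multicollinearity cutoff)."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy rows with mtcars-style column names (values are made up).
cars = [
    {"wt": 2.2, "disp": 120, "qsec": 18, "mpg": 30},
    {"wt": 2.6, "disp": 140, "qsec": 17, "mpg": 27},
    {"wt": 3.0, "disp": 180, "qsec": 19, "mpg": 23},
    {"wt": 3.2, "disp": 200, "qsec": 17, "mpg": 21},
    {"wt": 3.4, "disp": 230, "qsec": 19, "mpg": 19},
    {"wt": 3.6, "disp": 260, "qsec": 18, "mpg": 17},
    {"wt": 3.9, "disp": 300, "qsec": 18, "mpg": 15},
    {"wt": 9.0, "disp": 310, "qsec": 18, "mpg": 14},   # implausible weight
    {"wt": None, "disp": 150, "qsec": 17, "mpg": 25},  # missing value
]

predictors = ["wt", "disp", "qsec"]  # mpg is the target, so it is not screened
clean = sanitize(cars, predictors)
for col in predictors:
    clean = drop_iqr_outliers(clean, col)

print(len(clean))                                                 # 7
print([(a, b) for a, b, _ in collinear_pairs(clean, predictors)])  # [('wt', 'disp')]
```

A pairwise correlation screen only detects collinearity between two predictors at a time. Variance inflation factors (VIF) are a common complement when several predictors overlap jointly.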