ML for UFC predictions: logistic regression vs random forest? [P]

Reddit r/MachineLearning / 5/13/2026


Key Points

  • The author is building a Python (pandas + scikit-learn) UFC fight prediction project using logistic regression with binary outcomes and features like striking accuracy, takedown averages, reach, height, and age.
  • They’ve noticed that the model tends to favor simply stacking the fighters with the highest predicted win probabilities, which has prompted questions about how raw probability relates to betting value, i.e. expected value (EV).
  • Because MMA statistics are highly nonlinear, with threshold effects (e.g., age only mattering past a certain point) and interactions between traits (e.g., takedown stats mattering more or less depending on the opponent’s style), they wonder whether tree-based models could capture these patterns better.
  • They are specifically wondering whether switching from logistic regression to random forests (or another tree-based approach) would improve the model’s ability to learn feature interactions, while also acknowledging they are still learning how random forests work.
  • The post is framed as a beginner’s request for guidance and feedback on an exploratory, betting-focused modeling project, including its implications for round-robin betting strategy.

Hello everyone, I am pretty new to anything ML-related, so bear with me.

I’ve been working on a UFC fight prediction project in Python using pandas + scikit-learn. Right now I’m using logistic regression since the output is binary (fighter A wins or fighter B wins). I’m currently using features like striking accuracy, takedown averages, reach, height, and age from historical UFC data, then generating predicted probabilities for fights and parlays. I'm interested in pushing this project to assist with round-robin betting.
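For readers new to scikit-learn, the setup described above might look roughly like the following minimal sketch. It is not the author's code: the column names are hypothetical, each feature is assumed to be a "fighter A minus fighter B" difference, and the data is randomly generated so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names: each is "fighter A minus fighter B",
# so a positive value means fighter A has the edge on that stat.
FEATURES = ["str_acc_diff", "td_avg_diff", "reach_diff", "height_diff", "age_diff"]


def make_synthetic_fights(n: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Random stand-in data; replace with real historical fights."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.normal(size=(n, len(FEATURES))), columns=FEATURES)
    # Made-up "true" relationship, used only to generate win/loss labels.
    logit = (0.8 * df["str_acc_diff"] + 0.5 * df["td_avg_diff"] - 0.4 * df["age_diff"]).to_numpy()
    df["a_wins"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return df


df = make_synthetic_fights()
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["a_wins"], test_size=0.25, random_state=0, stratify=df["a_wins"]
)

# Scaling first means the default L2 penalty treats every feature on the same scale.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(fighter A wins).
p_a_wins = model.predict_proba(X_test)[:, 1]
print(p_a_wins[:5].round(3))
```

With real fight data it is usually safer to split by event date rather than randomly, and to build each row only from stats that were available before that fight; otherwise the test score can look better than it will on upcoming cards. Randomizing which fighter is listed as "A" also keeps the labels from being all wins or all losses.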

One thing I’ve noticed is that the model tends to favor simply stacking the highest-probability fighters, which made me start thinking more about the difference between raw probability and actual betting value/EV. I also already knew that MMA stats are very nonlinear. For example, age might barely matter until a certain threshold, takedown stats may matter much more depending on matchup style, and certain combinations of traits seem more important than the individual stats themselves.
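The gap between raw probability and EV comes down to the odds being paid. With decimal odds d, a 1-unit bet that wins with probability p has expected profit p*(d - 1) - (1 - p) = p*d - 1, while 1/d is the sportsbook's implied probability (which includes the bookmaker's margin). The sketch below uses hypothetical fighters, probabilities, and odds, and assumes the model's probabilities are calibrated and that fights are independent. It computes the EV of each single bet and of every 2-leg parlay in a round robin, staking 1 unit per parlay.

```python
from itertools import combinations
from math import prod


def american_to_decimal(american: int) -> float:
    """Convert American odds (+150, -200) to decimal odds (2.5, 1.5)."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


def expected_value(p_win: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked: p*(d - 1) - (1 - p) = p*d - 1."""
    return p_win * decimal_odds - 1


def round_robin_ev(picks: dict, legs: int) -> list:
    """EV of every `legs`-sized parlay in a round robin, 1 unit per parlay.

    Assumes the fights are independent, so a parlay's win probability is
    the product of its legs' probabilities and its odds are the product
    of its legs' decimal odds.
    """
    results = []
    for combo in combinations(sorted(picks), legs):
        p = prod(picks[name][0] for name in combo)
        d = prod(picks[name][1] for name in combo)
        results.append((combo, p, d, expected_value(p, d)))
    return results


# Hypothetical picks: (model win probability, decimal odds).
picks = {
    "Fighter A": (0.78, american_to_decimal(-400)),  # heavy favorite
    "Fighter B": (0.55, american_to_decimal(+120)),  # slight underdog
    "Fighter C": (0.58, american_to_decimal(-150)),
}

for name, (p, d) in picks.items():
    print(f"{name}: p={p:.2f}, implied={1 / d:.2f}, EV={expected_value(p, d):+.3f}")
for combo, p, d, ev in round_robin_ev(picks, legs=2):
    print(" + ".join(combo), f"p={p:.3f}, odds={d:.2f}, EV={ev:+.3f}")
```

In this made-up example the most likely winner (Fighter A) is a slightly negative-EV bet, while the underdog is the positive-EV one. Because an independent parlay's EV equals the product of its legs' p*d values minus 1, stacking favorites that are each slightly negative-EV compounds the negative edge rather than cancelling it. The numbers are only as good as the probabilities fed in, so checking calibration (for example with `sklearn.calibration.calibration_curve`) is worth doing before trusting any EV figure.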

Because of that, I’m wondering whether random forests (or another tree-based model) would make more sense than logistic regression for capturing those interactions. I'm still trying to fully grasp how random forests work, so maybe this doesn't apply? Anyway, I'm just trying to have fun with this project, and I’d genuinely appreciate input from anyone.
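One rough way to answer the question empirically is to cross-validate both model types on the same data and compare log loss. The sketch below uses synthetic data with a made-up age threshold at 34 and a made-up interaction (takedowns only help against strikers); the feature names and effect sizes are hypothetical, not real UFC findings. It also includes a third option: logistic regression with the threshold and interaction written out as explicit features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 3000

# Hypothetical features (synthetic, not real UFC data).
X = pd.DataFrame({
    "age": rng.uniform(22, 40, n),            # fighter A's age in years
    "td_diff": rng.normal(0, 1, n),           # takedown average, A minus B
    "opp_is_striker": rng.integers(0, 2, n),  # 1 if fighter B is a pure striker
})

# Made-up ground truth: age only hurts past 34 (threshold effect), and
# takedowns only help against strikers (interaction effect).
logit = (
    -1.5 * (X["age"] > 34).astype(int) + 1.5 * X["td_diff"] * X["opp_is_striker"]
).to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# The same two effects written out by hand as extra logistic-regression features.
X_eng = X.assign(
    over_34=(X["age"] > 34).astype(int),
    td_x_striker=X["td_diff"] * X["opp_is_striker"],
)

candidates = {
    "logistic (raw features)": (make_pipeline(StandardScaler(), LogisticRegression()), X),
    "logistic (engineered features)": (make_pipeline(StandardScaler(), LogisticRegression()), X_eng),
    "random forest (raw features)": (
        RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0),
        X,
    ),
}

scores = {}
for name, (model, features) in candidates.items():
    # neg_log_loss scores the predicted probabilities themselves, which is what EV uses.
    cv_scores = cross_val_score(model, features, y, cv=5, scoring="neg_log_loss")
    scores[name] = -cv_scores.mean()
    print(f"{name}: mean log loss = {scores[name]:.3f} (lower is better)")
```

Log loss is used instead of accuracy because betting decisions depend on the probabilities themselves, not just on which fighter is picked. The engineered-feature model shows that logistic regression can also handle thresholds and interactions when they are added explicitly; a random forest has to discover them from the data, which tends to need more fights. On real fights ordered in time, `sklearn.model_selection.TimeSeriesSplit` would likely give a more honest estimate than the default folds.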

submitted by /u/xoVinny-