[D] Training a classifier entirely in SQL (no iterative optimization)

Reddit r/MachineLearning / 3/23/2026


Key Points

  • SEFR is a lightweight linear classifier implemented entirely in SQL (BigQuery) without iterative optimization.
  • On a 55k fraud detection dataset, SEFR achieves AUC 0.954 compared with Logistic Regression's 0.986.
  • SEFR is about 18× faster due to its fully parallelizable formulation and lack of iterative optimization.
  • The post demonstrates the feasibility of database-native ML by training a classifier entirely in SQL and highlights trade-offs between accuracy and speed.
  • The report is shared via a Reddit submission and links to a Medium article detailing end-to-end ML in BigQuery using only SQL.

I implemented SEFR, a lightweight linear classifier, entirely in SQL (in Google BigQuery) and benchmarked it against Logistic Regression.

On a 55k fraud detection dataset, SEFR achieves an AUC of 0.954 versus 0.986 for Logistic Regression, but SEFR is ~18× faster because its formulation is fully parallelizable and involves no iterative optimization.
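To make "no iterative optimization" concrete, here is a minimal pure-Python sketch of SEFR as I understand the published formulation. It is not the SQL from the post. The function names, the `EPS` value, and the assumption of non-negative features with labels in {0, 1} are mine. Every step is a per-class average or a sum, which is why it should map naturally onto `AVG` / `GROUP BY` style queries.

```python
from typing import List, Sequence, Tuple

EPS = 1e-7  # assumed small constant to avoid division by zero


def sefr_fit(X: Sequence[Sequence[float]], y: Sequence[int]) -> Tuple[List[float], float]:
    """Closed-form SEFR fit. Assumes non-negative features and labels in {0, 1}."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])

    # Per-class feature means: a single aggregation pass over the data.
    mu_pos = [sum(r[j] for r in pos) / len(pos) for j in range(n_features)]
    mu_neg = [sum(r[j] for r in neg) / len(neg) for j in range(n_features)]

    # One weight per feature: normalized difference of the two class means.
    w = [(p - n) / (p + n + EPS) for p, n in zip(mu_pos, mu_neg)]

    def raw_score(row: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(w, row))

    # Mean weighted score per class, again a plain aggregate.
    avg_pos = sum(raw_score(r) for r in pos) / len(pos)
    avg_neg = sum(raw_score(r) for r in neg) / len(neg)

    # Bias: threshold between the class mean scores, weighted by class sizes.
    b = -(len(neg) * avg_pos + len(pos) * avg_neg) / (len(pos) + len(neg))
    return w, b


def sefr_score(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Decision value; a positive value means the positive class is predicted."""
    return sum(wj * xj for wj, xj in zip(w, row)) + b
```

Calling `sefr_fit(X, y)` once returns the weights and bias; there is no loop over epochs and no learning rate, so the cost is a fixed number of passes over the data regardless of convergence behavior.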

submitted by /u/CriticalofReviewer2