Analog Optical Inference on Million-Record Mortgage Data

arXiv cs.LG · April 16, 2026


Key Points

  • The paper benchmarks an analog optical computer (AOC) digital twin on mortgage approval classification using 5.84 million U.S. HMDA records, moving beyond prior small image benchmarks.
  • On a 19-feature setup, the AOC achieves 94.6% balanced accuracy versus 97.9% for XGBoost, and widening the optical core from 16 to 48 channels only marginally reduces the gap, pointing to architectural limits rather than hardware alone.
  • When all models are forced into a shared 127-bit binary encoding, accuracy for every approach drops to about 89.4–89.6%, with the encoding overhead costing digital models ~8 percentage points and the AOC ~5 points.
  • The authors find that seven calibrated hardware non-idealities add no measurable penalty, and they attribute remaining accuracy loss to three main layers: encoding, architecture, and hardware fidelity.
  • By pinpointing where accuracy is lost at each layer, the study offers a concrete roadmap for improving analog optical inference.
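All of the comparisons above use balanced accuracy rather than plain accuracy. A minimal sketch of that metric (it averages per-class recall, so a trivial "approve everything" classifier scores only 50% on a two-class task no matter how imbalanced the data); the labels below are toy values, not HMDA records:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy set: 8 approvals (1), 2 denials (0).
y_true = [1] * 8 + [0] * 2
always_approve = [1] * 10
print(balanced_accuracy(y_true, always_approve))  # 0.5 (plain accuracy would be 0.8)
```

This is why the headline numbers (94.6% vs. 97.9%) are comparable across models despite any class imbalance in the approval labels.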

Abstract

Analog optical computers promise large efficiency gains for machine learning inference, yet no demonstration has moved beyond small-scale image benchmarks. We benchmark the analog optical computer (AOC) digital twin on mortgage approval classification from 5.84 million U.S. HMDA records and separate three sources of accuracy loss. On the original 19 features, the AOC reaches 94.6% balanced accuracy with 5,126 parameters (1,024 optical), compared with 97.9% for XGBoost; the 3.3 percentage-point gap narrows by only 0.5pp when the optical core is widened from 16 to 48 channels, suggesting an architectural rather than hardware limitation. Restricting all models to a shared 127-bit binary encoding drops every model to 89.4–89.6%, with an encoding cost of 8pp for digital models and 5pp for the AOC. Seven calibrated hardware non-idealities impose no measurable penalty. The three resulting layers of limitation (encoding, architecture, hardware fidelity) locate where accuracy is lost and what to improve next.
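The abstract does not spell out how the shared 127-bit binary encoding is constructed. One plausible shape for such an encoding (purely an illustrative assumption, not the authors' scheme) is to quantize each numeric feature to a fixed number of bits and concatenate the bits into one vector; the feature names, ranges, and bit widths below are hypothetical:

```python
def encode_feature(value, lo, hi, bits):
    """Clip value to [lo, hi], quantize to an unsigned `bits`-bit level,
    and return it as a list of 0/1 (most significant bit first)."""
    levels = (1 << bits) - 1
    q = round((min(max(value, lo), hi) - lo) / (hi - lo) * levels)
    return [(q >> i) & 1 for i in reversed(range(bits))]

# Hypothetical 3-feature record: (loan_amount, income, rate_spread).
record = (350_000, 95_000, 1.25)
spec = [            # (lo, hi, bits) per feature -- illustrative widths only
    (0, 1_000_000, 8),
    (0, 500_000, 8),
    (0.0, 5.0, 4),
]
bitvec = [b for v, (lo, hi, n) in zip(record, spec)
          for b in encode_feature(v, lo, hi, n)]
print(len(bitvec))  # 20 bits for this toy spec
```

The point of forcing every model through one such bit vector is that it isolates the encoding cost itself: any model, digital or optical, sees the same quantized inputs, which is how the paper attributes ~8pp of loss to encoding for digital models and ~5pp for the AOC.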