We added native anomaly detection to Stratum, our columnar analytics engine for the JVM. You can train and score isolation forest models entirely from SQL, with no Python and no export pipeline:

SELECT * FROM transactions WHERE ANOMALY_SCORE('fraud_model') > 0.7;

Scoring takes 6 microseconds per transaction, is SIMD-accelerated, and runs inside the query engine. The full write-up covers why we built it, how isolation forests work, and benchmarks against PyOD/scikit-learn:
https://datahike.io/notes/anomaly-detection-in-your-database/
Stratum is open source (Apache 2.0): https://github.com/replikativ/stratum
Happy to answer questions about the implementation. The isolation forest is pure Java using Vector API SIMD, and scoring is fused into the query execution pipeline, so it benefits from zone map pruning and chunked streaming.
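
For anyone wondering what a threshold like 0.7 means, here is a minimal sketch of the textbook isolation forest score normalization (from the original Liu, Ting and Zhou formulation). It is not taken from Stratum's code: the class and method names are hypothetical, it is plain scalar Java with no SIMD, and it assumes the mean path depth across trees has already been computed and that the per-tree sample size is at least 2.

```java
public final class IsolationScoreSketch {
    private static final double EULER_MASCHERONI = 0.5772156649015329;

    // c(n): expected path length of an unsuccessful binary search tree lookup
    // over n points, used to normalize raw tree depths.
    public static double averagePathLength(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        // H(n-1) approximated as ln(n-1) + Euler-Mascheroni constant
        double harmonic = Math.log(n - 1) + EULER_MASCHERONI;
        return 2.0 * harmonic - 2.0 * (n - 1) / n;
    }

    // s = 2^(-meanDepth / c(sampleSize)). Points that are isolated after few
    // splits (small meanDepth) score close to 1; typical points score near 0.5.
    public static double anomalyScore(double meanDepth, int sampleSize) {
        return Math.pow(2.0, -meanDepth / averagePathLength(sampleSize));
    }

    public static void main(String[] args) {
        int sampleSize = 256; // hypothetical per-tree subsample size
        double typicalDepth = averagePathLength(sampleSize);
        System.out.println(anomalyScore(typicalDepth, sampleSize)); // prints 0.5
        System.out.println(anomalyScore(3.0, sampleSize));          // shallow path, scores above 0.7
    }
}
```

Under this normalization, a row only clears a 0.7 cutoff if the trees isolate it much faster than an average row. Whether ANOMALY_SCORE in Stratum uses exactly this normalization is an assumption here; the write-up is the place to confirm it.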




