Built an AI tool that cleans datasets, fills missing values, and predicts unknown fields [P]

Reddit r/MachineLearning / 4/14/2026

Key Points

The post describes a Streamlit-based AI tool for real-world dataset cleanup that fills missing values using machine learning models rather than simple imputation methods.
It can predict the missing values of an entire column from the other available columns (n-1 columns as inputs for a dataset with n columns).
The tool also includes anomaly detection plus correlation and feature-importance reporting to help users understand data quality and drivers.
Users can review UI screenshots and compare before/after CSV outputs, and download the cleaned dataset produced by the tool.
The author shares the project on GitHub and asks for feedback on the modeling approach and accuracy, inviting suggestions for improvements.

I built a Streamlit-based AI data analysis tool that:

• Fills missing values using ML models (not just mean/median)

• Predicts any missing column from the other n-1 columns

• Detects anomalies

• Shows correlations and feature importance

• Lets you download the updated dataset

The attached images show the UI, a before/after comparison of the CSV file, and the achieved performance metrics. A sample CSV is available on the GitHub page.
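For anyone who wants a concrete picture of the "predict a column from the other n-1" idea, here is a minimal sketch. It is not the code from the repo: it assumes all columns are numeric, that the other columns have no gaps, and it uses plain least-squares regression from NumPy as a stand-in for whatever models the tool actually trains. `impute_column` is a hypothetical helper name.

```python
import numpy as np

def impute_column(data: np.ndarray, target: int) -> np.ndarray:
    """Fill NaNs in column `target` using a linear model fit on the other columns.

    Assumption: the other n-1 columns are numeric and complete.
    """
    filled = data.astype(float).copy()
    missing = np.isnan(filled[:, target])
    if not missing.any() or missing.all():
        return filled  # nothing to fill, or no known values to learn from

    # Features are the other n-1 columns plus an intercept term.
    features = np.delete(filled, target, axis=1)
    design = np.column_stack([features, np.ones(len(features))])

    # Fit on rows where the target is known, then predict the missing rows.
    coef, *_ = np.linalg.lstsq(design[~missing], filled[~missing, target], rcond=None)
    filled[missing, target] = design[missing] @ coef
    return filled

# Toy data: the third column equals 2*a + 3*b, with two values missing.
rows = np.array([
    [1.0, 2.0, 8.0],
    [2.0, 1.0, 7.0],
    [3.0, 5.0, np.nan],
    [4.0, 3.0, 17.0],
    [5.0, 2.0, np.nan],
    [6.0, 4.0, 24.0],
])
result = impute_column(rows, target=2)
print(result[:, 2])
```

Swapping the least-squares fit for a tree-based or other model would keep the same shape: train on rows where the target is known, predict on rows where it is not.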

I wanted to test how well it works on real-world incomplete datasets.
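One common way to check this kind of tool is to hide values that are actually known, let the imputer fill them, and score the result against the hidden truth. The sketch below illustrates that idea under stated assumptions: the data is synthetic (not from the repo), the model is a plain least-squares fit used as a stand-in, and mean filling serves as the baseline to beat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a real dataset: y depends on two features plus noise.
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hide about 20% of the known target values; the true answers stay available.
hidden = rng.random(200) < 0.2

# Baseline: fill every hidden value with the mean of the visible ones.
mean_pred = np.full(hidden.sum(), y[~hidden].mean())

# Model: linear regression fit on the visible rows only.
design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(design[~hidden], y[~hidden], rcond=None)
model_pred = design[hidden] @ coef

# Mean absolute error on the hidden values for each approach.
mae_mean = np.abs(mean_pred - y[hidden]).mean()
mae_model = np.abs(model_pred - y[hidden]).mean()
print(f"mean-fill MAE: {mae_mean:.3f}, model MAE: {mae_model:.3f}")
```

On real data the gap between the two numbers is the interesting part: if the model barely beats mean filling for a column, the other columns probably carry little information about it.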

Would love feedback on:

- model approach

- accuracy issues

- any improvements I should make

GitHub: https://github.com/WALKER00058/ML-data-analysis/tree/main

submitted by /u/walker98417