Need feedback on my Senior Thesis: An automated MLOps pipeline for AI news classification & summarization [D]

Reddit r/MachineLearning / 4/16/2026

💬 Opinion, Developer Stack & Infrastructure, Signals & Early Trends, Ideas & Deep Analysis, Tools & Practical Usage

Key Points

The post describes a senior thesis project building an automated MLOps pipeline that scrapes AI-news articles on a schedule, then classifies them into four categories (Market, Solution & Use Case, Deep Dive, Noise).
For summarization, the system sends relevant articles to the Gemini API to produce concise summaries.
The author asks for feedback on what is missing from the current deployment architecture and how to make the pipeline more production-ready.
The author asks specifically about best practices and additions such as monitoring, CI/CD, and data validation that would make the pipeline more robust.
The goal is to “level up” the architecture before the final defense, acknowledging that the current setup is basic and that they lack professional MLOps experience.

Hi everyone,

I'm currently a senior (4th-year undergrad) working on my graduation thesis. For my project, I decided to build an automated MLOps system that aggregates, classifies, and summarizes AI-related news.

Here’s a quick breakdown of how the system works:

Data Ingestion: The system automatically scrapes news articles at scheduled intervals.
Classification: It categorizes the scraped articles into four labels: Market, Solution & Use Case, Deep Dive, and Noise.
Summarization: It then passes the relevant articles through the Gemini API to generate concise summaries.
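
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the actual thesis code: the `Article` type, the `run_pipeline` function, and the `classify` and `summarize` callables are hypothetical names, and treating every non-Noise article as "relevant" is an assumption. The real classifier and the Gemini API call would be passed in as the two callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# The four labels named in the post.
LABELS = ("Market", "Solution & Use Case", "Deep Dive", "Noise")


@dataclass
class Article:
    """Hypothetical container for one scraped article."""
    url: str
    text: str
    label: Optional[str] = None
    summary: Optional[str] = None


def run_pipeline(
    articles: Iterable[Article],
    classify: Callable[[str], str],
    summarize: Callable[[str], str],
) -> List[Article]:
    """Classify every article, then summarize the ones assumed relevant."""
    processed = []
    for article in articles:
        label = classify(article.text)
        if label not in LABELS:
            # Fail loudly if the classifier returns something unexpected.
            raise ValueError(f"unexpected label: {label!r}")
        article.label = label
        # Assumption: "relevant" means any label other than Noise.
        if label != "Noise":
            article.summary = summarize(article.text)
        processed.append(article)
    return processed
```

Keeping the classifier and summarizer as injected callables means the flow can be exercised in tests with simple stand-ins, without calling the real model or the external API.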

I've attached a diagram of my current deployment architecture below.

https://preview.redd.it/wqv7mg1e1kvg1.png?width=2410&format=png&auto=webp&s=60268ce337edaec085297f0b6d4566b0671b7efb

My Ask: To be completely honest, I feel like my current setup is still a bit basic/rudimentary. Since I don't have professional experience in building production MLOps pipelines yet, I'm a bit nervous about presenting this and would really appreciate a reality check from you all.

What am I missing in this architecture?
Are there any best practices, tools, or steps (e.g., monitoring, CI/CD, data validation) I should add to make it more robust?
Any suggestions to level this up before my final defense?
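
As one concrete illustration of the data validation step mentioned above, a check on each scraped record before classification might look like the sketch below. The field names (`url`, `text`), the `validate_article` function, and the 200-character threshold are all assumptions for illustration, not part of the current system.

```python
from typing import List


def validate_article(record: dict, min_chars: int = 200) -> List[str]:
    """Return a list of problems found in a scraped record (empty if none)."""
    problems = []

    # The URL should be a string with an http(s) scheme.
    url = record.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        problems.append("missing or malformed url")

    # Very short bodies usually mean the scraper hit a paywall or an error page.
    text = record.get("text")
    if not isinstance(text, str) or len(text.strip()) < min_chars:
        problems.append("text missing or too short")

    return problems
```

Records with a non-empty problem list could be logged and skipped rather than sent on to the classifier, which also gives a simple metric to monitor over time.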

I'm open to any critiques or advice you might have. Thank you so much in advance for your time and help!

submitted by /u/bigcityboys