[P] Made a dataset but don't know what to do with it

Reddit r/MachineLearning / 3/25/2026


Key Points

  • The author describes creating a small dataset of major air-crash final reports after failing to find an open-source dataset containing the text of such reports.
  • They are still refining the cleaning/extraction pipeline and, at that stage, are unsure what downstream use cases the dataset should support.
  • They consider building a RAG system but question what specific benefits or value it would provide for this domain-specific document collection.
  • The post asks others who have worked with similar report datasets for practical guidance on how to structure and apply such data.

This weekend I was looking for a dataset on major air crashes (I like planes) containing the text of their final reports. Surprisingly, I was unable to find even a single open-source dataset matching these criteria. Anyway, I started collecting a few reports and was in the middle of extracting the text and finalising the cleaning pipeline when I realized that I don't really have a clear idea what to do with this data. Perhaps build a RAG, but what benefit would that have? Has anyone worked with such reports?
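To make the RAG question concrete, here is a minimal sketch of the retrieval half only, assuming the reports have already been cleaned and split into short passages. It uses plain TF-IDF with cosine similarity from the standard library rather than an embedding model, and the three passages are made-up placeholders, not excerpts from real reports. The function names `tokenize`, `build_index`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Return (idf, vectors): smoothed IDF weights and L2-normalised TF-IDF vectors."""
    counts_per_passage = [Counter(tokenize(p)) for p in passages]
    doc_freq = Counter()
    for counts in counts_per_passage:
        doc_freq.update(counts.keys())
    n = len(passages)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = []
    for counts in counts_per_passage:
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return idf, vectors


def search(query, passages, idf, vectors, k=2):
    """Return the k passages with the highest cosine similarity to the query."""
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the corpus are ignored.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values())) or 1.0
    scored = []
    for i, vec in enumerate(vectors):
        score = sum(w * vec.get(t, 0.0) for t, w in q_vec.items()) / q_norm
        scored.append((score, i))
    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(passages[i], score) for score, i in scored[:k]]


# Placeholder passages (invented for illustration, not from any real report).
passages = [
    "The crew continued the approach below minimums in fog.",
    "A fatigue crack in the engine fan blade led to an uncontained failure.",
    "Ice accumulated on the wings before takeoff and the aircraft stalled.",
]

idf, vectors = build_index(passages)
results = search("engine fan blade failure", passages, idf, vectors)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In a full RAG setup the retrieved passages would be handed to a language model as context for answering a question. Whether that is worth building likely depends on whether people need to ask cross-report questions (for example, recurring contributing factors) that are hard to answer by reading one report at a time.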

submitted by /u/AbdullahKhanSherwani