Please, I really need your help on this, guys [D]

Reddit r/MachineLearning / 4/27/2026


Key Points

  • A student describes solving a machine learning time-series classification task by first achieving a public leaderboard score of 0.85, then finding the exact dataset used in the competition externally and obtaining a perfect 1.00 score.
  • They ask whether it’s possible to recreate the same submission predictions (ID-to-label mapping) using only the provided train and test datasets, without relying on the externally found dataset.
  • The student wants guidance on how to “learn” or reverse-engineer the submission output strictly from the original files, ideally using proper machine learning methods rather than external data.
  • They clarify that for their successful submission they had access to the full feature set (not just IDs and labels), and they’re willing to share train/test or the submission file if needed.
  • The post is essentially a feasibility and methodology question about dataset leakage, reproducibility, and generating identical competition outputs from the given splits.

My teacher gave us a machine learning time-series classification problem.

At first, I tried solving it normally and got a public score of 0.85. But then I searched for the dataset used in the competition and managed to find it. Using that dataset, I generated a submission file that scored 1.00.

Now my question is:

Is it possible to recreate the submission file using only the provided train and test datasets, without relying on the external dataset I found?

In other words, I want to understand if there is a way to learn or reverse-engineer how to produce the same submission output (ID → label mapping) using only the original train/test files. I’m not sure if “reverse engineering the submission” is the correct term, but I want to figure out how to get the same result properly using machine learning rather than external data.
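A minimal sketch of how that comparison could be measured, assuming both submission files are CSVs with hypothetical `id` and `label` columns (the real column names may differ): train a model on the provided train file only, write its predictions, then check what fraction of the 1.00 submission's ID → label pairs it reproduces. The helper names and the toy data are illustrative assumptions, not the actual competition files.

```python
import csv
import io


def read_mapping(csv_text, id_col="id", label_col="label"):
    """Parse submission-style CSV text into an {id: label} dict.

    The column names are assumptions; adjust them to the real files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: row[label_col] for row in reader}


def agreement(candidate, reference):
    """Fraction of reference IDs whose label the candidate reproduces."""
    if not reference:
        raise ValueError("reference mapping is empty")
    matches = sum(
        1 for key, label in reference.items() if candidate.get(key) == label
    )
    return matches / len(reference)


def disagreements(candidate, reference):
    """Sorted IDs where the candidate is missing or differs from the reference."""
    return sorted(k for k, v in reference.items() if candidate.get(k) != v)


# Toy data standing in for the two submission files (hypothetical values).
reference_csv = "id,label\n1,a\n2,b\n3,a\n4,b\n"  # the 1.00 submission
candidate_csv = "id,label\n1,a\n2,b\n3,b\n4,b\n"  # model trained on train only

reference = read_mapping(reference_csv)
candidate = read_mapping(candidate_csv)

score = agreement(candidate, reference)
diff_ids = disagreements(candidate, reference)
print(score, diff_ids)
```

An agreement of 1.0 would mean the model reproduces the mapping exactly; anything lower shows which IDs still differ, which is where to look at the features.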

Also, I want to clarify that for the submission I made, I had access to the full feature set, not just the IDs and labels. That means I also had the other features for the rows in the submission file.

I would really appreciate any help or guidance. If needed, I can share the train/test files or the submission file that achieved the 1.00 score.

Thanks in advance!

submitted by /u/Djistino