[P] Building an LLM from scratch with Mary Shelley's "Frankenstein" (on Kaggle)

Reddit r/MachineLearning / 4/8/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post presents an in-depth tutorial on building an LLM from scratch on Kaggle, using Mary Shelley’s “Frankenstein” as the training text.
  • It links to a supporting article that walks through the core steps of creating an LLM, from data preparation through model training.
  • It also includes a ready-to-run notebook on GitHub (train-frankenstein.ipynb) to reproduce the workflow.
  • The framing emphasizes learning-by-implementation, using a public-domain literary dataset to understand how LLM training pipelines work.
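
The stages named above (data preparation, training, generation) can be illustrated with a very small sketch. This is not the notebook's code, which is not reproduced in this summary. It is a character-level bigram model, chosen as an assumption because it runs with only the Python standard library. A real LLM tutorial would put a neural network (typically a transformer) where the counting step is. The sample text is the opening sentence of the novel, and all function and variable names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Stand-in corpus: the novel's opening sentence. A real run would load the full text.
text = (
    "You will rejoice to hear that no disaster has accompanied the "
    "commencement of an enterprise which you have regarded with such "
    "evil forebodings."
)

# 1. Data preparation: build a character vocabulary and encode the text as ids.
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[ch] for ch in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

data = encode(text)

# 2. Train/validation split (90/10), as is common in from-scratch tutorials.
split = int(0.9 * len(data))
train_ids, val_ids = data[:split], data[split:]

# 3. "Training": count how often each character follows another (bigram model).
#    Assumption: this counting step stands in for gradient-based training.
counts = defaultdict(Counter)
for prev, nxt in zip(train_ids, train_ids[1:]):
    counts[prev][nxt] += 1

# 4. Generation: repeatedly sample the next character from the learned counts.
def generate(start, length, seed=0):
    rng = random.Random(seed)
    out = [stoi[start]]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this character
        ids, weights = zip(*followers.items())
        out.append(rng.choices(ids, weights=weights)[0])
    return decode(out)

print(generate("Y", 40))
```

The same four-step shape (encode, split, fit, sample) carries over when the bigram table is replaced by a trained neural model; only step 3 changes substantially.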