The problem: you're tuning hyperparameters. Each run takes multiple hours, and you have a budget of maybe 15–20 trials before you run out of time or compute. Bayesian optimization picks your next config based entirely on the final validation score; it has no idea your model overfit at epoch 3, or that val loss was flat for 20 epochs before diverging.

What neuropt does differently: after each trial, it sends the full per-epoch train/val curves (and any other information you want) to an LLM, asks it to reason about what's happening, and has it suggest the next config. It also auto-detects tunable parameters and layers in PyTorch models, so you don't have to manually define a search space if you don't want to.

Works with: PyTorch, XGBoost, scikit-learn.

Results vs. Optuna + random search (same 15-eval budget): [charts not reproduced here].

The idea has academic backing (AgentHPO, CPAL 2025), but there wasn't a clean, usable open-source package. This is my attempt at that.
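The trial loop described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not neuropt's actual API: `train_fn` and `ask_llm` are stand-in callables (your training routine and your LLM client), and the prompt format is invented for the example.

```python
import json

def format_trial_report(config, history):
    """Render one trial's config and per-epoch curves as plain text for the LLM."""
    lines = [f"config: {json.dumps(config)}"]
    for epoch, (tr, va) in enumerate(
        zip(history["train_loss"], history["val_loss"]), start=1
    ):
        lines.append(f"epoch {epoch}: train_loss={tr:.4f} val_loss={va:.4f}")
    return "\n".join(lines)

def llm_guided_search(train_fn, ask_llm, initial_config, budget=15):
    """Run `budget` trials; after each one, send ALL curves so far to the LLM
    and parse its reply (expected to be a JSON object) as the next config."""
    config, trials = initial_config, []
    for _ in range(budget):
        history = train_fn(config)  # must return per-epoch train/val curves
        trials.append((config, history))
        prompt = (
            "You are tuning hyperparameters. Past trials:\n\n"
            + "\n\n".join(format_trial_report(c, h) for c, h in trials)
            + "\n\nReason about the curves (overfitting, plateaus, divergence) "
            "and reply with ONLY a JSON object for the next config."
        )
        config = json.loads(ask_llm(prompt))  # LLM-suggested next config
    return trials
```

The point of the sketch: unlike a BO sampler that sees only one scalar per trial, the prompt carries the whole training trajectory, so the model can react to patterns like early overfitting.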
[GitHub link] | [Docs link] Happy to answer questions! I'm curious what architectures/datasets you'd want to see benchmarked next (:
[P] neuropt: LLM-guided hyperparameter optimization that reads your training curves
Reddit r/MachineLearning / 3/21/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- [P] neuropt introduces an LLM-guided approach to hyperparameter optimization where, after each trial, the full per-epoch training/validation curves are sent to an LLM to reason about what's happening and suggest the next config.
- It can auto-detect tunable parameters and search spaces in PyTorch, XGBoost, and scikit-learn, reducing the need for manual search space specification.
- The method is evaluated against Optuna's TPE and random search using the same 15-evaluation budget, with results shown on FashionMNIST (CNN) and Covertype (XGBoost).
- The idea has academic backing (AgentHPO, CPAL 2025), and the project is released as an open-source package installable via `pip install "neuropt[llm]"`, with links to GitHub and documentation.
- The author invites community feedback on architectures and datasets to benchmark next, signaling ongoing development and collaboration.
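On the auto-detection point above, one plausible mechanism is to inspect a model's default parameters and derive a search space from their types and values. The sketch below is a hypothetical heuristic, not neuropt's implementation (which, per the post, also inspects PyTorch layers directly); the range multipliers are illustrative choices.

```python
def infer_search_space(default_config):
    """Heuristic sketch: derive a search space from default hyperparameter
    values, keyed on their Python types. Ranges are illustrative:
    floats get a wide log-uniform band, ints a 4x band, bools a choice."""
    space = {}
    for name, value in default_config.items():
        if isinstance(value, bool):  # check bool before int (bool is an int subclass)
            space[name] = {"type": "choice", "options": [True, False]}
        elif isinstance(value, int):
            space[name] = {"type": "int", "low": max(1, value // 4), "high": value * 4}
        elif isinstance(value, float):
            space[name] = {"type": "log_uniform", "low": value / 100, "high": value * 100}
    return space
```

For scikit-learn estimators, the `default_config` dict could come straight from `estimator.get_params()`, which is likely how the framework-specific detection reduces manual search-space specification.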