daVinci-LLM-3B

Reddit r/LocalLLaMA / 4/7/2026

Key Points

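daVinci-LLM-3B is a base language model with about 3 billion parameters, released with the goal of making pretraining a transparent and reproducible scientific process.
In addition to the final weights, the release includes training trajectories, intermediate checkpoints, data processing decisions, and more than 200 ablation studies (covering data quality, mixture design, training dynamics, evaluation validity, and more).
It uses a two-stage curriculum over roughly 8T tokens: the first stage is broad pretraining on diverse web-scale corpora, and the second stage trains on QA- and reasoning-oriented data to strengthen math and code reasoning.
The release links to the model's GitHub repository, the paper, and the training dataset (hosted on Hugging Face).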

- https://huggingface.co/SII-GAIR-NLP/davinci-llm-model

Overview

daVinci-LLM-3B is a 3B-parameter base language model presented in the paper "daVinci-LLM: Towards the Science of Pretraining". The project aims to make the pretraining process a transparent and reproducible scientific endeavor.

We release not only the final weights but also training trajectories, intermediate checkpoints, data processing decisions, and 200+ ablation studies covering data quality, mixture design, training dynamics, and evaluation validity.

GitHub: GAIR-NLP/daVinci-LLM
Paper: arXiv:2603.27164
Dataset: davinci-llm-data

The model follows a two-stage curriculum over ~8T tokens:

Stage 1 (6T tokens): broad pretraining over diverse web-scale corpora.
Stage 2 (2T tokens): structured QA and reasoning-heavy data to amplify math and code reasoning.
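
The split between the two stages can be made concrete with a small sketch. The token budgets (6T and 2T) come from the description above and are treated as exact here even though the total is stated as approximately 8T; the `Stage` class, the `CURRICULUM` tuple, and the `stage_at` helper are hypothetical names introduced for illustration and are not taken from the daVinci-LLM codebase.

```python
from dataclasses import dataclass

TRILLION = 10**12


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical structure, for illustration only)."""
    name: str
    tokens: int
    description: str


# Token budgets as stated in the post; the names are assumptions.
CURRICULUM = (
    Stage("stage1", 6 * TRILLION, "broad pretraining over diverse web-scale corpora"),
    Stage("stage2", 2 * TRILLION, "structured QA and reasoning-heavy data"),
)


def stage_at(tokens_seen: int) -> Stage:
    """Return the stage that is active after `tokens_seen` training tokens."""
    if tokens_seen < 0:
        raise ValueError("tokens_seen must be non-negative")
    boundary = 0
    for stage in CURRICULUM:
        boundary += stage.tokens
        if tokens_seen < boundary:
            return stage
    # Past the stated budget: fall back to the final stage.
    return CURRICULUM[-1]


total = sum(stage.tokens for stage in CURRICULUM)
for stage in CURRICULUM:
    print(f"{stage.name}: {stage.tokens / total:.0%} of budget")
# prints:
# stage1: 75% of budget
# stage2: 25% of budget
```

Under these figures, three quarters of the token budget goes to broad pretraining, and the switch to reasoning-heavy data happens at the 6T-token mark.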

submitted by /u/Aaaaaaaaaeeeee