Ada is the language behind flight controllers, missile guidance, satellite systems, and air traffic control. It's one of the most important languages in safety-critical software — and every major LLM I tested is subpar at it.
I fine-tuned Qwen2.5-Coder-14B-Instruct using QLoRA on a compiler-verified dataset of 3,430 Ada/SPARK instruction pairs. Every single training example passes `gnatmake -gnat2022 -gnatwa`. The model never trains on broken code.
Custom Ada Compilation Benchmark (1,000 prompts, first-attempt clean compile):
| Model | Size | Compile Rate |
|---|---|---|
| Steelman R5 | 14B | 68.6% |
| Claude Opus 4.6 | — | 42.1% |
| Claude Sonnet 4.6 | — | 37.2% |
| Qwen2.5-Coder-14B (base, untuned) | 14B | ~35% |
| Claude Sonnet 4 | — | 27.5% |
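The benchmark harness isn't published, but one step any first-attempt-compile benchmark needs is pulling the Ada source out of a chat-formatted response before handing it to the compiler. A minimal sketch of that step (my own convention, not the actual benchmark code):

```python
import re

def extract_ada(response: str) -> str:
    """Pull the first fenced code block out of a model response;
    fall back to the raw text if the model answered with bare code."""
    m = re.search(r"```(?:ada)?\s*\n(.*?)```", response, re.DOTALL)
    return m.group(1).strip() if m else response.strip()
```

The extracted source would then be fed to `gnatmake -gnat2022 -gnatwa`; compile rate is just the fraction of the 1,000 prompts whose first response exits cleanly.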
MultiPL-E HumanEval-Ada (157 problems, pass@1):
| Model | Pass@1 | Compile Rate |
|---|---|---|
| Steelman R5 | 47.1% | 74.5% |
| Qwen2.5-Coder-14B (base) | 34.4% | 51.0% |
These are the first published Ada pass@1 results on HumanEval for any open model.
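The post doesn't say how many samples per problem were drawn. With one sample, pass@1 is just the fraction of problems solved; with several, it's typically computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021). A sketch of that estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k(n, c, 1)` over the 157 problems gives the pass@1 column above.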
Training details:
- QLoRA 4-bit via Unsloth + TRL SFTTrainer
- LoRA rank 32, alpha 64, targeting q/k/v/o/gate/up/down projections
- Full retrain from base each round on accumulated dataset (adapter continuation caused catastrophic forgetting at R2)
- 1 epoch, lr 2e-5, constant schedule, ~49 minutes per round on a rented H100
- Five rounds (R1–R5); the project so far has taken about 2–3 days
- Dataset includes standard generation, spec-to-body, error-fix, and multi-file tasks
- Named after the 1978 DoD Steelman requirements that defined the Ada language
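Translated into code, the bullets above would correspond to hyperparameters roughly like these (a hypothetical reconstruction; the exact Unsloth/TRL argument names in the actual run may differ):

```python
# LoRA adapter settings, as they would be passed to peft.LoraConfig
LORA_CONFIG = {
    "r": 32,              # LoRA rank
    "lora_alpha": 64,     # scaling factor: alpha / r = 2.0
    "target_modules": [   # all attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Trainer settings, as they would appear in a TRL SFTTrainer config
TRAIN_CONFIG = {
    "num_train_epochs": 1,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "constant",
    "load_in_4bit": True,  # QLoRA: 4-bit quantized base weights
}
```

Note that "full retrain from base each round" means these configs are reapplied to the original base model with the accumulated dataset, rather than continuing training on the previous round's adapter.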
Try it right now:
```
ollama run hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
```
Fits in 12GB VRAM with Q4_K_M.
Links:
- Model: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1
- GGUF: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
- Dataset: https://huggingface.co/datasets/the-clanker-lover/steelman-sft-ada
Limitations:
- Compilation ≠ correctness: 68.6% of outputs compile on the custom benchmark, but only 47.1% produce correct output on HumanEval-Ada.
- Error-fix capability is weak (5.1% on error-fix tasks). Don't expect it to debug your Ada code.
- SPARK contracts compile but aren't verified with gnatprove.
- Synthetically generated training data — no human Ada developers wrote these examples.
- 14B model. It will miss things a bigger model would catch.