TLDR: I ran JetBrains' Kotlin HumanEval on 11 local models, including some small ones that fit on a 16 GB VRAM GPU. Here are the results.
- pass@1 / pass@3:
    - GPT-OSS 20B: 85% / 95%
    - Qwen3.5-35B-a3b: 77% / 86%
    - EssentialAI RNJ-1: 75% / 81% ← 8.8 GB file size
    - Seed-OSS-36B: 74% / 81%
    - GLM 4.7 Flash: 68% / 78%
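For anyone unfamiliar with the metric: pass@k is usually computed with the unbiased estimator from the original HumanEval paper, averaged over tasks. A minimal sketch in Python (the per-task sample counts below are hypothetical, not my actual run data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (HumanEval paper):
    n = samples generated per task, c = samples that passed the tests,
    k = evaluation budget. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-sized draw with only failures
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-task (n, c) counts with 3 samples per task:
tasks = [(3, 3), (3, 1), (3, 0)]
pass1 = sum(pass_at_k(n, c, 1) for n, c in tasks) / len(tasks)
pass3 = sum(pass_at_k(n, c, 3) for n, c in tasks) / len(tasks)
```

With 3 samples per task, pass@3 just asks whether any sample passed, while pass@1 is the expected success rate of a single draw.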
A few things I found interesting:
- GPT-OSS 20B still leads comfortably at 85% pass@1, despite being one of the smaller models by file size (12 GB)
- EssentialAI RNJ-1 at 8.8 GB took third place overall, beating models 2-3x its size
- Qwen jumped 18 points in seven months
Happy to answer questions about the setup.