[P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)

Reddit r/MachineLearning / 5/5/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • Fine-tuned Qwen2.5-1.5B with QLoRA (4-bit NF4) for multi-class classification across the six CEFR English proficiency levels (A1–C2).
  • The dataset of 1,785 texts is balanced across 6 levels × 10 domains and was synthetically generated with the Groq API and Llama-3.3-70B, under constraints that preserve vocabulary complexity, grammatical progression, sentence-structure variation, and CEFR-specific linguistic patterns.
  • Training used LoRA adapters covering only ~0.28% of model parameters; on the 179-sample test set, both Accuracy and Macro F1 reached 84.9%.
  • Per level, C2 recall is lowest at 60.0%, with misclassifications along the subtle C1/C2 boundary identified as the main error source.
  • A FastAPI inference API and Docker deployment setup were implemented, and an example of loading the published model with transformers is provided.

I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4).

The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can be useful for:

  • adaptive language learning systems,
  • placement testing,
  • readability estimation,
  • educational NLP applications.

Dataset

The dataset contains 1,785 English texts balanced across:

  • 6 CEFR levels,
  • 10 domains/topics.

The samples were synthetically generated using:

  • Groq API
  • Llama-3.3-70B

Generation constraints (sketched after this list) were designed to preserve:

  • vocabulary complexity,
  • grammatical progression,
  • sentence structure variation,
  • CEFR-specific linguistic patterns.
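For context, here is a minimal sketch of what such a generation loop could look like with the Groq Python client. The prompt wording, the placeholder domains, the temperature, and the `generate_text` helper are my own assumptions, not the author's actual pipeline:

```python
# Hypothetical synthetic-data generation sketch (not the author's exact code).
# Assumes the official `groq` client and the "llama-3.3-70b-versatile" model ID.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
DOMAINS = ["travel", "work", "science"]  # placeholders; the post mentions 10 domains

def generate_text(level: str, domain: str) -> str:
    # The constraints mirror the list above: vocabulary, grammar, structure, CEFR patterns.
    prompt = (
        f"Write a short English text about {domain} at CEFR level {level}. "
        f"Use vocabulary complexity, grammatical structures, and sentence "
        f"patterns typical of {level}. Output only the text."
    )
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,  # encourage variation across samples
    )
    return resp.choices[0].message.content

sample = generate_text("B1", "travel")
```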

Training Setup

Base model:

  • Qwen2.5-1.5B

Fine-tuning method:

  • QLoRA
  • 4-bit NF4 quantization
  • LoRA adapters

Only ~0.28% of model parameters were trained.
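As a rough sketch, a QLoRA setup like this is typically wired up with transformers, bitsandbytes, and peft. The rank, alpha, dropout, and target modules below are my assumptions, not the author's reported hyperparameters:

```python
# Minimal QLoRA sketch: 4-bit NF4 base model + LoRA adapters on attention projections.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, as in the post
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=6,                        # A1..C2
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16, lora_alpha=32, lora_dropout=0.05,   # assumed values, not from the post
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # trainable fraction depends on r/targets; the post reports ~0.28%
```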

Results

Held-out test set:

  • 179 samples

Metrics:

  • Accuracy: 84.9%
  • Macro F1: 84.9%

Per-level recall:

| Level | Recall |
|-------|--------|
| A1    | 96.6%  |
| A2    | 90.0%  |
| B1    | 90.0%  |
| B2    | 86.7%  |
| C1    | 86.7%  |
| C2    | 60.0%  |

Most errors come from C1/C2 confusion, which is expected due to the subtle linguistic boundary between those levels.
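For anyone reproducing the evaluation, metrics like these are commonly computed with scikit-learn. The `y_true` / `y_pred` arrays below are tiny placeholders, not the actual 179-sample predictions:

```python
# Hedged sketch: how Accuracy, Macro F1, and per-level recall could be computed.
from sklearn.metrics import accuracy_score, f1_score, classification_report

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Placeholder label indices (0=A1 .. 5=C2); real values come from the test set.
y_true = [0, 1, 2, 3, 4, 5, 5]
y_pred = [0, 1, 2, 3, 4, 4, 5]  # one C2 sample misread as C1, the dominant error mode

print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred, target_names=LEVELS))
```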

Deployment

I also built:

  • a FastAPI inference API (a minimal sketch follows this list),
  • Docker deployment setup.
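I can't speak to the author's exact app, but a minimal FastAPI wrapper around the published checkpoint could look like the following. The `/classify` route name and the `Request` schema are my own naming, not confirmed by the post:

```python
# Hypothetical FastAPI inference wrapper around the published model.
# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

app = FastAPI()
model = AutoModelForSequenceClassification.from_pretrained("yanou16/cefr-english-classifier")
tokenizer = AutoTokenizer.from_pretrained("yanou16/cefr-english-classifier")
model.eval()

class Request(BaseModel):
    text: str

@app.post("/classify")
def classify(req: Request):
    inputs = tokenizer(req.text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = logits.argmax(dim=-1).item()
    return {"level_id": pred}  # index into A1..C2
```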

Example Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "yanou16/cefr-english-classifier"
)
tokenizer = AutoTokenizer.from_pretrained(
    "yanou16/cefr-english-classifier"
)

text = "Artificial intelligence is transforming many industries."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

pred = outputs.logits.argmax(dim=-1).item()
print(pred)  # class index; map to a CEFR level via model.config.id2label if the repo sets it
```

Feedback is welcome, especially regarding:

  • evaluation methodology,
  • synthetic data quality,
  • improving C2 classification performance,
  • better benchmarking approaches.
submitted by /u/Professional-Pie6704