Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization
arXiv cs.LG / 3/20/2026
Key Points
- The paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning, addressing training scale, base-model selection, and architecture-aware quantization.
- Stage 1 identifies an optimal training scale of around 4,000 samples, with test-set NLL minimized at 1.127 and overfitting emerging at 5,000 samples.
- Stage 2 shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models such as Qwen2.5-7B.
- Stage 3 shows architecture-dependent quantization behavior: Llama-3-based architectures improve under Q4_K_M quantization while GQA-heavy architectures degrade. The recommended production configuration, Swallow-8B at Q4_K_M, scores 2.830/3, answers in 8.9 s/question, and fits in 4.9 GB, demonstrating that compact Japanese specialist LMs are practical on consumer hardware.
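The Stage 1 model-selection criterion above, test-set negative log-likelihood (NLL), can be sketched as follows. This is a minimal illustrative implementation, not the paper's code; the token probabilities below are hypothetical, and only the reported minimum of 1.127 comes from the source.

```python
import math

def mean_nll(token_probs):
    """Mean negative log-likelihood over the probabilities a model
    assigns to each gold token of a held-out test set.

    Lower is better; the paper reports a minimum test-set NLL of
    1.127 at roughly 4,000 training samples, rising again at 5,000
    (overfitting).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Illustrative (hypothetical) probabilities: a better-fit model assigns
# higher probability to the gold tokens, hence a lower NLL.
well_fit = [0.5, 0.4, 0.6, 0.3]
poorly_fit = [0.1, 0.2, 0.15, 0.05]
print(mean_nll(well_fit) < mean_nll(poorly_fit))  # True
```

Sweeping training-set size and picking the checkpoint with the lowest such test-set NLL is the standard way to locate the overfitting knee the paper describes.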