AI Navigate

I fine-tuned Qwen 0.5B for task automation and wanted to share the results.

Reddit r/LocalLLaMA / 3/19/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • Fine-tuned Qwen 0.5B with LoRA on ~1000 task examples to automate natural language tasks by generating execution plans that combine CLI commands and hotkeys.
  • Runs entirely locally on CPU (no GPU or cloud APIs) using GGUF Q4_K_M quantization (~300MB) and inference via llama.cpp.
  • Technical setup: base model Qwen2-0.5B, with typical inference latency of 3-5s on an i5 with SSD, 5-10s on an i3 with SSD, and 30-90s on very old hardware.
  • Main training challenges were data quality (regenerating the dataset 2-3 times), overfitting, EOS token handling, and GGUF conversion requiring BF16 dtype and imatrix quantization for stability.
  • Limitations (v0.1) include requiring full file paths (no smart file search yet), CPU-only inference, and basic execution with no visual understanding.
  • The author asks for feedback on performance across different hardware, edge cases that break the model, and v0.2 feature requests; a GitHub link is provided.

What it does:

- Takes natural language tasks ("copy logs to backup")

- Detects task type (atomic, repetitive, clarification)

- Generates execution plans (CLI commands + hotkeys)

- Runs entirely locally on CPU (no GPU, no cloud APIs)
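The post doesn't show the plan format, but the three bullets above can be sketched in a few lines. Everything here is a hypothetical illustration (the `Plan`/`Step` schema, the keyword heuristic, and the file paths are my guesses, not the author's actual implementation):

```python
from dataclasses import dataclass, field

# Hypothetical plan format -- the real schema isn't shown in the post;
# this just illustrates "CLI commands + hotkeys" as a step list.
@dataclass
class Step:
    kind: str    # "cli" or "hotkey"
    action: str  # a shell command string, or a hotkey like "ctrl+c"

@dataclass
class Plan:
    task_type: str                 # "atomic", "repetitive", or "clarification"
    steps: list = field(default_factory=list)

def detect_task_type(task: str) -> str:
    """Toy keyword heuristic standing in for the model's classification."""
    t = task.lower()
    if any(w in t for w in ("every", "each", "all ", "batch")):
        return "repetitive"
    if any(w in t for w in ("which", "what", "where", "?")):
        return "clarification"
    return "atomic"

def make_plan(task: str) -> Plan:
    plan = Plan(task_type=detect_task_type(task))
    if "copy logs to backup" in task.lower():
        # v0.1 requires full file paths -- no smart file search yet.
        plan.steps.append(Step("cli", "cp /var/log/app.log /backup/app.log"))
    return plan
```

In the real tool the fine-tuned model generates the plan; the heuristic above only stands in for the classification step to keep the sketch self-contained.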

Technical details:

- Base: Qwen2-0.5B

- Training: LoRA fine-tuning on ~1000 custom task examples

- Quantization: GGUF Q4_K_M (300MB)

- Inference: llama.cpp (3-10 sec on i3/i5)
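For anyone curious what ~1000 LoRA examples look like on disk: instruction-tuning data is usually just task → completion strings. A hedged sketch of one formatted example (the `Task:`/`Plan:` template, the JSON plan, and the `<|endoftext|>` EOS string are my assumptions, not the author's actual dataset format):

```python
import json

# Assumed EOS string for the base Qwen2 model -- appending it to every
# training example is one common fix for the "model won't stop generating"
# problem described in the challenges below.
EOS = "<|endoftext|>"

def format_example(task: str, plan: dict) -> str:
    """Turn one (task, plan) pair into a single training string.

    The prompt template and plan schema are illustrative guesses.
    """
    return f"Task: {task}\nPlan: {json.dumps(plan)}{EOS}"

example = format_example(
    "copy logs to backup",
    {"type": "atomic",
     "steps": [{"kind": "cli", "cmd": "cp /var/log/app.log /backup/"}]},
)
```

If the EOS token never appears in the training text (or the tokenizer config maps it wrong), the model has no reason to emit it at inference time, which matches the symptom in challenge #3 below.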

Main challenges during training:

  1. Data quality - had to regenerate dataset 2-3 times due to garbage examples

  2. Overfitting - took multiple iterations to get validation loss stable

  3. EOS token handling - model wouldn't stop generating until I fixed tokenizer config

  4. GGUF conversion - needed BF16 dtype + imatrix quantization to get stable outputs
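For anyone hitting the same GGUF instability, the BF16 + imatrix route with llama.cpp's own tools looks roughly like this. File names and the calibration text are placeholders, and exact script/binary names vary between llama.cpp versions:

```shell
# 1) Export the merged HF model to GGUF, keeping BF16 precision
#    (converting straight to low-bit was what gave unstable outputs here).
python convert_hf_to_gguf.py ./merged-model --outtype bf16 --outfile ace-bf16.gguf

# 2) Build an importance matrix from a small calibration text file.
./llama-imatrix -m ace-bf16.gguf -f calibration.txt -o imatrix.dat

# 3) Quantize to Q4_K_M, using the imatrix for better low-bit accuracy.
./llama-quantize --imatrix imatrix.dat ace-bf16.gguf ace-Q4_K_M.gguf Q4_K_M
```

The imatrix step matters most for small models like a 0.5B, where 4-bit rounding error is a proportionally bigger hit.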

Limitations (v0.1):

- Requires full file paths (no smart file search yet)

- CPU inference only (slower on old hardware)

- Basic execution (no visual understanding)

Performance:

- i5 (2018+) + SSD: 3-5 seconds

- i3 (2015+) + SSD: 5-10 seconds

- Older hardware: 30-90 seconds (tested on a Pentium + HDD)
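Those numbers roughly match a back-of-envelope model: CPU token generation is mostly memory-bandwidth bound, so time ≈ (model bytes streamed per token × tokens) / effective bandwidth. The bandwidth figures below are illustrative assumptions, not measurements from the post:

```python
# Rough, assumption-heavy latency model: each generated token streams
# approximately the whole quantized model (~300 MB for Q4_K_M) through memory.
MODEL_BYTES = 300e6

def seconds_for(tokens: int, bandwidth_gb_s: float) -> float:
    """Estimated generation time, ignoring prompt processing and overhead."""
    return tokens * MODEL_BYTES / (bandwidth_gb_s * 1e9)

# Hypothetical effective bandwidths (GB/s) -- illustrative only.
for label, bw in [("modern i5", 10.0), ("older i3", 5.0), ("Pentium era", 1.0)]:
    print(f"{label}: ~{seconds_for(100, bw):.0f}s for a 100-token plan")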

Feedback welcome! Especially interested in:

- Performance on different hardware

- Edge cases that break the model

- Feature requests for v0.2

Links:

- GitHub: https://github.com/ansh0x/ace

Happy to answer questions about the training process or architecture!

submitted by /u/Several-Dream9346