AI Navigate

I fine-tuned Qwen 0.5B for task automation and wanted to share the results.

Reddit r/LocalLLaMA / 3/19/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • Fine-tuned Qwen 0.5B with LoRA on ~1000 task examples to automate natural language tasks by generating execution plans that combine CLI commands and hotkeys.
  • Runs entirely locally on CPU (no GPU or cloud APIs) using GGUF Q4_K_M quantization (~300MB) and inference via llama.cpp.
  • Technical setup: base model Qwen2-0.5B, with typical inference latency of 3-5s on an i5 with SSD, 5-10s on an i3 with SSD, and 30-90s on very old hardware.
  • Main training challenges were data quality (regenerating the dataset 2-3 times), overfitting, EOS token handling, and GGUF conversion requiring BF16 dtype and imatrix quantization for stability.
  • Limitations (v0.1) include requiring full file paths (no smart file search yet), CPU-only inference, and basic execution with no visual understanding.
  • The author asks for feedback on performance across different hardware, edge cases that break the model, and v0.2 feature requests; a GitHub link is provided.

What it does:

- Takes natural language tasks ("copy logs to backup")

- Detects task type (atomic, repetitive, clarification)

- Generates execution plans (CLI commands + hotkeys)

- Runs entirely locally on CPU (no GPU, no cloud APIs)
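The post doesn't show the plan format, but the three bullets above can be sketched in a few lines. Everything here is a hypothetical illustration (the `Plan`/`Step` schema, the keyword heuristic, and the file paths are my guesses, not the author's actual implementation):

```python
from dataclasses import dataclass, field

# Hypothetical plan format -- the real schema isn't shown in the post;
# this just illustrates "CLI commands + hotkeys" as a step list.
@dataclass
class Step:
    kind: str    # "cli" or "hotkey"
    action: str  # a shell command string, or a hotkey like "ctrl+c"

@dataclass
class Plan:
    task_type: str                 # "atomic", "repetitive", or "clarification"
    steps: list = field(default_factory=list)

def detect_task_type(task: str) -> str:
    """Toy keyword heuristic standing in for the model's classification."""
    t = task.lower()
    if any(w in t for w in ("every", "each", "all ", "batch")):
        return "repetitive"
    if any(w in t for w in ("which", "what", "where", "?")):
        return "clarification"
    return "atomic"

def make_plan(task: str) -> Plan:
    plan = Plan(task_type=detect_task_type(task))
    if "copy logs to backup" in task.lower():
        # v0.1 requires full file paths -- no smart file search yet.
        plan.steps.append(Step("cli", "cp /var/log/app.log /backup/app.log"))
    return plan
```

In the real tool the fine-tuned model generates the plan; the heuristic above only stands in for the classification step to keep the sketch self-contained.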

Technical details:

- Base: Qwen2-0.5B

- Training: LoRA fine-tuning on ~1000 custom task examples

- Quantization: GGUF Q4_K_M (300MB)

- Inference: llama.cpp (3-10 sec on i3/i5)
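For anyone curious what ~1000 LoRA examples look like on disk: instruction-tuning data is usually just task → completion strings. A hedged sketch of one formatted example (the `Task:`/`Plan:` template, the JSON plan, and the `<|endoftext|>` EOS string are my assumptions, not the author's actual dataset format):

```python
import json

# Assumed EOS string for the base Qwen2 model -- appending it to every
# training example is one common fix for the "model won't stop generating"
# problem described in the challenges below.
EOS = "<|endoftext|>"

def format_example(task: str, plan: dict) -> str:
    """Turn one (task, plan) pair into a single training string.

    The prompt template and plan schema are illustrative guesses.
    """
    return f"Task: {task}\nPlan: {json.dumps(plan)}{EOS}"

example = format_example(
    "copy logs to backup",
    {"type": "atomic",
     "steps": [{"kind": "cli", "cmd": "cp /var/log/app.log /backup/"}]},
)
```

If the EOS token never appears in the training text (or the tokenizer config maps it wrong), the model has no reason to emit it at inference time, which matches the symptom in challenge #3 below.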

Main challenges during training:

  1. Data quality - had to regenerate dataset 2-3 times due to garbage examples

  2. Overfitting - took multiple iterations to get validation loss stable

  3. EOS token handling - model wouldn't stop generating until I fixed tokenizer config

  4. GGUF conversion - needed BF16 dtype + imatrix quantization to get stable outputs
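For anyone hitting the same GGUF instability, the BF16 + imatrix route with llama.cpp's own tools looks roughly like this. File names and the calibration text are placeholders, and exact script/binary names vary between llama.cpp versions:

```shell
# 1) Export the merged HF model to GGUF, keeping BF16 precision
#    (converting straight to low-bit was what gave unstable outputs here).
python convert_hf_to_gguf.py ./merged-model --outtype bf16 --outfile ace-bf16.gguf

# 2) Build an importance matrix from a small calibration text file.
./llama-imatrix -m ace-bf16.gguf -f calibration.txt -o imatrix.dat

# 3) Quantize to Q4_K_M, using the imatrix for better low-bit accuracy.
./llama-quantize --imatrix imatrix.dat ace-bf16.gguf ace-Q4_K_M.gguf Q4_K_M
```

The imatrix step matters most for small models like a 0.5B, where 4-bit rounding error is a proportionally bigger hit.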

Limitations (v0.1):

- Requires full file paths (no smart file search yet)

- CPU inference only (slower on old hardware)

- Basic execution (no visual understanding)

Performance:

- i5 (2018+) + SSD: 3-5 seconds

- i3 (2015+) + SSD: 5-10 seconds

- Older hardware: 30-90 seconds (tested on a Pentium + HDD)
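Those numbers roughly match a back-of-envelope model: CPU token generation is mostly memory-bandwidth bound, so time ≈ (model bytes streamed per token × tokens) / effective bandwidth. The bandwidth figures below are illustrative assumptions, not measurements from the post:

```python
# Rough, assumption-heavy latency model: each generated token streams
# approximately the whole quantized model (~300 MB for Q4_K_M) through memory.
MODEL_BYTES = 300e6

def seconds_for(tokens: int, bandwidth_gb_s: float) -> float:
    """Estimated generation time, ignoring prompt processing and overhead."""
    return tokens * MODEL_BYTES / (bandwidth_gb_s * 1e9)

# Hypothetical effective bandwidths (GB/s) -- illustrative only.
for label, bw in [("modern i5", 10.0), ("older i3", 5.0), ("Pentium era", 1.0)]:
    print(f"{label}: ~{seconds_for(100, bw):.0f}s for a 100-token plan")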

Feedback welcome! Especially interested in:

- Performance on different hardware

- Edge cases that break the model

- Feature requests for v0.2

Links:

- GitHub: https://github.com/ansh0x/ace

Happy to answer questions about the training process or architecture!

submitted by /u/Several-Dream9346