Qwen3.6-27B-NVFP4 - images

Reddit r/LocalLLaMA / 5/2/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The post shares a successful local inference setup for the Qwen3.6-27B-NVFP4 model using a specific Abiray-Qwen3.6-27B-NVFP4.gguf file.
  • It lists the exact llama.cpp server launch parameters and the hardware/software environment (Legion 7i Gen10 with an RTX 5090, Core Ultra 9 275HX, 32GB RAM), including NVFP4-focused settings.
  • The author provides detailed build steps for llama.cpp with CUDA enabled and NVFP4 turned on, including compilation flags (AVX-512/VNNI, CUDA F16, CUDA graphs) and toolchain versions.
  • A build verification section confirms that NVFP4 tensor-core support (Blackwell FP4) and related backends (GPU and CPU shared libraries) were compiled and activated.
  • Example prompts demonstrate generating SVG images (as text markup) via the configured server, a common informal test of a text model's visual creativity in this kind of setup.

Model: Abiray-Qwen3.6-27B-NVFP4.gguf
Specs:

- Legion 7i Gen10 - NVIDIA GeForce RTX™ 5090

- Intel® Core™ Ultra 9 275HX × 24

- RAM 32.0 GiB

llama.cpp settings:

./build/bin/llama-server \
  -m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Abiray-Qwen3.6-27B-NVFP4.gguf \
  -ngl 99 \
  -c 131072 \
  -t 16 \
  -b 4096 \
  -ub 2048 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -fa 1 \
  --defrag-thold 0.1 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0 \
  --metrics \
  --host 0.0.0.0 --port 8080 \
  -np 2
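
A quick way to sanity-check the server once it's running (a minimal sketch, assuming llama-server's built-in /health endpoint and, since --metrics is enabled above, its Prometheus-style /metrics endpoint):

curl http://localhost:8080/health
curl http://localhost:8080/metrics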

My successful build details:

cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES="120" \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA_F16=ON \
  -DGGML_CUDA_NVFP4=ON \
  -DGGML_CUDA_GRAPHS=ON \
  -DGGML_CCACHE=OFF \
  -DGGML_AVX512=ON \
  -DGGML_AVX512_VNNI=ON \
  -DLLAMA_CURL=ON \
  -DCMAKE_C_COMPILER=/usr/bin/gcc-14 \
  -DCMAKE_CXX_COMPILER=/usr/bin/g++-14 \
  -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-14

cmake --build build --config Release -j$(nproc) 2>&1 | tee /tmp/build_llamacpp.log
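
Since the build output is tee'd to /tmp/build_llamacpp.log above, one way to confirm the FP4 kernels actually compiled (a sketch; the object-file names are the ones listed in the checklist below) is to grep the log:

grep -iE "nvfp4|mxfp4" /tmp/build_llamacpp.log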

NVFP4 ✅
mmq-instance-nvfp4.cu.o compiled — Blackwell FP4 tensor cores are active
mmq-instance-mxfp4.cu.o also compiled — MX FP4 format supported too
All key backends built ✅
libggml-cuda.so — GPU backend
libggml-cpu.so — CPU backend with your AVX-512/VNNI flags
libggml-base.so, libllama.so, libmtmd.so — all shared libs
Compiler & CUDA ✅
GCC 14.3.0 used correctly for both C++ and CUDA host
CUDA 13.2.78 toolkit detected and used
Architecture auto-upgraded from 120 → 120a (Blackwell virtual arch — this is correct and better, enables PTX for forward compatibility)
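
To double-check that the shared libraries listed above were actually produced, a simple find over the build tree works (a sketch; the exact output paths can vary with the llama.cpp CMake layout):

find build -name "lib*.so"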

llama.cpp version: b8999
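
To reproduce the same build, one could check out that release tag before running the cmake steps above (a sketch, assuming b8999 is published as a release tag upstream):

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout b8999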

The prompts are the same ones I used in my previous post, Qwen3.6-27B-Q6_K - images, which can be found at: https://www.reddit.com/r/LocalLLaMA/comments/1szp96f/qwen3627bq6_k_images/

- Create svg image of a pelican riding a bicycle
- Create svg image of a capybara wearing a kimono drinking matcha tea
- Create svg image of a flamingo knitting a colorful sweater
- Create svg image of a sushi roll wearing sunglasses driving a go-kart
- Create svg image of a Victorian-era robot reading a newspaper in a cafe
- Create a svg image of a time-lapse composite showing a flower blooming, wilting, and transforming into butterflies across four seasons, all in one frame with seasonal lighting

I pasted the SVGs on black and white backgrounds and picked the most visually appealing.
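
For reference, a minimal sketch of how one of these prompts can be sent to the running server and the reply saved for viewing (this assumes the OpenAI-compatible /v1/chat/completions endpoint that llama-server exposes, plus jq; the reply may include extra prose around the <svg> element that needs trimming before rendering):

prompt="Create svg image of a pelican riding a bicycle"
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}" \
  | jq -r '.choices[0].message.content' > pelican.svg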

Conclusion:

- 37 t/s

- The model's lower creativity is visible in the images.

- The images look somewhat like kids' cartoons, or simple compared to Q6_K (which wasn't exactly industry standard either, but I prefer Q6).

submitted by /u/Usual-Carrot6352