| My laptop only has 6GB of VRAM, which wasn't enough to run an abliterated model for my local AI. I managed to completely offload the inference to a free Google Colab T4 GPU and route the API straight back to my local CLI terminal using a Cloudflare tunnel. Spent $0 so far... for a test. |
Got a 9B Abliterated Claude-Distilled model running for my local hermes
Reddit r/LocalLLaMA / 3/31/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A user reports successfully running a 9B “Abliterated Claude-Distilled” model despite only having 6GB VRAM locally.
- The workaround was to fully offload inference to a free Google Colab T4 GPU rather than running the model on-device.
- They routed the model’s API back to their local CLI using a Cloudflare tunnel, keeping the interaction seamless (a minimal sketch of this client side follows the list).
- The post emphasizes that the experiment has cost $0 so far, positioning the setup as a practical way to use larger or distilled models with limited hardware.
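The post does not share its exact serving stack, but the general pattern is straightforward: an OpenAI-compatible inference server runs on the Colab T4, `cloudflared` exposes it at a public URL, and the local CLI simply targets that URL instead of localhost. Below is a minimal sketch of the local side, assuming a hypothetical tunnel URL and a standard `/v1/chat/completions` endpoint; the model name and URL are placeholders, not details from the post.

```python
import requests

# Hypothetical URL printed by `cloudflared tunnel --url http://localhost:8000`
# on the Colab side; substitute whatever URL the tunnel actually reports.
BASE_URL = "https://example-tunnel.trycloudflare.com"


def chat(prompt: str) -> str:
    """Send one chat turn to the remotely hosted model over the tunnel."""
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={
            "model": "local-model",  # many local servers ignore or echo this field
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Hello from my 6GB laptop!"))
```

Because the tunnel exposes a plain HTTPS endpoint, any CLI or library that accepts a custom OpenAI-style base URL can be pointed at it without code changes.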
Related Articles

Black Hat Asia
AI Business

Claude Code tokens: what they are and how they're counted
Dev.to

How I Review AI-Generated Pull Requests (A Step-by-Step Checklist)
Dev.to

Freedom and Constraints of Autonomous Agents — Self-Modification, Trust Boundaries, and Emergent Gameplay
Dev.to

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment
Reddit r/artificial