Got a 9B Abliterated Claude-Distilled model running for my local hermes

Reddit r/LocalLLaMA / 3/31/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user reports successfully running a 9B “Abliterated Claude-Distilled” model despite only having 6GB VRAM locally.
  • The workaround was to fully offload inference to a free Google Colab T4 GPU rather than running the model on-device.
  • They routed the model’s API back to their local CLI using a Cloudflare tunnel to keep the interaction seamless.
  • The post emphasizes the experiment cost as $0 so far, positioning the setup as a practical way to use larger/distilled models with limited hardware.
Got a 9B Abliterated Claude-Distilled model running for my local hermes

My laptop only has 6GB of VRAM, which wasn't enough to run the abliterated model for my local AI.

I managed to completely offload the inference to a free Google Colab T4 GPU and route the API straight back to my local CLI terminal using a Cloudflare tunnel.
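The post doesn't include the commands, but the pattern it describes can be sketched roughly as follows. This is a hedged illustration, not the author's actual setup: it assumes a llama.cpp OpenAI-compatible server running inside the Colab notebook and Cloudflare's `cloudflared` quick-tunnel mode; the model filename, port, and tunnel URL are all placeholders.

```shell
# In the Colab notebook (T4 runtime): serve the model with llama.cpp's
# OpenAI-compatible HTTP server. Model file and port are illustrative.
./llama-server -m ./claude-distilled-9b-abliterated.Q4_K_M.gguf \
    --n-gpu-layers 99 --port 8080 &

# Expose the server through a Cloudflare quick tunnel; cloudflared
# prints a public https://<random>.trycloudflare.com URL to reuse below.
./cloudflared tunnel --url http://localhost:8080

# On the local laptop: point any OpenAI-compatible CLI or client at the
# tunnel URL instead of a local endpoint. <random> is whatever
# cloudflared printed above.
export OPENAI_BASE_URL="https://<random>.trycloudflare.com/v1"
curl "$OPENAI_BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"hello"}]}'
```

Quick tunnels require no Cloudflare account, which is consistent with the $0 cost the poster reports; the trade-off is that the URL changes every time the tunnel restarts.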

Spent $0 so far... for a test.

submitted by /u/DjuricX