Anyone using Tesla P40 for local LLMs (30B models)?

Reddit r/LocalLLaMA / 3/25/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether anyone is successfully running local LLMs on a Tesla P40 GPU, specifically targeting 30B-parameter models such as Qwen, Mixtral, or Llama variants.
  • The motivation is cost: the P40 is described as much cheaper (around $250) than RTX 3090s, which remain expensive.
  • The author is trying to gauge practical performance, including tokens-per-second throughput and whether the setup is usable for chat and light coding tasks.
  • A key concern is how well the system handles longer context lengths and what performance degradation occurs as context grows.

Hey guys, is anyone here using a Tesla P40 with newer models like Qwen / Mixtral / Llama?

RTX 3090 prices are still very high, while P40 is around $250, so I’m considering it as a budget option.

Trying to understand real-world usability:

  • how many tokens/sec are you getting on 30B models? (I put a quick measuring sketch below this list so numbers are easy to compare)
  • is it usable for chat + light coding?
  • how bad does it get with longer context?
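For the tokens/sec question, here's roughly how I'd measure it so we're comparing the same thing. This is just a sketch assuming a quantized GGUF run through llama-cpp-python; the model path and settings are placeholders, not something I've actually benchmarked on a P40:

```python
import time
from llama_cpp import Llama

# Hypothetical GGUF path -- substitute whichever 30B-class quant fits in the P40's 24 GB.
MODEL_PATH = "models/qwen-30b-q4_k_m.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,        # raise this to probe how long-context performance degrades
    verbose=False,
)

prompt = "Write a short Python function that reverses a string."

start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.7)
elapsed = time.perf_counter() - start

gen_tokens = out["usage"]["completion_tokens"]
print(f"{gen_tokens} tokens in {elapsed:.1f}s -> {gen_tokens / elapsed:.2f} tok/s")
```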

Thank you!

submitted by /u/ScarredPinguin