Anyone using Tesla P40 for local LLMs (30B models)?

Reddit r/LocalLLaMA / 3/25/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether anyone is successfully running local LLMs on a Tesla P40 GPU, specifically targeting 30B-parameter models such as Qwen, Mixtral, or Llama variants.
  • The motivation is cost: the P40 is described as much cheaper (around $250) than RTX 3090s, which remain expensive.
  • The author is trying to gauge practical performance, including tokens-per-second throughput and whether the setup is usable for chat and light coding tasks.
  • A key concern is how well the system handles longer context lengths and what performance degradation occurs as context grows.

Hey guys, is anyone here using a Tesla P40 with newer models like Qwen / Mixtral / Llama?

RTX 3090 prices are still very high, while P40 is around $250, so I’m considering it as a budget option.

Trying to understand real-world usability:

  • how many tokens/sec are you getting on 30B models? (I put a quick measuring sketch below this list so numbers are easy to compare)
  • is it usable for chat + light coding?
  • how bad does it get with longer context?
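For the tokens/sec question, here's roughly how I'd measure it so we're comparing the same thing. This is just a sketch assuming a quantized GGUF run through llama-cpp-python; the model path and settings are placeholders, not something I've actually benchmarked on a P40:

```python
import time
from llama_cpp import Llama

# Hypothetical GGUF path -- substitute whichever 30B-class quant fits in the P40's 24 GB.
MODEL_PATH = "models/qwen-30b-q4_k_m.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,        # raise this to probe how long-context performance degrades
    verbose=False,
)

prompt = "Write a short Python function that reverses a string."

start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.7)
elapsed = time.perf_counter() - start

gen_tokens = out["usage"]["completion_tokens"]
print(f"{gen_tokens} tokens in {elapsed:.1f}s -> {gen_tokens / elapsed:.2f} tok/s")
```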

Thank you!

submitted by /u/ScarredPinguin