PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed

Reddit r/LocalLLaMA / 3/25/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • Model servers such as Ollama, vLLM, and TGI can gradually increase RSS over hours due to glibc heap fragmentation and memory not being returned to the OS, leading to OOM kills.
  • The proposed mitigation is to set two environment variables before process startup: `MALLOC_MMAP_THRESHOLD_=65536` and `MALLOC_TRIM_THRESHOLD_=65536`.
  • The post reports that testing on 13 diffusion models cycling continuously resulted in stable memory usage (~1.2GB) indefinitely, versus OOM at 52GB after 17 hours before the change.
  • A benchmark repo and full data/script are provided to reproduce and validate the memory behavior and fix.
  • This is an operational RAM-stability tweak for AI inference/service deployments rather than a change to model architecture or frameworks themselves.

If you run Ollama, vLLM, TGI, or any custom model server that loads and unloads models, you've probably seen RSS creep up over hours until Linux kills the process.

It's not a Python leak. It's not PyTorch. It's glibc's heap allocator fragmenting and never returning pages to the OS.
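For a process you can't relaunch with a fresh environment, glibc exposes the same two knobs at runtime through `mallopt(3)`. A minimal sketch via `ctypes` (Linux/glibc only; the constant values come from glibc's `<malloc.h>`, and ideally this runs early, before the process has allocated heavily):

```python
import ctypes
import ctypes.util

# Constants from glibc's <malloc.h>
M_TRIM_THRESHOLD = -1
M_MMAP_THRESHOLD = -3

# Load the C library (glibc only; no effect on musl or macOS)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Runtime equivalents of MALLOC_MMAP_THRESHOLD_ / MALLOC_TRIM_THRESHOLD_:
# allocations >= 64 KiB go through mmap (returned to the OS on free), and
# the heap top is trimmed once 64 KiB of free space accumulates there.
# mallopt returns 1 on success.
assert libc.mallopt(M_MMAP_THRESHOLD, 65536) == 1
assert libc.mallopt(M_TRIM_THRESHOLD, 65536) == 1

# Optionally force an immediate release of free heap pages right now
libc.malloc_trim(0)
```

Note that setting either threshold (via env var or `mallopt`) also disables glibc's dynamic threshold adjustment, which is part of why the env-var route works.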

Fix:

export MALLOC_MMAP_THRESHOLD_=65536

export MALLOC_TRIM_THRESHOLD_=65536

Set these before your process starts. That's it.
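"Before your process starts" matters because glibc reads these variables at startup. If the server is spawned by a launcher rather than an interactive shell, inject them into the child's environment; a sketch of that, where the `ollama serve` command line is just a placeholder for whatever you actually run:

```python
import os
import subprocess

# Copy the current environment and add the two glibc tunables, so the
# child server process sees them from its very first allocation.
env = dict(os.environ,
           MALLOC_MMAP_THRESHOLD_="65536",
           MALLOC_TRIM_THRESHOLD_="65536")

# Placeholder command -- substitute your real server:
# subprocess.run(["ollama", "serve"], env=env)

# Sanity check: a child process really does inherit the variables
out = subprocess.run(["sh", "-c", "echo $MALLOC_TRIM_THRESHOLD_"],
                     env=env, capture_output=True, text=True)
print(out.stdout.strip())  # -> 65536
```

The same idea applies to systemd (`Environment=` lines in the unit) or Docker (`-e` flags): the variables just have to exist in the server's environment before its first malloc.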

We tested this on 13 diffusion models cycling continuously. Before: OOM at 52GB after 17 hours. After: stable at ~1.2GB indefinitely.
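To sanity-check the behavior on your own box without the full benchmark, a minimal sketch that reads RSS from `/proc` around a simulated load/unload cycle (Linux only; the 50 MiB buffer merely stands in for model weights, and this is not the repo's script):

```python
import re
from pathlib import Path

def rss_mib() -> float:
    """Current resident set size in MiB, from /proc/self/status (Linux only)."""
    text = Path("/proc/self/status").read_text()
    kb = int(re.search(r"^VmRSS:\s+(\d+)\s+kB", text, re.M).group(1))
    return kb / 1024

baseline = rss_mib()
blob = bytearray(50 * 1024 * 1024)  # simulate a model load (~50 MiB, zero-filled)
loaded = rss_mib()
del blob                            # simulate unload
print(f"baseline={baseline:.1f} MiB, loaded={loaded:.1f} MiB")
```

Logging this each load/unload cycle is enough to see the pattern the post describes: a healthy server plateaus, while a fragmenting one climbs monotonically until the OOM killer steps in.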

Repo with full data + benchmark script: https://github.com/brjen/pytorch-memory-fix

submitted by /u/VikingDane73