Tutorial - How to Toggle On/Off the Thinking Mode Directly in LM Studio for Any Thinking Model

Reddit r/LocalLLaMA / 4/4/2026


Key Points

  • LM Studio’s “Thinking” (reasoning) toggle typically appears automatically only for models downloaded through LM Studio’s own interface, while it may be hidden for externally sourced GGUF files.
  • The easiest fix is to download reasoning models directly in LM Studio and confirm the green “Thinking” icon is present next to the model name before using it.
  • For external GGUFs, the workaround is to spoof the model identity by adding LM Studio hub cache metadata under `...User\.cache\lm-studio\hub\models\` with a provider folder (lowercase) and a model-specific folder.
  • The workaround requires creating `manifest.json` and `model.yaml` inside the model folder so LM Studio recognizes the model as a reasoning model, enabling the Thinking switch in the chat UI.

LM Studio is an exceptional tool for running local LLMs, but it has a specific quirk: the "Thinking" (reasoning) toggle often only appears for models downloaded directly through the LM Studio interface. If you use external GGUFs from providers like Unsloth or Bartowski, this capability is frequently hidden.

Here is how to manually activate the Thinking switch for any reasoning model.

### Method 1: The Native Way (Easiest)

The simplest way to ensure the toggle appears is to download models directly within LM Studio. Before downloading, verify that the **Thinking Icon** (the green brain symbol) is present next to the model's name. If this icon is visible, the toggle will work automatically in your chat window.

### Method 2: The Manual Workaround (For External Models)

If you prefer to manage your own model files or use specific quants from external providers, you must "spoof" the model's identity so LM Studio recognizes it as a reasoning model. This requires creating a metadata registry in the LM Studio cache.

I'll use Gemma-4-31B as the example below.

#### 1. Directory Setup

You need to create a folder hierarchy within the LM Studio hub. Navigate to:

`...User\.cache\lm-studio\hub\models\`

https://preview.redd.it/yygd8eyue6tg1.png?width=689&format=png&auto=webp&s=3f328f59b10b9c527ffaafc736b9426f9e97042c

  1. Create a provider folder (e.g., `google`). **Note:** This must be in all lowercase.

  2. Inside that folder, create a model-specific folder (e.g., `gemma-4-31b-q6`).

    * **Full Path Example:** `...\.cache\lm-studio\hub\models\google\gemma-4-31b-q6\`
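The two steps above can be sketched in a few lines of Python. The folder names below are the hypothetical example values from this guide; substitute your own provider and model names, and note the hub root assumes a default LM Studio install under your user profile:

```python
from pathlib import Path

# Default LM Studio hub location (assumption: standard install under the user profile)
hub_root = Path.home() / ".cache" / "lm-studio" / "hub" / "models"

provider = "google"              # step 1: provider folder -- must be all lowercase
model_folder = "gemma-4-31b-q6"  # step 2: model-specific folder

# Create the full hierarchy, e.g. ~/.cache/lm-studio/hub/models/google/gemma-4-31b-q6
model_dir = hub_root / provider / model_folder
model_dir.mkdir(parents=True, exist_ok=True)
print(model_dir)
```

If the provider folder is not lowercase, LM Studio will not pick up the metadata, so it is worth double-checking that before moving on.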

https://preview.redd.it/dcgomhm3f6tg1.png?width=724&format=png&auto=webp&s=ab143465e01b78c18400b946cf9381286cf606d3

#### 2. Configuration Files

Inside your model folder, you must create two files: `manifest.json` and `model.yaml`.

https://preview.redd.it/l9o0tdv2f6tg1.png?width=738&format=png&auto=webp&s=8057ee17dc8ac1873f37387f0d113d09eb4defd6

https://preview.redd.it/nxtejuyeg6tg1.png?width=671&format=png&auto=webp&s=3b29553fb9b635a445f12b248f55c3a237cff58d

Please note that the most important lines to change are:
- The model name (it must match the model folder you created)
- The model key (the relative path to the model, i.e., the location where you downloaded the model and the path LM Studio is actually using)

**File 1: `manifest.json`**

Replace `"PATH_TO_MODEL"` with the actual relative path to where your GGUF file is stored. For instance, in my case, the model is located at `Google/(Unsloth)_Gemma-4-31B-it-GGUF-Q6_K_XL`, where `Google` is a subfolder of the models folder.

```json
{
  "type": "model",
  "owner": "google",
  "name": "gemma-4-31b-q6",
  "dependencies": [
    {
      "type": "model",
      "purpose": "baseModel",
      "modelKeys": ["PATH_TO_MODEL"],
      "sources": [
        {
          "type": "huggingface",
          "user": "Unsloth",
          "repo": "gemma-4-31B-it-GGUF"
        }
      ]
    }
  ],
  "revision": 1
}
```

https://preview.redd.it/1opvhfm7f6tg1.png?width=591&format=png&auto=webp&s=78af2e66da5b7a513eea746fc6b446b66becbd6f
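If you would rather generate the manifest than hand-edit it, a minimal sketch using only the standard library is below. The field values are the same as in the example above; `"PATH_TO_MODEL"` remains a placeholder you must replace with your actual relative GGUF path:

```python
import json
from pathlib import Path

# Same fields as the example manifest; "PATH_TO_MODEL" is a placeholder
# for the relative path to your GGUF file.
manifest = {
    "type": "model",
    "owner": "google",
    "name": "gemma-4-31b-q6",
    "dependencies": [
        {
            "type": "model",
            "purpose": "baseModel",
            "modelKeys": ["PATH_TO_MODEL"],
            "sources": [
                {"type": "huggingface", "user": "Unsloth", "repo": "gemma-4-31B-it-GGUF"}
            ],
        }
    ],
    "revision": 1,
}

# Write this file inside the model folder you created above
out = Path("manifest.json")
out.write_text(json.dumps(manifest, indent=2))

# Round-trip to confirm the file on disk is valid JSON
assert json.loads(out.read_text())["name"] == "gemma-4-31b-q6"
```

Writing the file this way sidesteps the stray-comma and quoting mistakes that hand-edited JSON tends to accumulate.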

**File 2: `model.yaml`**

This file tells LM Studio how to parse the reasoning tokens (the "thought" blocks). Replace `"PATH_TO_MODEL"` here as well.

```yaml
# model.yaml defines cross-platform AI model configurations
model: google/gemma-4-31b-q6
base:
  - key: PATH_TO_MODEL
    sources:
      - type: huggingface
        user: Unsloth
        repo: gemma-4-31B-it-GGUF
config:
  operation:
    fields:
      - key: llm.prediction.temperature
        value: 1.0
      - key: llm.prediction.topPSampling
        value:
          checked: true
          value: 0.95
      - key: llm.prediction.topKSampling
        value: 64
      - key: llm.prediction.reasoning.parsing
        value:
          enabled: true
          startString: "<thought>"
          endString: "</thought>"
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
metadataOverrides:
  domain: llm
  architectures:
    - gemma4
  compatibilityTypes:
    - gguf
  paramsStrings:
    - 31B
  minMemoryUsageBytes: 17000000000
  contextLengths:
    - 262144
  vision: true
  reasoning: true
  trainedForToolUse: true
```

https://preview.redd.it/xx4r45xcf6tg1.png?width=742&format=png&auto=webp&s=652c89b6de550c92e34bedee9f540179abc8d405
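For intuition on what the `startString`/`endString` settings do: LM Studio uses them to split the model's raw output into a hidden "thought" block and the visible reply. A rough sketch of that split (an illustration only, not LM Studio's actual parser):

```python
import re

# Must match startString/endString in model.yaml
START, END = "<thought>", "</thought>"

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate thought blocks from the visible reply, as the parser conceptually does."""
    # Collect everything between the start and end markers
    thoughts = re.findall(re.escape(START) + r"(.*?)" + re.escape(END), raw, re.DOTALL)
    # The visible reply is whatever remains once the thought blocks are removed
    reply = re.sub(re.escape(START) + r".*?" + re.escape(END), "", raw, flags=re.DOTALL)
    return "\n".join(t.strip() for t in thoughts), reply.strip()

thought, reply = split_reasoning("<thought>User wants a greeting.</thought>Hello!")
# thought == "User wants a greeting.", reply == "Hello!"
```

This is why the markers have to match the tokens your specific model actually emits: with the wrong strings, the "thinking" text leaks into the visible reply.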

### Configuration Files for GPT-OSS and Qwen 3.5

For GPT-OSS and Qwen 3.5 models, follow the same steps but use the following `manifest.json` and `model.yaml` files as examples:

**GPT-OSS, File 1: `manifest.json`**

```json
{
  "type": "model",
  "owner": "openai",
  "name": "gpt-oss-120b",
  "dependencies": [
    {
      "type": "model",
      "purpose": "baseModel",
      "modelKeys": [
        "lmstudio-community/gpt-oss-120b-GGUF",
        "lmstudio-community/gpt-oss-120b-mlx-8bit"
      ],
      "sources": [
        {
          "type": "huggingface",
          "user": "lmstudio-community",
          "repo": "gpt-oss-120b-GGUF"
        },
        {
          "type": "huggingface",
          "user": "lmstudio-community",
          "repo": "gpt-oss-120b-mlx-8bit"
        }
      ]
    }
  ],
  "revision": 3
}
```

**GPT-OSS, File 2: `model.yaml`**

```yaml
# model.yaml is an open standard for defining cross-platform, composable AI models
# Learn more at https://modelyaml.org
model: openai/gpt-oss-120b
base:
  - key: lmstudio-community/gpt-oss-120b-GGUF
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: gpt-oss-120b-GGUF
  - key: lmstudio-community/gpt-oss-120b-mlx-8bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: gpt-oss-120b-mlx-8bit
customFields:
  - key: reasoningEffort
    displayName: Reasoning Effort
    description: Controls how much reasoning the model should perform.
    type: select
    defaultValue: low
    options:
      - value: low
        label: Low
      - value: medium
        label: Medium
      - value: high
        label: High
    effects:
      - type: setJinjaVariable
        variable: reasoning_effort
metadataOverrides:
  domain: llm
  architectures:
    - gpt-oss
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 120B
  minMemoryUsageBytes: 65000000000
  contextLengths:
    - 131072
  vision: false
  reasoning: true
  trainedForToolUse: true
config:
  operation:
    fields:
      - key: llm.prediction.temperature
        value: 0.8
      - key: llm.prediction.topKSampling
        value: 40
      - key: llm.prediction.topPSampling
        value:
          checked: true
          value: 0.8
      - key: llm.prediction.repeatPenalty
        value:
          checked: true
          value: 1.1
      - key: llm.prediction.minPSampling
        value:
          checked: true
          value: 0.05
```

**Qwen 3.5, File 1: `manifest.json`**

```json
{
  "type": "model",
  "owner": "qwen",
  "name": "qwen3.5-27b-q8",
  "dependencies": [
    {
      "type": "model",
      "purpose": "baseModel",
      "modelKeys": ["Qwen/(Unsloth)_Qwen3.5-27B-GGUF-Q8_0"],
      "sources": [
        {
          "type": "huggingface",
          "user": "unsloth",
          "repo": "Qwen3.5-27B"
        }
      ]
    }
  ],
  "revision": 1
}
```

**Qwen 3.5, File 2: `model.yaml`**

```yaml
# model.yaml is an open standard for defining cross-platform, composable AI models
# Learn more at https://modelyaml.org
model: qwen/qwen3.5-27b-q8
base:
  - key: Qwen/(Unsloth)_Qwen3.5-27B-GGUF-Q8_0
    sources:
      - type: huggingface
        user: unsloth
        repo: Qwen3.5-27B
metadataOverrides:
  domain: llm
  architectures:
    - qwen27
  compatibilityTypes:
    - gguf
  paramsStrings:
    - 27B
  minMemoryUsageBytes: 21000000000
  contextLengths:
    - 262144
  vision: true
  reasoning: true
  trainedForToolUse: true
config:
  operation:
    fields:
      - key: llm.prediction.temperature
        value: 0.8
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.topPSampling
        value:
          checked: true
          value: 0.95
      - key: llm.prediction.minPSampling
        value:
          checked: false
          value: 0
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: false
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
```

I hope this helps.

Let me know if you face any issues.

P.S. This guide works fine for LM Studio 0.4.9.

submitted by /u/Iory1998