GPT-5.3 and GPT-5.4 on OpenClaw: Setup and Configuration Guide

Dev.to / 4/15/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The article provides a setup and configuration guide for running OpenAI’s GPT-5.3 (Codex) and GPT-5.4 models on OpenClaw, aiming to help operators deploy them efficiently.
  • It describes GPT-5.3 as a coding-focused model launched in February 2026, while GPT-5.4 arrived a month later (March 2026) as a general-purpose flagship with multiple variants for different price/performance needs.
  • GPT-5.4 is highlighted for “computer use” capabilities, including strong results on OSWorld (75%), enabling desktop-level autonomous UI task completion.
  • The piece positions the GPT-5 family as the most comprehensive single-provider model lineup for OpenClaw users, from lightweight classification use cases to full autonomous workflows.
  • It emphasizes practical benefits for building agents that operate web apps, CRMs, and internal tools via UI interactions, leveraging GPT-5.4’s desktop automation strength.

Originally published on Remote OpenClaw.

GPT-5.3 and GPT-5.4 on OpenClaw: Setup and Configuration Guide

Marketplace

Free skills and AI personas for OpenClaw — browse the marketplace.

Browse the Marketplace →

Join the Community

Join 1k+ OpenClaw operators sharing deployment guides, security configs, and workflow automations.

Join the Community →

GPT-5 Family Overview

OpenAI's GPT-5 generation arrived in two waves. GPT-5.3 (internally codenamed Codex) launched in February 2026 as a coding-focused model. GPT-5.4 followed one month later in March 2026 as the full general-purpose flagship, shipping with five distinct variants to cover different price and performance tiers.

For OpenClaw operators, the GPT-5 family represents the most comprehensive model lineup from any single provider. Whether you need a free nano model for simple classification, a mid-tier model for everyday agent tasks, or a maximum-capability model for complex autonomous workflows, there is a GPT-5 variant that fits.

The standout capability across the GPT-5.4 line is computer use. Scoring 75% on OSWorld — a benchmark that tests a model's ability to autonomously operate a computer desktop, click buttons, fill forms, navigate between applications, and complete multi-step UI tasks — GPT-5.4 sets a new bar for desktop automation. For OpenClaw operators building agents that interact with web applications, CRMs, or internal tools, this is a game-changing capability.

GPT-5.3 Codex: The Coding Specialist

GPT-5.3 launched in February 2026 as OpenAI's dedicated coding model. It is not a general-purpose model — it was specifically trained and optimized for software engineering tasks: writing code, reviewing code, debugging, refactoring, and generating tests.

| Specification | Value |
|---|---|
| Model ID | gpt-5.3-codex |
| Release Date | February 2026 |
| Context Window | 400K tokens |
| Input Pricing | $1.75 per 1M tokens |
| Output Pricing | $14.00 per 1M tokens |
| Specialty | Code generation, review, debugging |
| Modalities | Text + Code |

The 400K context window is large enough to ingest entire medium-sized codebases in a single prompt. For OpenClaw operators using coding agents, this means your agent can see the full project structure, dependencies, and related files when making changes — resulting in more accurate patches that account for the broader codebase context.
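As a rough sanity check before sending a whole project in one prompt, you can estimate its token footprint. The sketch below uses the common ~4-characters-per-token heuristic for English text and code; the heuristic, file extensions, and helper names are illustrative, not part of OpenClaw.

```python
import os

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text and code."""
    return len(text) // 4

def fits_in_context(root: str, limit: int = 400_000,
                    exts: tuple = (".py", ".md", ".js", ".ts")) -> bool:
    """Walk a source tree and check whether its estimated token count
    fits within the model's context window (default: GPT-5.3's 400K)."""
    total = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total <= limit
```

For precise counts you would use the model's actual tokenizer, but the 4-chars heuristic is usually close enough to decide whether a repository needs chunking.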

At $1.75/$14 per million tokens, GPT-5.3 is positioned as a premium coding model. The input pricing is competitive, but the output pricing is notable — $14 per million output tokens is on the expensive side, reflecting the computational cost of extended code reasoning.

GPT-5.4: The General-Purpose Flagship

GPT-5.4 is OpenAI's most capable general-purpose model, launched in March 2026. It features a 1 million token context window — tied for the largest of any commercial API model — and ships in five variants spanning from free to premium.

The headline capability is computer use. GPT-5.4 scored 75% on OSWorld, meaning it successfully completes roughly three out of four desktop automation tasks: navigating web applications, filling out forms, clicking through multi-step workflows, extracting data from applications, and interacting with system dialogs. This is not theoretical — it works in production through OpenClaw's computer-use integration.

The 1M context window is the other major differentiator. Claude Opus 4.6 is the only other commercial model offering context at this scale. For OpenClaw operators processing large documents, entire codebases, or long conversation histories, this effectively eliminates the context truncation problem.

All 5 GPT-5.4 Variants Compared

| Variant | Input (per 1M) | Output (per 1M) | Context | Best For |
|---|---|---|---|---|
| gpt-5.4-nano | Free | Free | 128K | Classification, routing, simple tasks |
| gpt-5.4-mini | $2.50 | $10.00 | 512K | Everyday agent tasks, balanced cost/quality |
| gpt-5.4 | $5.00 | $20.00 | 1M | Complex reasoning, extended thinking |
| gpt-5.4-turbo | $5.00 | $20.00 | 1M | Low-latency applications, real-time agents |
| gpt-5.4-max | $10.00 | $30.00 | 1M | Maximum quality, complex autonomous tasks |

The nano variant is genuinely free — no credits required, just rate limits. For OpenClaw operators, this is useful as a classifier or router: use nano to analyze incoming tasks and route them to the appropriate specialist model, saving costs on the expensive models.

The turbo variant has the same pricing as standard gpt-5.4 but is optimized for lower latency at the cost of slightly reduced reasoning depth. For agents that need to respond quickly — chat agents, real-time assistants, interactive workflows — turbo is the right choice.

Benchmarks and Performance

| Benchmark | GPT-5.3 Codex | GPT-5.4 | Notes |
|---|---|---|---|
| OSWorld | N/A | 75.0% | Best computer-use performance of any model |
| SWE-bench Verified | 81.2% | 79.5% | 5.3 edges out 5.4 on pure coding tasks |
| AIME 2024 | 85.0% | 94.1% | 5.4 significantly stronger on math |
| MMLU | 86.5% | 91.2% | 5.4 has the broader knowledge base |
| HumanEval | 93.8% | 92.1% | Both excellent at code generation |

The OSWorld score of 75% is the most consequential benchmark for agent operators. OSWorld tests complete computer-use workflows — not just recognizing UI elements, but executing multi-step tasks like "open the spreadsheet, sort column B, filter rows where revenue exceeds $10K, and export as CSV." A 75% success rate means GPT-5.4 can handle most routine desktop automation tasks without human intervention.
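One operational caveat worth quantifying: a 75% per-task success rate compounds when an agent must complete several tasks in sequence. The sketch below assumes task outcomes are independent — an assumption of this illustration, not a claim about OSWorld's methodology.

```python
def chain_success_rate(per_task: float, steps: int) -> float:
    """Probability an agent completes `steps` consecutive tasks,
    assuming each succeeds independently at rate `per_task`."""
    return per_task ** steps

# A single task succeeds 75% of the time, but an unreviewed
# five-task chain succeeds only about 24% of the time.
print(chain_success_rate(0.75, 1))            # 0.75
print(round(chain_success_rate(0.75, 5), 2))  # 0.24
```

This is why production agent pipelines typically insert checkpoints or human review between long chains of autonomous desktop actions rather than running them end to end.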

GPT-5.3 Codex's SWE-bench score of 81.2% makes it the strongest coding model from OpenAI, edging out even GPT-5.4 on pure software engineering tasks. If your OpenClaw agent primarily writes and reviews code, GPT-5.3 is the better choice despite its narrower capabilities.

Complete Pricing Breakdown

| Model | Input (per 1M) | Output (per 1M) | Monthly Cost (100K requests, ~1K input + 1K output each) |
|---|---|---|---|
| GPT-5.3 Codex | $1.75 | $14.00 | ~$1,575 |
| GPT-5.4-nano | Free | Free | $0 (rate-limited) |
| GPT-5.4-mini | $2.50 | $10.00 | ~$1,250 |
| GPT-5.4 | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-turbo | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-max | $10.00 | $30.00 | ~$4,000 |

For comparison: Claude Opus 4.6 runs $5/$25 with 90% caching savings available. DeepSeek V3.2 runs $0.028/$0.10. The GPT-5 family sits at the premium end of the market, justified by capabilities like computer use and the 1M context window that cheaper models do not offer.
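The monthly figures above are consistent with an average request of roughly 1K input and 1K output tokens; that request size is an assumption inferred from the table, not a published workload profile. A minimal sketch of the arithmetic:

```python
def monthly_cost(input_per_m: float, output_per_m: float,
                 requests: int = 100_000,
                 in_tokens: int = 1_000, out_tokens: int = 1_000) -> float:
    """Monthly API cost in dollars, given per-1M-token prices and an
    assumed average request size (here ~1K input / 1K output tokens)."""
    in_millions = requests * in_tokens / 1_000_000    # total input, in M tokens
    out_millions = requests * out_tokens / 1_000_000  # total output, in M tokens
    return input_per_m * in_millions + output_per_m * out_millions

print(monthly_cost(1.75, 14.00))   # GPT-5.3 Codex -> 1575.0
print(monthly_cost(2.50, 10.00))   # gpt-5.4-mini  -> 1250.0
print(monthly_cost(10.00, 30.00))  # gpt-5.4-max   -> 4000.0
```

Plug in your own per-request token averages; output-heavy workloads (long code generations, reports) shift the totals substantially because output rates run 4-8x the input rates.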


Key numbers to know: 400K GPT-5.3 context · 1M GPT-5.4 context · 75% OSWorld score · 5 GPT-5.4 variants.

Setup Method 1: OpenAI API (Direct)

The most straightforward way to connect GPT-5 models to OpenClaw is through the OpenAI API directly.

Step 1: Get an OpenAI API Key

Sign up at platform.openai.com and generate an API key. Add credits to your account based on which variant you plan to use.

Step 2: Configure OpenClaw for GPT-5.3 Codex

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.3-codex
  api_key: your-openai-api-key
  temperature: 0.4
  max_tokens: 16384

Step 3: Configure OpenClaw for GPT-5.4

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.4          # or gpt-5.4-mini, gpt-5.4-turbo, gpt-5.4-max, gpt-5.4-nano
  api_key: your-openai-api-key
  temperature: 0.7
  max_tokens: 16384

Step 4: Start OpenClaw

openclaw start

Swap between variants by changing the model field. No other configuration changes are needed — all GPT-5 variants use the same API endpoint and authentication.
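If you switch variants often (say, per environment), a small helper that renders the `llm:` block can prevent typos in the model ID. This is an illustrative sketch, not part of OpenClaw; the config keys simply mirror the examples above.

```python
# Hypothetical helper: render the OpenClaw `llm:` block for any GPT-5 variant.
VARIANTS = {"gpt-5.3-codex", "gpt-5.4", "gpt-5.4-mini",
            "gpt-5.4-turbo", "gpt-5.4-max", "gpt-5.4-nano"}

def render_llm_config(model: str, api_key: str,
                      temperature: float = 0.7, max_tokens: int = 16384) -> str:
    """Return the YAML `llm:` section for ~/.openclaw/config.yaml."""
    if model not in VARIANTS:
        raise ValueError(f"unknown GPT-5 variant: {model}")
    return (
        "llm:\n"
        "  provider: openai\n"
        f"  model: {model}\n"
        f"  api_key: {api_key}\n"
        f"  temperature: {temperature}\n"
        f"  max_tokens: {max_tokens}\n"
    )

print(render_llm_config("gpt-5.4-mini", "your-openai-api-key"))
```

The allow-list also doubles as documentation of which model IDs your deployment scripts accept.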

Setup Method 2: OpenRouter

OpenRouter lets you access all GPT-5 variants through a single API key, with the added benefit of automatic failover and unified billing across multiple model providers.

Step 1: Get an OpenRouter API Key

Sign up at openrouter.ai and generate an API key.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-5.4       # or openai/gpt-5.3-codex, openai/gpt-5.4-mini, etc.
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384

Step 3: Start OpenClaw

openclaw start

OpenRouter pricing for GPT-5 variants is typically identical to direct OpenAI pricing. The advantage is unified billing if you use multiple model providers and automatic failover during OpenAI outages.
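As the config examples show, the only change when moving between the two methods is the provider prefix on the model ID. A trivial helper makes that mapping explicit (illustrative only):

```python
def to_openrouter_id(model: str, provider: str = "openai") -> str:
    """Map a bare model ID to OpenRouter's provider-prefixed form,
    leaving IDs that are already prefixed unchanged."""
    return model if "/" in model else f"{provider}/{model}"

print(to_openrouter_id("gpt-5.4"))               # openai/gpt-5.4
print(to_openrouter_id("openai/gpt-5.3-codex"))  # openai/gpt-5.3-codex
```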

Which Variant to Choose

Here is a decision framework for OpenClaw operators:

  • Pure coding agent: Use GPT-5.3 Codex ($1.75/$14). It beats GPT-5.4 on SWE-bench and costs less. The 400K context window handles most codebases.
  • General-purpose agent on a budget: Use GPT-5.4-mini ($2.50/$10). Best balance of capability and cost for everyday tasks — email processing, document analysis, data extraction.
  • Computer use and desktop automation: Use GPT-5.4 ($5/$20). The 75% OSWorld score is only available on the standard and higher tiers. Mini and nano do not support computer use.
  • Real-time interactive agent: Use GPT-5.4-turbo ($5/$20). Same pricing as standard but optimized for lower latency. Best for chat-based agents where response time matters.
  • Maximum autonomy: Use GPT-5.4-max ($10/$30). The highest reasoning depth for complex multi-step tasks that require extended planning and self-correction.
  • Task routing and classification: Use GPT-5.4-nano (free). Let nano analyze the incoming task and route it to the appropriate specialist model.

A common pattern is to use nano as a router, mini for routine tasks, and standard or max for complex tasks — creating a cost-efficient pipeline where most requests hit the cheapest tier.
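The routing pattern above can be sketched as follows. In production the classification step would itself be a gpt-5.4-nano call; simple keyword rules stand in for it here, and the routing table is an example, not a recommendation from the article.

```python
# Minimal sketch of the nano -> mini -> max routing pattern.
# Keyword rules stand in for the gpt-5.4-nano classification call.
ROUTES = {
    "code": "gpt-5.3-codex",   # coding work goes to the specialist
    "desktop": "gpt-5.4",      # computer-use tasks need the standard tier
    "complex": "gpt-5.4-max",  # deep multi-step autonomy
}

def route_task(task: str) -> str:
    """Pick a model ID for an incoming task description."""
    text = task.lower()
    for keyword, model in ROUTES.items():
        if keyword in text:
            return model
    return "gpt-5.4-mini"      # default: the balanced everyday tier

print(route_task("Review this code diff"))     # gpt-5.3-codex
print(route_task("Summarize today's emails"))  # gpt-5.4-mini
```

Because most traffic falls through to the default tier, the expensive models are only invoked when the classifier finds a reason to escalate.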

Frequently Asked Questions

What is the difference between GPT-5.3 and GPT-5.4?

GPT-5.3 (Codex) launched in February 2026 as OpenAI's coding-specialist model with a 400K context window and pricing of $1.75/$14 per million tokens. GPT-5.4 launched a month later, in March 2026, as the general-purpose flagship with a 1M context window, five variants ranging from the free nano tier up to $10/$30 per million tokens for max, and a 75% score on OSWorld. GPT-5.3 is the better choice for pure coding tasks at lower cost; GPT-5.4 is better for complex multi-step agent workflows that require computer use.

Which GPT-5.4 variant should I use with OpenClaw?

For most OpenClaw operators, gpt-5.4-mini ($2.50/$10) offers the best balance of performance and cost. Use gpt-5.4 ($5/$20) for complex agent tasks requiring extended reasoning. Use gpt-5.4-turbo for latency-sensitive applications. Reserve gpt-5.4-max ($10/$30) for maximum-quality tasks where cost is secondary. The free gpt-5.4-nano variant is suitable for simple routing and classification tasks only.

Can I use GPT-5.4 for computer use tasks in OpenClaw?

Yes. GPT-5.4 scored 75% on OSWorld, which measures the ability to control a computer — clicking buttons, filling forms, navigating applications. OpenClaw can leverage this through its computer-use integration, allowing your agent to interact with desktop applications, web browsers, and system tools autonomously.

How does GPT-5 pricing compare to Claude Opus 4.6?

GPT-5.4 standard pricing ($5/$20) is comparable to Claude Opus 4.6 ($5/$25) on input but cheaper on output. GPT-5.4-mini ($2.50/$10) is significantly cheaper than any Opus tier. GPT-5.3 Codex ($1.75/$14) is the cheapest option for coding-specific tasks. Claude Opus 4.6 offers 90% caching savings which can dramatically reduce effective costs for repetitive workflows, so the true comparison depends on your usage pattern.

Further Reading