ChatGPT & GPT API Errors: How to Fix the Common Ones
TL;DR — A senior engineer's field guide to the most common OpenAI GPT API errors—401, 429, 400, 500—plus Codex CLI login fixes and a Do's/Don'ts cheat sheet.
I still remember the 2 a.m. on-call page: a checkout flow that quietly stopped generating order summaries because our GPT API calls were all returning 429. No alert, no stack trace that meant anything—just a generic "something failed." If you've shipped anything on top of the OpenAI API, you know the feeling. The errors are terse, the docs are scattered, and the fix is rarely where you first look.
This is the field guide I wish I'd had. I'll walk through the GPT API errors you'll actually hit in production, what each one really means, and the exact steps to fix them—including the Codex CLI login gotchas that trip up almost everyone.
The Problem: GPT API Errors Are Vague by Design
OpenAI returns standard HTTP status codes, but the message bodies are short. A 400 could mean a malformed JSON body, a context-length overflow, or an unsupported parameter. A 401 could be a missing key, a revoked key, or the wrong organization header. Without a mental model, you end up guessing—and guessing in production is expensive.
The single biggest time-saver in my career working with LLM APIs: stop reading the error message first. Read the HTTP status code first, then the `error.type` field, then the human message. The status tells you which bucket the problem lives in.
Here's the bucket map for OpenAI's GPT API.
GPT API Error Codes: The Reference Table
| Status | Meaning | Most common real cause |
|---|---|---|
| 400 | Bad request | Malformed body, invalid param, or context-length exceeded |
| 401 | Authentication failed | Missing/invalid/revoked API key |
| 403 | Forbidden | Region not supported, or org/project lacks access |
| 404 | Not found | Wrong model name or deprecated endpoint |
| 429 | Rate limit / quota | RPM/TPM limit hit, or billing quota exhausted |
| 500 | Internal server error | Transient OpenAI-side fault |
| 503 | Service unavailable | Overload / temporary outage |
A subtle but important point: on OpenAI, a 429 does double duty. It covers both "you're sending too fast" (rate limit) and "you've run out of paid quota" (insufficient_quota). Those have completely different fixes, so always inspect the error.type (and the error.code next to it) in the response body:
    {
      "error": {
        "message": "You exceeded your current quota, please check your plan and billing details.",
        "type": "insufficient_quota",
        "code": "insufficient_quota"
      }
    }
If you see insufficient_quota, no amount of retry/backoff will help—you need to add credits or fix the payment method. If instead you see rate_limit_exceeded, backoff is exactly right.
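Here's a minimal sketch of that branch, assuming you have the raw 429 response body as a string. The function name `classify_429` and its return labels are mine, not part of any SDK, and I check both `type` and `code` because I wouldn't rely on the marker always landing in the same field.

```python
import json


def classify_429(body: str) -> str:
    """Return 'fix_billing', 'backoff', or 'unknown' for a 429 response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unknown"
    # Look at both fields rather than trusting only one of them.
    markers = (error.get("type"), error.get("code"))
    if "insufficient_quota" in markers:
        return "fix_billing"  # retries cannot help; billing needs attention
    if "rate_limit_exceeded" in markers:
        return "backoff"      # a genuine rate limit; waiting is the fix
    return "unknown"
```

Anything that comes back as `unknown` is worth logging in full rather than retrying blindly.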
Heads-up for those coming from the Anthropic side: Claude's API splits these out differently. Billing problems surface as HTTP 403 (`billing_error`, distinguished via the `.type` field), and an oversized request is 413 `request_too_large`, not 400. There is no HTTP 402 anywhere in Claude's scheme. Don't carry OpenAI assumptions over; see our Claude 529 overloaded error guide for the neighboring case.
Fix 1: The 401 Authentication Error
A 401 almost always means the key your code is sending isn't what you think it is.
openai.AuthenticationError: Error code: 401 -
{'error': {'message': 'Incorrect API key provided', 'type': 'invalid_request_error'}}
Step-by-step:
- Confirm the key is actually loaded. Print its length, never its value:
      python -c "import os; print(len(os.environ.get('OPENAI_API_KEY','')))"
  `0` here means your `.env` never loaded. (And never log the key itself.)
- Check for whitespace and quotes. A trailing newline or wrapping quotes in `.env` will silently break auth. Write `OPENAI_API_KEY=sk-...` with no quotes.
- Verify the key isn't revoked. Rotated keys keep living in old shells and CI secrets. Generate a fresh one and replace it everywhere.
- Match the project/org. Project-scoped keys (`sk-proj-...`) only work for the resources in that project. If you set `OPENAI_ORG_ID` or `OPENAI_PROJECT_ID` (the names the official Python SDK reads), make sure they line up.
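The first two checks are easy to automate. Below is a rough sketch of a pre-flight check for the key string; `key_problems` is a hypothetical helper of mine, and it only looks at the shape of the value, so it can't tell you whether the key is revoked or scoped to the wrong project.

```python
def key_problems(raw):
    """Return a list of shape problems with an API key string (empty if none)."""
    if not raw or not raw.strip():
        return ["key is empty or unset"]
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    core = raw.strip()
    # Quotes copied over from a .env file end up inside the value itself.
    if core.startswith(('"', "'")) or core.endswith(('"', "'")):
        problems.append("wrapped in quotes")
    return problems
```

Call it at startup with `os.environ.get("OPENAI_API_KEY")` and log only the returned problems, never the value.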
Fix 2: The 429 Rate Limit (and the Quota Trap)
First, branch on error.type as shown above. For genuine rate_limit_exceeded, the fix is exponential backoff with jitter:
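    import random
    import time

    from openai import OpenAI, RateLimitError

    client = OpenAI()

    def call_with_retry(**kwargs):
        """Call chat completions, backing off only on genuine rate limits."""
        for attempt in range(6):
            try:
                return client.chat.completions.create(**kwargs)
            except RateLimitError as err:
                # insufficient_quota also arrives as a 429; retrying cannot fix billing.
                if err.code == "insufficient_quota":
                    raise
                # Exponential backoff with jitter, capped at 30 seconds.
                sleep = min(2 ** attempt + random.random(), 30)
                time.sleep(sleep)
        raise RuntimeError("Exhausted retries after rate limiting")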
Three things that fixed this for real in my projects:
- Respect the `Retry-After` header when present instead of guessing the wait.
- Batch and cache aggressively. Many `429`s are self-inflicted: duplicate calls for identical inputs.
- Watch TPM, not just RPM. Token-per-minute limits bite long-context requests well before request-count limits do.
If it's insufficient_quota, stop retrying and go fix billing. Retrying a dead quota just burns CPU and floods your logs.
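To make the Retry-After point concrete, here's a small sketch that prefers the server's hint and falls back to backoff with jitter. `wait_seconds` is my own helper name, `headers` is assumed to be any dict-like mapping of response headers, and the sketch only understands the numeric form of the header, not the HTTP-date form.

```python
import random


def wait_seconds(headers, attempt, cap=30.0, rng=random.random):
    """Pick a sleep duration: Retry-After if usable, else backoff with jitter."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {str(k).lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after")
    if raw is not None:
        try:
            hinted = float(raw)
        except (TypeError, ValueError):
            hinted = None  # e.g. an HTTP-date value, which this sketch ignores
        if hinted is not None and hinted >= 0:
            return min(hinted, cap)
    # No usable hint: exponential backoff plus jitter, capped.
    return min(2 ** attempt + rng(), cap)
```

With the official Python SDK the headers are usually reachable from the caught exception (something like `err.response.headers`), but check that against the SDK version you run.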
Fix 3: The 400 Bad Request
The 400 family is the most "your fault, not theirs" bucket. The three I see weekly:
- Context length exceeded: your prompt plus `max_tokens` is larger than the model's window. Trim history or summarize older turns.
      This model's maximum context length is 128000 tokens. However, your messages resulted in 131072 tokens. Please reduce the length of the messages.
- Invalid parameter for the model: e.g. sending a parameter a given model doesn't support. Check the model card before assuming a flag exists.
- Malformed `messages`: an empty array, a missing `role`, or non-string `content` where a string is expected.
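For the context-length case, trimming can be sketched roughly as below. Two loud assumptions: the token count is a crude guess of about four characters per token (a real tokenizer such as tiktoken will be more accurate), and `trim_history` and `estimate_tokens` are hypothetical helpers of mine. The sketch also moves any system messages to the front and assumes plain string content.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]

    def total(msgs):
        return sum(estimate_tokens(m.get("content") or "") for m in msgs)

    # Always keep at least the most recent message.
    while len(rest) > 1 and total(system) + total(rest) > budget:
        rest.pop(0)
    return system + rest
```

Set `budget` to the model's window minus the `max_tokens` you plan to request, with some headroom for the estimate being off.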
Rule of thumb: a `400` is a contract violation. Read the message literally; it usually names the exact field. Don't add retries to a `400`; the same body will fail forever.
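Since the same body fails forever, it pays to catch the obvious shape problems before the request leaves your process. This is a minimal sketch for plain-text chats only; `validate_messages` is my own helper, and it would wrongly flag legitimate non-string content (for example multi-part content), so treat it as a starting point.

```python
def validate_messages(messages):
    """Return a list of problems; an empty list means the basic shape is fine."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        if not msg.get("role"):
            problems.append(f"messages[{i}] is missing 'role'")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content is not a string")
    return problems
```

Failing fast here gives you an error that names your own call site instead of a terse 400 from the API.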
Fix 4: The 500 / 503 Server Errors
These are on OpenAI's side. The correct response is the same as a rate limit: bounded retry with backoff, then fail gracefully. Don't hammer—if it's a 503 overload, more traffic makes it worse. Surface a user-friendly fallback ("We're a bit busy, try again in a moment") rather than a raw stack trace. Check status.openai.com before spending an hour debugging your own code during a real outage.
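A bounded retry that ends in a friendly fallback might look like the sketch below. Everything here is illustrative: `call_with_fallback` and `FALLBACK_MESSAGE` are names I made up, and `is_transient` is a predicate you supply to decide which exceptions count as server-side blips.

```python
import random
import time

FALLBACK_MESSAGE = "We're a bit busy, try again in a moment."


def call_with_fallback(fn, is_transient, max_attempts=4, sleep=time.sleep):
    """Run fn(); retry transient failures with backoff, then return a fallback."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            if not is_transient(err):
                raise  # not a server-side blip: let the caller see it
            if attempt < max_attempts - 1:
                # Back off before the next try; never sleep after the last one.
                sleep(min(2 ** attempt + random.random(), 30))
    return FALLBACK_MESSAGE
```

One plausible predicate is `lambda e: getattr(e, "status_code", 0) >= 500`, assuming your client's exceptions expose a `status_code` attribute.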
Bonus: Fixing OpenAI Codex CLI Login Errors
The Codex CLI (@openai/codex) is where I see the most avoidable auth pain. A few hard-won facts:
- Requires Node 18+. Install with `npm i -g @openai/codex`. Avoid `sudo` global installs; they cause permission breakage later. Use a Node version manager like `nvm` so your global bin is user-owned.
- Device login uses `--device-auth`, not `--device-code`. This is the single most common typo:
      codex login --device-auth
- For API-key login, pipe via stdin; never pass the key as a CLI argument (args leak into shell history and process listings):
      printenv OPENAI_API_KEY | codex login --with-api-key
- Version-specific flags drift. If a command doesn't behave as documented, run `codex --help` and check the official docs rather than trusting a blog (including this one) verbatim.
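If you drive the login from a script instead of a shell, the same stdin rule applies. The sketch below mirrors the `printenv ... | codex login --with-api-key` pipeline using Python's `subprocess`; the wrapper `codex_login_with_key` is hypothetical, and the CLI flag is taken from the command above, so confirm it with `codex --help` on your version.

```python
import os
import subprocess


def codex_login_with_key(env=os.environ, run=subprocess.run):
    """Feed OPENAI_API_KEY to `codex login --with-api-key` over stdin."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # The key travels in `input`, never in the argument list, so it does not
    # appear in shell history or process listings.
    result = run(["codex", "login", "--with-api-key"], input=key + "\n", text=True)
    return result.returncode
```

The `env` and `run` parameters exist only so the function can be exercised in tests without a real key or a real CLI.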
If you also work with Anthropic's Claude Code CLI, the auth model is different there—prefer the in-session /login command rather than inventing flags, and hedge on version-specific behavior the same way.
Do's and Don'ts Cheat Sheet
| Do | Don't |
|---|---|
| Read the HTTP status, then error.type, then the message | Parse the human message string to branch logic |
| Retry 429 rate_limit and 5xx with exponential backoff + jitter | Retry 400 or 429 insufficient_quota; retries never fix them |
| Pipe secrets via stdin (printenv KEY \| codex login --with-api-key) | Pass API keys as CLI arguments |
| Use nvm for global npm installs | Run sudo npm i -g for CLI tools |
| Verify flags with --help / official docs | Trust version-specific flags from memory |
| Honor the Retry-After header | Hammer a 503 with more concurrent requests |
A Few Tips From the Trenches
- Centralize your error handling. One wrapper that maps status → action beats `try/except` sprinkled across fifty call sites.
- Log the `request_id`. Every OpenAI response carries one; it's gold when you open a support ticket.
- Set sane timeouts. A hung socket masquerades as a "weird" error. A 30–60s client timeout surfaces problems honestly.
- Test the unhappy path. Deliberately send a bad key and an oversized prompt in CI so your fallback logic is exercised before prod does it for you.
- Separate "my bug" from "their outage" fast. The status page and `request_id` resolve that question in under a minute.
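The "status → action" wrapper from the first tip can start as small as this. It's a sketch of the decision table only, with action labels I invented (`retry`, `fix_billing`, and so on); wiring it to your client and logging is left out.

```python
def decide_action(status, error_code=None):
    """Map an HTTP status (plus the error code for 429s) to one action."""
    if status == 429:
        # Same status, two very different fixes.
        return "fix_billing" if error_code == "insufficient_quota" else "retry"
    if status in (500, 503):
        return "retry"        # bounded, with backoff
    if status in (401, 403):
        return "fix_auth"     # key, org/project, or access problem
    if status in (400, 404):
        return "fix_request"  # contract violation: retrying will not help
    return "investigate"
```

Keeping this in one function also gives you a single place to assert on in the CI tests for the unhappy path.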
For broader prompt-side robustness, our prompt engineering tips and the main blog cover how to keep requests inside context limits and reduce wasteful token usage—often the cheapest way to dodge 400 and 429 entirely.
Wrap-Up: Build a Reflex, Not a Lookup Habit
The goal isn't to memorize every code—it's to build a reflex. Status code first, error.type second, message last. Once that's automatic, GPT API errors stop being mysteries and become a quick decision tree: retry it, fix my request, or fix my billing.
The three rules that have saved me the most grief:
- A `429` on OpenAI means either rate limit or dead quota; always check `error.type` before retrying.
- `4xx` errors (except `429 rate_limit`) are contract violations; retries won't save you.
- Never put secrets on a command line; pipe them through stdin.
Your next action: wrap your OpenAI client in a single error-handling helper today, map each status to an action using the table above, and add one CI test that fires a bad key. Twenty minutes now buys you a quiet on-call later. For authoritative details, keep the OpenAI API error reference and status page bookmarked—and verify any version-specific CLI flag with --help before you ship.
Got a recurring error this guide didn't cover? Drop by the blog for more hands-on debugging walkthroughs.