ChatGPT & GPT API Errors: How to Fix the Common Ones

Prompt Architect · 2026-06-17 · 8 min

TL;DR — A senior engineer's field guide to the most common OpenAI GPT API errors—401, 429, 400, 500—plus Codex CLI login fixes and a Do's/Don'ts cheat sheet.


I still remember the 2 a.m. on-call page: a checkout flow that quietly stopped generating order summaries because our GPT API calls were all returning 429. No alert, no stack trace that meant anything—just a generic "something failed." If you've shipped anything on top of the OpenAI API, you know the feeling. The errors are terse, the docs are scattered, and the fix is rarely where you first look.

This is the field guide I wish I'd had. I'll walk through the GPT API errors you'll actually hit in production, what each one really means, and the exact steps to fix them—including the Codex CLI login gotchas that trip up almost everyone.

The Problem: GPT API Errors Are Vague by Design

OpenAI returns standard HTTP status codes, but the message bodies are short. A 400 could mean a malformed JSON body, a context-length overflow, or an unsupported parameter. A 401 could be a missing key, a revoked key, or the wrong organization header. Without a mental model, you end up guessing—and guessing in production is expensive.

The single biggest time-saver in my career working with LLM APIs: stop reading the error message first. Read the HTTP status code first, then the error.type field, then the human message. The status tells you which bucket the problem lives in.

Here's the bucket map for OpenAI's GPT API.

GPT API Error Codes: The Reference Table

| Status | Meaning | Most common real cause |
| --- | --- | --- |
| `400` | Bad request | Malformed body, invalid param, or context-length exceeded |
| `401` | Authentication failed | Missing/invalid/revoked API key |
| `403` | Forbidden | Region not supported, or org/project lacks access |
| `404` | Not found | Wrong model name or deprecated endpoint |
| `429` | Rate limit / quota | RPM/TPM limit hit, or billing quota exhausted |
| `500` | Internal server error | Transient OpenAI-side fault |
| `503` | Service unavailable | Overload / temporary outage |

A subtle but important point: on OpenAI, a 429 does double duty. It covers both "you're sending too fast" (rate limit) and "you've run out of paid quota" (insufficient_quota). Those have completely different fixes, so always inspect error.type and error.code in the response body:

```
{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "code": "insufficient_quota"
  }
}
```

If you see insufficient_quota, no amount of retry/backoff will help—you need to add credits or fix the payment method. If instead you see rate_limit_exceeded, backoff is exactly right.
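As a minimal sketch of that branch: the helper name `classify_429` and its return labels are my own illustration, not part of any SDK. It looks at both `type` and `code`, on the assumption that either field may carry the label.

```python
import json


def classify_429(body_text):
    """Return 'fix_billing', 'backoff', or 'unknown' for a 429 response body."""
    try:
        error = json.loads(body_text)["error"]
        labels = {error.get("type"), error.get("code")}
    except (ValueError, KeyError, TypeError, AttributeError):
        return "unknown"  # body was not the expected JSON error object
    if "insufficient_quota" in labels:
        return "fix_billing"  # retrying cannot succeed
    if "rate_limit_exceeded" in labels:
        return "backoff"  # transient: wait, then retry
    return "unknown"
```

Returning "unknown" instead of guessing keeps an unexpected body from silently falling into the retry path.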

Heads-up for those coming from the Anthropic side: Claude's API splits these out differently. Billing problems carry their own billing_error type instead of sharing a 429, and an oversized request is 413 request_too_large, not 400. Check Anthropic's current error reference for the exact status code attached to each type, and don't carry OpenAI assumptions over; see our Claude 529 overloaded error guide for the neighboring case.

Fix 1: The 401 Authentication Error


A 401 almost always means the key your code is sending isn't what you think it is.

```
openai.AuthenticationError: Error code: 401 -
{'error': {'message': 'Incorrect API key provided', 'type': 'invalid_request_error'}}
```

Step-by-step:

1. Confirm the key is actually loaded. Print its length, never its value:
```
python -c "import os; print(len(os.environ.get('OPENAI_API_KEY','')))"
```
   A 0 here means the variable never reached this process, usually because your .env never loaded. (And never log the key itself.)
2. Check for whitespace and quotes. A trailing newline or wrapping quotes in .env will silently break auth. Write the line as OPENAI_API_KEY=sk-... with no quotes.
3. Verify the key isn't revoked. Rotated keys keep living in old shells and CI secrets. Generate a fresh one and replace it everywhere.
4. Match the project/org. Project-scoped keys (sk-proj-...) only work for the resources in that project. If you set OPENAI_ORG_ID or OPENAI_PROJECT_ID, make sure they line up.
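The first two checks are easy to automate. Here is a rough sketch, where `diagnose_key` is a hypothetical helper and the `sk-` prefix test is only a heuristic taken from the key examples in this guide. It reports what looks wrong without ever printing the key.

```python
import os


def diagnose_key(raw):
    """Return a list of likely problems with an API key string, without exposing it."""
    core = (raw or "").strip()
    if not core:
        return ["empty: the variable is unset or .env never loaded"]
    problems = []
    if raw != core:
        problems.append("leading or trailing whitespace (often a stray newline)")
    if core[0] in "\"'" or core[-1] in "\"'":
        problems.append("wrapped in quotes")
    # Heuristic only: the keys shown in this guide start with 'sk-'.
    if not core.strip("\"'").startswith("sk-"):
        problems.append("does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY", "")
    # Report the length and the findings, never the key itself.
    print(len(key), diagnose_key(key) or "no obvious problems")
```

It cannot tell you whether a key is revoked or scoped to the wrong project; only a real request answers that.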

Fix 2: The 429 Rate Limit (and the Quota Trap)

First, branch on error.type as shown above. For genuine rate_limit_exceeded, the fix is exponential backoff with jitter:

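```
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MAX_ATTEMPTS = 6

def call_with_retry(**kwargs):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as err:
            # The SDK raises RateLimitError for every 429, dead quota included.
            if err.code == "insufficient_quota":
                raise  # billing problem: retrying cannot succeed
            if attempt == MAX_ATTEMPTS - 1:
                break  # out of attempts: skip a pointless final sleep
            # Exponential backoff with jitter, capped at 30 seconds.
            delay = min(2 ** attempt + random.random(), 30)
            time.sleep(delay)
    raise RuntimeError("Exhausted retries after rate limiting")
```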

Three things that fixed this for real in my projects:

Respect the Retry-After header when present instead of guessing the wait.
Batch and cache aggressively. Many 429s are self-inflicted—duplicate calls for identical inputs.
Watch TPM, not just RPM. Tokens-per-minute limits bite long-context requests well before request-count limits do.
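To make the first point concrete, here is one way to fold Retry-After into the wait calculation. It is a sketch: `backoff_delay` is a hypothetical helper, and it assumes the header holds a number of seconds, falling back to computed backoff when it holds anything else.

```python
import random


def backoff_delay(attempt, retry_after=None, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Trust the server's hint when it parses as seconds.
            return max(float(retry_after), 0.0)
        except (TypeError, ValueError):
            pass  # e.g. an HTTP-date value: fall through to computed backoff
    # Exponential backoff with jitter, capped so waits stay bounded.
    return min(2 ** attempt + random.random(), cap)
```

With the OpenAI Python SDK, the header should be reachable on the caught exception as `err.response.headers.get("retry-after")`; treat that as something to verify against your SDK version.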

If it's insufficient_quota, stop retrying and go fix billing. Retrying a dead quota just burns CPU and floods your logs.

Fix 3: The 400 Bad Request

The 400 family is the most "your fault, not theirs" bucket. The three I see weekly:

Context length exceeded — your prompt plus max_tokens is larger than the model's window. Trim history or summarize older turns.

```
This model's maximum context length is 128000 tokens. However, your messages
resulted in 131072 tokens. Please reduce the length of the messages.
```

Invalid parameter for the model — e.g. sending a parameter a given model doesn't support. Check the model card before assuming a flag exists.
Malformed messages — an empty array, a missing role, or non-string content where a string is expected.

Rule of thumb: a 400 is a contract violation. Read the message literally—it usually names the exact field. Don't add retries to a 400; the same body will fail forever.
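Since a 400 is a contract violation, much of it can be caught before the request leaves your process. The sketch below makes loud assumptions: text-only chats (every `content` is a string), a crude estimate of four characters per token standing in for a real tokenizer, and hypothetical helper names throughout.

```python
def validate_messages(messages):
    """Return a list of contract problems found in a chat `messages` payload."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or not msg.get("role"):
            problems.append(f"messages[{i}] is missing a role")
        elif not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content is not a string")
    return problems


def estimate_tokens(text):
    # Crude assumption: about 4 characters per token. Use a real tokenizer in production.
    return len(text) // 4 + 1


def trim_history(messages, budget):
    """Drop the oldest non-system turns until the estimated total fits `budget`."""
    kept = list(messages)
    while sum(estimate_tokens(m["content"]) for m in kept) > budget:
        non_system = [i for i, m in enumerate(kept) if m["role"] != "system"]
        if len(non_system) <= 1:
            break  # always keep the latest turn, even if it is still too large
        del kept[non_system[0]]
    return kept
```

Set `budget` to the model's context window minus the `max_tokens` you plan to request, so the reply has room too.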

Fix 4: The 500 / 503 Server Errors

These are on OpenAI's side. The correct response is the same as for a rate limit: bounded retry with backoff, then fail gracefully. Don't hammer: if it's a 503 overload, more traffic makes it worse. Surface a user-friendly fallback ("We're a bit busy, try again in a moment") rather than a raw stack trace. Check status.openai.com before spending an hour debugging your own code during a real outage.
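Roughly, "bounded retry, then fail gracefully" looks like this. `ServerError` is a stand-in for whatever 5xx exception your client library raises, and `call_with_fallback` is a hypothetical helper; the `sleep` parameter is injectable only so the logic can be tested without waiting.

```python
import random
import time

FALLBACK_MESSAGE = "We're a bit busy, try again in a moment."


class ServerError(Exception):
    """Stand-in for the 5xx exception raised by your client library."""


def call_with_fallback(fn, max_attempts=3, sleep=time.sleep):
    """Call fn(); retry ServerError with backoff, then return a friendly fallback."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerError:
            if attempt < max_attempts - 1:
                # Back off between attempts instead of hammering an overloaded service.
                sleep(min(2 ** attempt + random.random(), 30))
    return FALLBACK_MESSAGE
```

Keeping `max_attempts` small is the point: during a real outage, the user should see the fallback quickly.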

Fix 5: Codex CLI Login

The Codex CLI (@openai/codex) is where I see the most avoidable auth pain. A few hard-won facts:

Requires Node 18+. Install with npm i -g @openai/codex. Avoid sudo global installs—they cause permission breakage later. Use a Node version manager like nvm so your global bin is user-owned.
Device login uses --device-auth, not --device-code. This is the single most common typo:
```
codex login --device-auth
```
For API-key login, pipe via stdin—never pass the key as a CLI argument (args leak into shell history and process listings):
```
printenv OPENAI_API_KEY | codex login --with-api-key
```
Version-specific flags drift. If a command doesn't behave as documented, run codex --help and check the official docs rather than trusting a blog (including this one) verbatim.
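If you drive the login from a script rather than a shell, the same stdin rule applies. Here is a sketch in Python, assuming the `--with-api-key` flag behaves as described above; `codex_login_with_api_key` is a hypothetical wrapper, and the `run` parameter exists only so the call can be tested without the CLI installed.

```python
import subprocess


def codex_login_with_api_key(api_key, run=subprocess.run):
    """Log in to the Codex CLI by piping the key on stdin, never via argv."""
    if not api_key:
        raise ValueError("API key is empty")
    argv = ["codex", "login", "--with-api-key"]  # flag as given in this guide
    # `input=` is written to the child's stdin, so the key never appears in argv.
    # The trailing newline mirrors what `printenv` would emit (an assumption).
    return run(argv, input=api_key + "\n", text=True, check=True)
```

Call it with `os.environ["OPENAI_API_KEY"]` so the key also stays out of your source code.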

If you also work with Anthropic's Claude Code CLI, the auth model is different there—prefer the in-session /login command rather than inventing flags, and hedge on version-specific behavior the same way.

Do's and Don'ts Cheat Sheet

| Do | Don't |
| --- | --- |
| Read the HTTP status, then `error.type`, then the message | Parse the human message string to branch logic |
| Retry `429 rate_limit` and `5xx` with exponential backoff + jitter | Retry `400` or `429 insufficient_quota`—they never recover |
| Pipe secrets via stdin (`printenv KEY \| codex login --with-api-key`) | Pass API keys as CLI arguments |
| Use `nvm` for global npm installs | Run `sudo npm i -g` for CLI tools |
| Verify flags with `--help` / official docs | Trust version-specific flags from memory |
| Honor the `Retry-After` header | Hammer a `503` with more concurrent requests |

A Few Tips From the Trenches

Centralize your error handling. One wrapper that maps status → action beats try/except sprinkled across fifty call sites.
Log the request_id. Every OpenAI response carries one; it's gold when you open a support ticket.
Set sane timeouts. A hung socket masquerades as a "weird" error. A 30–60s client timeout surfaces problems honestly.
Test the unhappy path. Deliberately send a bad key and an oversized prompt in CI so your fallback logic is exercised before prod does it for you.
Separate "my bug" from "their outage" fast. The status page and request_id resolve that question in under a minute.
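The first two tips combine naturally into one decision function. This is a sketch, not a drop-in: `Action` and `decide` are hypothetical names, and the mapping simply restates the buckets from this guide. It assumes it is only called for error responses.

```python
import logging
from enum import Enum

log = logging.getLogger("llm_errors")


class Action(Enum):
    RETRY = "retry with backoff"
    FIX_REQUEST = "fix the request"
    FIX_AUTH = "fix credentials or access"
    FIX_BILLING = "fix billing"


def decide(status, error_type=None, error_code=None, request_id=None):
    """Map an HTTP error status (plus error labels for 429) to a single action."""
    if status == 429:
        quota = "insufficient_quota" in (error_type, error_code)
        action = Action.FIX_BILLING if quota else Action.RETRY
    elif status in (401, 403):
        action = Action.FIX_AUTH
    elif 500 <= status < 600:
        action = Action.RETRY
    else:
        action = Action.FIX_REQUEST  # 400, 404, and other 4xx contract violations
    # Keep the request_id in the log line so a support ticket can cite it.
    log.warning("status=%s type=%s request_id=%s -> %s",
                status, error_type, request_id, action.name)
    return action
```

Every call site then branches on four actions instead of re-deriving the rules, and the unhappy-path CI tests have a single function to assert against.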

For broader prompt-side robustness, our prompt engineering tips and the main blog cover how to keep requests inside context limits and reduce wasteful token usage—often the cheapest way to dodge 400 and 429 entirely.

Wrap-Up: Build a Reflex, Not a Lookup Habit

The goal isn't to memorize every code—it's to build a reflex. Status code first, error.type second, message last. Once that's automatic, GPT API errors stop being mysteries and become a quick decision tree: retry it, fix my request, or fix my billing.

The three rules that have saved me the most grief:

A 429 on OpenAI means either rate limit or dead quota—always check error.type before retrying.
4xx errors (except 429 rate_limit) are contract violations; retries won't save you.
Never put secrets on a command line—pipe them through stdin.

Your next action: wrap your OpenAI client in a single error-handling helper today, map each status to an action using the table above, and add one CI test that fires a bad key. Twenty minutes now buys you a quiet on-call later. For authoritative details, keep the OpenAI API error reference and status page bookmarked—and verify any version-specific CLI flag with --help before you ship.

Got a recurring error this guide didn't cover? Drop by the blog for more hands-on debugging walkthroughs.