OpenAI JSON Mode: response_format json_object Explained

Last updated: May 23, 2026

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

OpenAI JSON mode is a request parameter, response_format, that forces the chat completions API to return parseable JSON instead of free-form prose. The basic form, { type: "json_object" }, guarantees only that the output parses; the shape is up to the model. The stricter form, { type: "json_schema", json_schema: { strict: true, schema } }, known as Structured Outputs and shipped August 2024, constrains the output to match a JSON Schema you supply, with 100 percent adherence per OpenAI's documentation. Two footguns trip up first-time users: json_object mode requires the word "JSON" (in any case) somewhere in your messages, and Structured Outputs is gated to gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18, and later snapshots. This guide covers both modes, the model support matrix, truncation handling, cost overhead, and how the feature compares against function calling for tool-style use cases.

Generating JSON from an LLM and need to confirm it matches your schema? Jsonic's JSON Schema Validator validates the model output against your schema and pinpoints exactly which field violated which constraint — handy when debugging json_object mode without strict enforcement.

What OpenAI JSON mode actually does (and what it doesn't)

JSON mode is a constraint applied at decoding time. When you set response_format, OpenAI's sampler restricts the next-token choices so the running output remains a prefix of a valid JSON document. For json_object, the constraint is grammar-level — the output must be a parseable JSON value. For json_schema with strict: true, the constraint extends to your schema — token-level filtering ensures every emitted character keeps the output schema-valid.
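
The token-filtering idea can be shown with a toy sketch. This is not OpenAI's implementation: real constrained decoding works on model tokens and a compiled grammar, while the ALLOWED_OUTPUTS list and the candidate strings here are hypothetical stand-ins. The sketch keeps only the candidates that leave the running output a prefix of some permitted document.

```python
# Toy illustration only. ALLOWED_OUTPUTS and the candidate list are
# hypothetical; a real decoder filters model tokens against a grammar.
ALLOWED_OUTPUTS = ['{"color": "red"}', '{"color": "green"}', '{"color": "blue"}']


def allowed_next(prefix: str, candidates: list[str]) -> list[str]:
    """Keep candidates where prefix + candidate is still a prefix of an allowed output."""
    return [
        tok
        for tok in candidates
        if any(out.startswith(prefix + tok) for out in ALLOWED_OUTPUTS)
    ]


prefix = '{"color": "'
candidates = ["red", "gr", "purple", "}", "b"]
print(allowed_next(prefix, candidates))  # ['red', 'gr', 'b']
```

The rejected candidates are never sampled, which is why the finished output cannot fall outside the permitted set.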

What JSON mode does not do:

It does not validate semantics. A schema-valid response can still be wrong — the model can return well-typed but factually incorrect fields.
It does not prevent refusals. If safety filters fire, you get a refusal field instead of structured content. Check message.refusal before parsing.
It does not prevent truncation. If max_tokens is reached mid-document, you get a partial response with finish_reason: "length".
It does not lower hallucination rates. The model still invents data — it just invents type-correct data.

The right mental model: JSON mode is a parser-safety guarantee, not a correctness guarantee. You still need application-level validation for business rules, sanity checks, and cross-field consistency. For the broader picture on LLM-driven JSON output, see our general LLM JSON output guide.
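
As a rough sketch of what application-level validation might look like, the function below parses a reply and then applies type and range checks. The check_person name and the 0 to 130 age range are illustrative assumptions, not part of any OpenAI API.

```python
import json


def check_person(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    data = json.loads(raw)  # JSON mode only promises that this line succeeds
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age must be an integer")
    elif not 0 <= age <= 130:  # hypothetical business rule
        problems.append("age out of plausible range")
    return problems


print(check_person('{"name": "Alice", "age": 32}'))   # []
print(check_person('{"name": "Alice", "age": 320}'))  # ['age out of plausible range']
```

The second reply is valid JSON with the right types, yet it still fails the business rule, which is the gap JSON mode leaves open.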

Using response_format: json_object — the basic mode

json_object is the original JSON mode, shipped November 2023 alongside the gpt-3.5-turbo-1106 and gpt-4-1106-preview snapshots. The API contract: set response_format on the request, and the content string in the response is guaranteed to be parseable JSON. No shape guarantees — just JSON.parse safety.

# Python — openai SDK v1.40+
from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You extract structured data and respond in JSON only.",
        },
        {
            "role": "user",
            "content": "Extract name and age from: Alice is 32 years old.",
        },
    ],
)

data = json.loads(response.choices[0].message.content)
print(data)  # {'name': 'Alice', 'age': 32}

Two things to notice. First, the system message contains the word "JSON" — this is mandatory. Second, the field names name and age are not guaranteed — the next call might return full_name and years_old, or wrap the result in an outer {"person": ...} object. To pin down the shape, either describe it explicitly in the prompt or move up to json_schema mode.
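
One way to describe the shape explicitly is to generate the system message from the list of keys you expect, then check the reply for drift. The sketch below assumes a flat object; EXPECTED_KEYS, build_messages, and key_drift are hypothetical names chosen for illustration.

```python
import json

EXPECTED_KEYS = ("name", "age")  # hypothetical target shape


def build_messages(text: str) -> list[dict]:
    """Spell out the exact keys in the system message; it also mentions JSON."""
    shape = ", ".join(f'"{k}"' for k in EXPECTED_KEYS)
    system = (
        "Respond in JSON only. Return a single flat object with exactly "
        f"these keys: {shape}. Do not wrap it in another object."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]


def key_drift(raw: str) -> tuple[set, set]:
    """Return (missing, unexpected) key sets for a parsed reply."""
    data = json.loads(raw)
    keys = set(data) if isinstance(data, dict) else set()
    return set(EXPECTED_KEYS) - keys, keys - set(EXPECTED_KEYS)


print(key_drift('{"full_name": "Alice", "age": 32}'))
# ({'name'}, {'full_name'})
```

A prompt like this lowers the chance of drift but does not remove it, so the drift check still belongs in the calling code.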

For deeper handling of the parsed result in typed languages, see our TypeScript JSON parsing guide.

Using response_format: json_schema — Structured Outputs

Structured Outputs is the strict mode. You supply a JSON Schema; the API decodes tokens that keep the output valid against your schema for the entire response. With strict: true, OpenAI documents 100 percent schema adherence — the model physically cannot emit output that violates the schema. This is the recommended path for new code targeting modern models. Our Structured Outputs deep dive covers the feature in more detail.

# Python — openai SDK v1.40+, with Pydantic
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str | None

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract person details from the text."},
        {"role": "user", "content": "Alice is 32, alice@example.com"},
    ],
    response_format=Person,
)

person = response.choices[0].message.parsed
print(person.name, person.age, person.email)

The .parse() helper (instead of .create()) handles the schema conversion from Pydantic and gives you a typed Python object back. For Node.js the equivalent is the zodResponseFormat helper from openai/helpers/zod, which converts a Zod schema for the same parse method:

// Node.js — openai SDK v4.55+, with Zod
import OpenAI from 'openai'
import { zodResponseFormat } from 'openai/helpers/zod'
import { z } from 'zod'

const Person = z.object({
  name: z.string(),
  age: z.number().int(),
  email: z.string().nullable(),
})

const client = new OpenAI()

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'system', content: 'Extract person details from the text.' },
    { role: 'user', content: 'Alice is 32, alice@example.com' },
  ],
  response_format: zodResponseFormat(Person, 'person'),
})

const person = response.choices[0].message.parsed
console.log(person.name, person.age, person.email)

Schema rules that bite first-time users:

additionalProperties must be false on every object — the API rejects schemas that omit this
Every field must be in required — to make a field optional, add null to its type union
Field names are limited to 80 characters
Total schema size has a token budget; very large schemas (thousands of properties) are rejected
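
The first two rules in the list above can be checked locally before a request is sent. The sketch below is a simplified linter, not an official validator: strict_violations is a hypothetical helper that only understands plain "object" and "array" type strings, and person_schema is an illustrative schema in which the optional email field stays in required but allows null.

```python
def strict_violations(schema: dict, path: str = "$") -> list[str]:
    """List places where an object schema breaks the two rules checked here."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        # optional field: still listed in required, but allowed to be null
        "email": {"type": ["string", "null"]},
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False,
}

# Raw request form of the same schema, for callers not using Pydantic or Zod
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "person", "strict": True, "schema": person_schema},
}

print(strict_violations(person_schema))  # []
```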

For a wider tour of JSON Schema itself, see our JSON Schema tutorial.

Required prompt: why you must mention "JSON" in the system message

The most common stumble with json_object mode is this 400 error:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': "'messages' must contain the word 'json' in some form
                to use 'response_format' of type 'json_object'.",
    'type': 'invalid_request_error',
    'code': None
  }
}

The fix is to add the word JSON anywhere in the messages array. A single mention in the system message is enough. The reason: when you force JSON output without mentioning JSON in the prompt, the model can get stuck generating whitespace until it hits the token limit. Token-level constraints prevent it from emitting anything except a JSON document, but the prompt gives it no signal that it should. Adding "respond in JSON" to the system message resolves the ambiguity.

Code that fails:

# Will raise: messages must contain the word 'json'
client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": "Tell me about Alice."}
    ],
)

Code that works:

client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Respond in JSON."},
        {"role": "user", "content": "Tell me about Alice."},
    ],
)

The check is case-insensitive: both "JSON" and "json" pass. The rule does not apply to json_schema mode, where the schema itself signals the intent, so no "JSON" mention is required in the prompt.
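
A local preflight check can catch the missing word before a request is sent. The sketch below approximates the API's check; the exact server-side matching rule is not documented here, so mentions_json is a hypothetical helper that only inspects plain-string content.

```python
def mentions_json(messages: list[dict]) -> bool:
    """Case-insensitive search for "json" in plain-string message content."""
    return any(
        isinstance(m.get("content"), str) and "json" in m["content"].lower()
        for m in messages
    )


failing = [{"role": "user", "content": "Tell me about Alice."}]
working = [{"role": "system", "content": "Respond in JSON."}] + failing

print(mentions_json(failing))  # False
print(mentions_json(working))  # True
```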

Supported models and version requirements

JSON mode support is split across two features and varies by snapshot. The matrix below covers the OpenAI chat completions models that matter for production work as of May 2026.

Model	json_object	json_schema (strict)	Notes
`gpt-3.5-turbo-0613` and earlier	No	No	Pre-JSON-mode snapshots
`gpt-3.5-turbo-1106` and later	Yes	No	First json_object support (Nov 2023)
`gpt-4-1106-preview`, `gpt-4-turbo`	Yes	No	json_object only — no Structured Outputs
`gpt-4o-2024-05-13`	Yes	No	Original gpt-4o release — pre-Structured-Outputs
`gpt-4o-mini-2024-07-18`	Yes	Yes	First gpt-4o-mini, supports strict json_schema
`gpt-4o-2024-08-06` and later	Yes	Yes	First gpt-4o with Structured Outputs
`gpt-4o` (default alias)	Yes	Yes	Points to the latest gpt-4o snapshot
`o1-mini`, `o1-preview`	No	No	Reasoning models — no response_format support at launch

For new code, the recommended defaults are gpt-4o-mini for high-volume extraction (cheap, fast, supports strict mode) and gpt-4o for complex reasoning over structured output. Pin to a specific snapshot (gpt-4o-mini-2024-07-18) for reproducibility — the unversioned alias points to the latest snapshot and changes silently over time.

If you must use a model that does not support strict mode, fall back to json_object plus Pydantic or Zod validation in your application code. The cost overhead of one extra parse-and-validate pass is negligible compared to the LLM call itself.
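
The fallback can be sketched as a validate-and-retry loop. To keep the example self-contained it uses plain functions in place of Pydantic or Zod, and call_model is a hypothetical callable standing in for the real json_object request.

```python
import json
from typing import Callable


def extract_with_retry(
    call_model: Callable[[], str],
    validate: Callable[[dict], list[str]],
    max_attempts: int = 3,
) -> dict:
    """Call the model until the reply parses and passes validation."""
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_model()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        problems = validate(data)
        if not problems:
            return data
        last_error = "; ".join(problems)
    raise ValueError(f"gave up after {max_attempts} attempts: {last_error}")


# Stand-in for a real API call: the first reply drifts, the second is correct.
replies = iter(['{"full_name": "Alice"}', '{"name": "Alice", "age": 32}'])


def needs_name_and_age(d: dict) -> list[str]:
    return [f"missing {k}" for k in ("name", "age") if k not in d]


print(extract_with_retry(lambda: next(replies), needs_name_and_age))
# {'name': 'Alice', 'age': 32}
```

Each retry is a full extra model call, so a low max_attempts keeps the worst-case cost bounded.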

Handling truncation and incomplete JSON when max_tokens is hit

Truncation is the JSON-mode failure that surprises people most often. The model is generating valid JSON, the parser is happy, and then suddenly the output stops mid-string with an unmatched brace. The cause is always the same: the response hit max_tokens before the model could close the document.

Detection. Every chat completion has a finish_reason on the choice. The values you care about:

stop — the model finished naturally; the JSON is complete
length — the response hit max_tokens and was cut off; the JSON is partial
content_filter — safety filter triggered; check message.refusal

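import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    max_tokens=512,
    messages=[...],
)

choice = response.choices[0]
# "length" means the reply hit max_tokens, so the JSON is incomplete
if choice.finish_reason == "length":
    raise RuntimeError("output truncated: raise max_tokens or tighten schema")

# Parse only after the truncation check has passed
data = json.loads(choice.message.content)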

Prevention. Three levers. First, raise max_tokens. A small cap, such as the 16-token default of the legacy completions endpoint, truncates anything beyond a tiny object, while modern gpt-4o models support thousands of output tokens in a single response. Second, tighten the schema: every loose string field is a place the model can waste tokens writing essays. Add maxLength to strings and maxItems to arrays where the mode you use accepts them; strict mode supports only a subset of JSON Schema keywords, so confirm a constraint is supported before relying on it. Third, instruct the model to be terse: "use compact JSON, no extra prose" in the system message saves real tokens.

If a job genuinely needs more output than fits in one response, split the task at the application level — generate a list of identifiers first, then fan out one request per item. This pattern composes well with parallel API calls and avoids partial-JSON parsing entirely.
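
The fan-out step might look like the sketch below. It assumes the identifiers are already in hand; fan_out and fake_expand are hypothetical names, and fake_expand stands in for a per-item API call with its own small max_tokens budget.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(ids: list[str], expand_one: Callable[[str], dict], workers: int = 4) -> dict:
    """Run one small request per identifier and collect results keyed by id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result
        return dict(zip(ids, pool.map(expand_one, ids)))


# Stand-in for a per-item API call; a real version would send one request
# and parse the JSON reply.
def fake_expand(item_id: str) -> dict:
    return {"id": item_id, "title": f"Item {item_id}"}


print(fan_out(["a1", "b2"], fake_expand))
# {'a1': {'id': 'a1', 'title': 'Item a1'}, 'b2': {'id': 'b2', 'title': 'Item b2'}}
```

Because every item is its own short response, a single truncated reply costs one retry instead of the whole batch.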

JSON mode vs function calling vs Structured Outputs: decision matrix

Three OpenAI features produce structured output: response_format: json_object, response_format: json_schema (Structured Outputs), and function calling (tools). They overlap but solve different problems. Our function calling with JSON Schema guide covers the tools API in detail; here is the comparison.

Feature	json_object	json_schema (strict)	Function calling (tools)
Guarantees valid JSON parse	Yes	Yes	Yes
Guarantees schema adherence	No	Yes (100 percent)	Yes with strict tools
Model must choose to use it	No — always returns JSON	No — always returns JSON	Yes — model decides per turn
Multiple shapes per response	Yes (custom JSON)	One schema per call	Yes — multiple tools, multiple calls
Native pairing with code execution	No	No	Yes — designed for agent loops
Model support	gpt-3.5-turbo-1106+	gpt-4o-2024-08-06+, gpt-4o-mini-2024-07-18+	gpt-3.5-turbo-1106+, varies by feature
Best for	Open-ended JSON, older models	Single-shape extraction, classification	Agents, multi-step workflows, tool use

Picking the right one. For one-shot extraction or classification with a known shape, use json_schema — it is the lowest friction and gives you type safety. For agent workflows where the model orchestrates multiple tools and decides whether to call any of them, use function calling — the conditional dispatch is the point. For free-form generation where you just want parseable JSON and the shape varies, use json_object. The three features compose: an agent call can include both tools and a json_schema response format for the final answer.
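
To illustrate how the features compose, the sketch below builds request arguments that carry both a strict tool and a json_schema response format. The lookup_order tool and the order_summary schema are hypothetical; only the request dictionary is constructed, and no API call is made.

```python
# Hypothetical tool and answer schema, for illustration only.
lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch an order by its identifier.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}

final_answer_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "order_summary",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "status": {"type": "string", "enum": ["open", "shipped", "delivered"]},
            },
            "required": ["order_id", "status"],
            "additionalProperties": False,
        },
    },
}

request_kwargs = {
    "model": "gpt-4o-2024-08-06",
    "tools": [lookup_order_tool],
    "response_format": final_answer_format,
    "messages": [{"role": "user", "content": "What is the status of order 1234?"}],
}
# A real call would be client.chat.completions.create(**request_kwargs)
```

With a request like this, the model can spend a turn calling the tool and then return its final answer in the order_summary shape.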

Cost and latency: token overhead of strict mode

JSON mode is not free. The cost comes in three forms.

Schema tokens. With json_schema, the schema you supply counts as input tokens on every request. A schema with 20 fields and verbose descriptions can run 500 to 1000 tokens — multiply that by every call and the cost adds up. Keep field description values short or omit them when the field name is self-explanatory. The model rarely benefits from prose descriptions if the field name already says what it is.
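
A quick back-of-the-envelope calculation shows how the schema overhead scales. The numbers below are assumptions for illustration (a 750-token schema, one million calls, and an input price of $0.15 per million tokens); substitute your own schema size and current pricing.

```python
def schema_overhead_usd(schema_tokens: int, calls: int, usd_per_million_input: float) -> float:
    """Input-token cost attributable to resending the schema on every call."""
    return schema_tokens * calls * usd_per_million_input / 1_000_000


# Hypothetical numbers: 750 schema tokens, one million calls,
# and an assumed input price of $0.15 per million tokens.
print(schema_overhead_usd(750, 1_000_000, 0.15))  # 112.5
```

Halving the schema by trimming descriptions halves this figure, since the cost is linear in schema tokens.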

First-call latency. The first time you send a new schema, OpenAI compiles it into the constrained-decoding state machine. The compilation step adds latency — typically a few hundred milliseconds on the first request. Subsequent requests with the same schema hit a warm cache and run at normal latency. In practice this matters most for cold-start serverless functions; if your traffic is steady, the cache stays warm.

Constrained-decoding overhead. Strict mode adds a small per-token cost. OpenAI does not publish exact numbers, and the 1 to 5 percent slowdown sometimes cited relative to unconstrained generation is an estimate worth confirming with your own measurements. The trade-off is usually worth it: skipping retry-on-parse-error loops more than makes up for the per-token slowdown.

Optimization tips.

Reuse one identical schema object across requests, so the schema compiled on the first call stays in the warm cache
Use gpt-4o-mini for high-volume extraction: it is many times cheaper than gpt-4o per token (check current pricing for the exact ratio) and supports the same strict mode
Trim schema description fields once you confirm the field names work alone
Batch related extractions when possible — one call with five items is cheaper than five calls with one item each (one schema parse, one warm cache)

Key terms

response_format: The request parameter on OpenAI's chat completions API that switches output mode. Accepts { type: "text" } (default), { type: "json_object" }, or { type: "json_schema", json_schema: {...} }.
json_object mode: The original JSON mode, shipped November 2023. Guarantees the output is parseable JSON but imposes no shape constraints. Requires the literal string "JSON" to appear in the messages array.
Structured Outputs: The branded name for response_format: json_schema with strict: true. Constrains output to match a supplied JSON Schema with 100 percent adherence per OpenAI documentation. Requires gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18, or a later snapshot.
strict mode: The flag inside json_schema that enables token-level schema enforcement. With strict: true, every emitted character keeps the output schema-valid; with strict: false, the schema is a strong hint but not enforced.
finish_reason: A field on every chat completion choice that explains why generation stopped. Values include stop (natural completion), length (hit max_tokens — truncated), content_filter (safety refusal), and tool_calls (model chose to call a function).
refusal: A field on the message object that, when present, indicates the model declined to answer due to safety filters. With strict json_schema, refusal replaces content — always check it before parsing.

Frequently asked questions

What does response_format json_object actually guarantee?

response_format with type json_object guarantees one thing only: the model output will be syntactically valid JSON that parses without errors. It does not guarantee shape, field names, types, or required fields — the model can return any valid JSON document, including an empty object, an array, a string, or a deeply nested structure with fields you never asked for. Field names may drift between calls; optional fields may be omitted; numbers may arrive as strings, dates may arrive as ISO strings or epoch seconds depending on what the model decides in the moment. If you need shape guarantees, use the newer json_schema response_format with strict: true (the Structured Outputs feature shipped August 2024), which constrains the output to a specific JSON Schema with 100 percent adherence according to OpenAI documentation. Treat json_object as parser safety only — always validate the parsed object against a schema in your application code before using the fields downstream. Pydantic in Python and Zod in TypeScript are the common choices; both fail loudly when fields drift, which beats silent corruption in production data pipelines.

Why do I get an error saying 'messages must contain the word json'?

OpenAI requires that the word "JSON" appear somewhere in your messages array when you set response_format to json_object. The check runs before the request reaches the model; without the word, the API returns a 400 error immediately: "messages must contain the word 'json' in some form to use response_format of type 'json_object'". The simplest fix is to mention JSON in your system message: "You are a helpful assistant that responds in JSON only." The check is case-insensitive, so both "JSON" and "json" satisfy it, and the word can appear anywhere in any message role (system, user, or assistant). The rule exists because json_object mode forces the model to produce JSON regardless of what the user asked; if the prompt does not mention JSON, the model can get stuck generating whitespace until it hits the token limit, never finding a valid completion path. The Structured Outputs json_schema mode does not have this requirement: the schema itself signals the intent to the model, so you can write user prompts that never mention JSON and the API still accepts them.

What is the difference between json_object and json_schema response_format?

json_object enforces only that the output is valid JSON — any shape, any field names, any types. json_schema (introduced August 2024 as Structured Outputs) constrains the output to match a specific JSON Schema you provide, with strict: true giving 100 percent adherence per OpenAI documentation. json_object works on most chat models including gpt-3.5-turbo-1106 and later; json_schema requires gpt-4o-2024-08-06 or newer, plus gpt-4o-mini-2024-07-18. json_object has no schema overhead — your prompt size is whatever you write. json_schema sends the schema as part of every request and the model performs constrained decoding against it, which adds latency on the first call as the schema is compiled into a state machine. Another practical difference: json_object requires the literal word "JSON" to appear in your messages or the API returns a 400 error, while json_schema needs no such prompt mention. For new code targeting modern models, use json_schema with strict: true — you get parser safety plus shape guarantees in one feature. Keep json_object for older model snapshots, gpt-3.5-turbo cost optimization, or genuinely open-ended generation where you only need valid JSON and the shape varies per call.

Does OpenAI JSON mode work with gpt-3.5-turbo?

The json_object response format works with gpt-3.5-turbo-1106 and later snapshots (released November 2023), and with all gpt-4-turbo, gpt-4o, and gpt-4o-mini snapshots. It does not work with the original gpt-3.5-turbo-0613 or earlier — you will get a 400 error: "response_format is not supported by this model". The newer json_schema response format (Structured Outputs) requires gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 and later, and does not support any gpt-3.5-turbo variant — that branch never received strict-mode support. If you are stuck on gpt-3.5-turbo for cost reasons, json_object plus application-side schema validation with Pydantic or Zod is your option, and the retry-on-validation-failure loop adds latency you would not pay with strict mode. If you can move to gpt-4o-mini (which is cheaper than gpt-3.5-turbo as of 2025 pricing) the calculation flips: json_schema with strict mode is the better choice because the per-token cost is lower, you get schema enforcement for free, and the model itself is more capable at extraction tasks than gpt-3.5-turbo ever was.

How do I prevent JSON output from being truncated?

Truncation happens when the model hits max_tokens before closing the JSON document: you get a partial response with unmatched brackets that cannot be parsed. Check the finish_reason on every response: if it is "length", the output was cut off; if it is "stop", you got the full document. The fix has three parts. First, raise max_tokens. A small cap (the legacy completions endpoint defaults to 16) is far too small for non-trivial JSON, and most models support 4096 or higher in a single response. Second, tighten the schema: every optional field, every loose string, every unbounded array is a place the model can waste tokens. Use maxLength on strings and maxItems on arrays where the mode you use supports them. Third, ask for compact output explicitly in the prompt ("respond with minimal whitespace, no comments, no trailing prose"). If you still hit length limits, split the task: generate a list of IDs first, then expand each one in a separate call.

Can I use JSON mode with streaming responses?

Yes. Both json_object and json_schema modes support streaming via the stream: true parameter, and chunks arrive as standard server-sent events with delta.content carrying the next slice of the response. The catch: streamed chunks are not individually parseable as JSON. You receive partial strings ("{\"name\": \"al"...) and must accumulate them until the document closes before calling JSON.parse. For json_schema mode, OpenAI guarantees the final concatenated string is valid against the schema, but intermediate chunks are not, and a chunk boundary can split a multi-byte UTF-8 character. Several libraries handle incremental JSON streaming with partial parsing: the Vercel AI SDK exposes useObject and streamObject for typed partial extraction, while Instructor for Python ships partial-streaming with Pydantic models. For a typical use case (a user-facing assistant that streams the visible answer), let the SDK accumulate chunks and parse at the end. For a streaming-first UI that needs to render partial fields as they arrive, use a partial-JSON parser that tolerates missing closing brackets and missing values until the stream completes.
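
The accumulate-then-parse approach can be sketched without the SDK. Here fake_deltas is a hypothetical stand-in for the delta.content values of a streamed reply, including the empty slices that some chunks carry.

```python
import json
from typing import Iterable, Optional


def collect_stream(deltas: Iterable[Optional[str]]) -> dict:
    """Join streamed content slices, then parse once the stream has ended."""
    buffer = []
    for piece in deltas:
        if piece:  # skip chunks that carry no content
            buffer.append(piece)
    return json.loads("".join(buffer))


# Stand-in for the delta.content values from a streamed reply.
fake_deltas = [None, '{"name": "Al', 'ice", "a', 'ge": 32}', None]

print(collect_stream(fake_deltas))  # {'name': 'Alice', 'age': 32}
```

Parsing any single slice on its own would raise a decode error, which is why the parse happens only after the loop ends.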

Does Structured Outputs work with all gpt-4o snapshots?

No. Structured Outputs (json_schema with strict: true) requires gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 and later. The earlier gpt-4o-2024-05-13 snapshot — the original gpt-4o release from May 2024 — does not support strict mode. If you pin to a model version older than August 2024, you will get a 400 error when sending json_schema: "response_format json_schema not supported for this model". The same applies to gpt-4-turbo (no version supports strict json_schema) and gpt-3.5-turbo (no version, no plans). For new projects, use gpt-4o-mini-2024-07-18 or later as the default — it is cheap, fast, and supports Structured Outputs. If you must pin to an older snapshot for reproducibility or because your benchmarks depend on it, fall back to json_object plus Pydantic or Zod validation in your application code. The unversioned aliases (plain gpt-4o, plain gpt-4o-mini) always point to a snapshot that supports strict mode, so if reproducibility is not a hard requirement those aliases are the simplest path. OpenAI publishes a model support matrix on the Structured Outputs docs page that is the source of truth for which snapshots qualify.

What happens if the model can't satisfy the schema?

With strict: true Structured Outputs, the model is constrained at the token level to produce output matching your schema — it physically cannot emit a token that would violate the schema, so the returned JSON always parses against it when the response completes normally. What can happen instead is a refusal: if the request triggers OpenAI safety filters or the model decides it cannot answer, the API returns a refusal field on the message instead of the structured content. Always check message.refusal before parsing message.content — that field is the API's way of saying "here is why no schema-valid answer was produced". Without strict mode (strict: false or omitted), the model is steered toward the schema but not constrained, so violations are possible — extra fields, missing required fields, wrong types, numbers as strings — and you must validate with Pydantic, Zod, or your JSON Schema validator of choice in application code. The model can also still hit max_tokens mid-document, producing valid-against-schema-so-far output that gets truncated; check finish_reason on every response and treat "length" as a retry signal with a higher token budget.
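
The checks described above can be combined into one guard that runs before parsing. The sketch below uses plain dictionaries shaped like a chat completion choice rather than SDK objects, and read_choice is a hypothetical helper name.

```python
import json


def read_choice(choice: dict) -> dict:
    """Return parsed content, or raise with the reason no usable JSON came back."""
    message = choice["message"]
    if message.get("refusal"):
        raise PermissionError(f"model refused: {message['refusal']}")
    if choice.get("finish_reason") == "length":
        raise RuntimeError("truncated: retry with a higher max_tokens")
    return json.loads(message["content"])


ok = {"finish_reason": "stop", "message": {"content": '{"name": "Alice"}', "refusal": None}}
print(read_choice(ok))  # {'name': 'Alice'}
```

Raising distinct exception types lets the caller retry on truncation while surfacing refusals to the user instead of retrying them.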