OpenAI JSON Mode: response_format json_object Explained
OpenAI JSON mode is a request parameter, response_format, that forces the chat completions API to return parseable JSON instead of free-form prose. The basic form, { type: "json_object" }, guarantees only that the output parses; the shape is up to the model. The stricter form, { type: "json_schema", json_schema: { name, strict: true, schema } }, known as Structured Outputs and shipped August 2024, constrains the output to match a JSON Schema you supply, with 100 percent adherence per OpenAI's documentation. Two footguns trip up first-time users: json_object mode requires the literal word "JSON" in your messages, and Structured Outputs is gated to gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18, and later snapshots. This guide covers both modes, the model support matrix, truncation handling, cost overhead, and how the feature compares against function calling for tool-style use cases.
Generating JSON from an LLM and need to confirm it matches your schema? Jsonic's JSON Schema Validator validates the model output against your schema and pinpoints exactly which field violated which constraint — handy when debugging json_object mode without strict enforcement.
What OpenAI JSON mode actually does (and what it doesn't)
JSON mode is a constraint applied at decoding time. When you set response_format, OpenAI's sampler restricts the next-token choices so the running output remains a prefix of a valid JSON document. For json_object, the constraint is grammar-level — the output must be a parseable JSON value. For json_schema with strict: true, the constraint extends to your schema — token-level filtering ensures every emitted character keeps the output schema-valid.
What JSON mode does not do:
- It does not validate semantics. A schema-valid response can still be wrong — the model can return well-typed but factually incorrect fields.
- It does not prevent refusals. If safety filters fire, you get a refusal field instead of structured content. Check message.refusal before parsing.
- It does not prevent truncation. If max_tokens is reached mid-document, you get a partial response with finish_reason: "length".
- It does not lower hallucination rates. The model still invents data; it just invents type-correct data.
The right mental model: JSON mode is a parser-safety guarantee, not a correctness guarantee. You still need application-level validation for business rules, sanity checks, and cross-field consistency. For the broader picture on LLM-driven JSON output, see our general LLM JSON output guide.
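As a rough illustration of the application-level checks described above, the sketch below applies a few business rules to a parsed extraction result using only the standard library. The field names (name, age), the plausible-age range, and the helper name check_person are hypothetical, chosen only for this example.

```python
import json


def check_person(raw: str) -> dict:
    """Parse model output and apply rules that JSON mode itself cannot enforce.

    Hypothetical helper: the field names and limits are assumptions for illustration.
    """
    data = json.loads(raw)  # parser safety: raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("age must be an integer")
    if not 0 <= age <= 130:
        raise ValueError("age is outside the plausible range")
    return {"name": name.strip(), "age": age}
```

A response such as {"name": "Alice", "age": 320} parses cleanly and has the right types, yet still fails the range check, which is the gap between parser safety and correctness.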
Using response_format: json_object — the basic mode
json_object is the original JSON mode, shipped November 2023 alongside the gpt-3.5-turbo-1106 and gpt-4-1106-preview snapshots. The API contract: set response_format on the request, and the content string in the response is guaranteed to be parseable JSON. No shape guarantees — just JSON.parse safety.
# Python, openai SDK v1.40+
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You extract structured data and respond in JSON only.",
        },
        {
            "role": "user",
            "content": "Extract name and age from: Alice is 32 years old.",
        },
    ],
)

data = json.loads(response.choices[0].message.content)
print(data)  # e.g. {'name': 'Alice', 'age': 32}; key names can vary between calls

Two things to notice. First, the system message contains the word "JSON"; this is mandatory. Second, the field names name and age are not guaranteed. The next call might return full_name and years_old, or wrap the result in an outer {"person": ...} object. To pin down the shape, either describe it explicitly in the prompt or move up to json_schema mode.
For deeper handling of the parsed result in typed languages, see our TypeScript JSON parsing guide.
Using response_format: json_schema — Structured Outputs
Structured Outputs is the strict mode. You supply a JSON Schema; the API decodes tokens that keep the output valid against your schema for the entire response. With strict: true, OpenAI documents 100 percent schema adherence — the model physically cannot emit output that violates the schema. This is the recommended path for new code targeting modern models. Our Structured Outputs deep dive covers the feature in more detail.
# Python, openai SDK v1.40+, with Pydantic
from openai import OpenAI
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int
    email: str | None  # nullable, but still a required key in strict mode


client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract person details from the text."},
        {"role": "user", "content": "Alice is 32, alice@example.com"},
    ],
    response_format=Person,
)

message = response.choices[0].message
if message.refusal:
    # On a refusal there is no parsed object to read
    raise RuntimeError(message.refusal)
person = message.parsed
print(person.name, person.age, person.email)

The .parse() helper (instead of .create()) handles the schema conversion from Pydantic and gives you a typed Python object back. For Node.js, the equivalent is the zodResponseFormat helper from openai/helpers/zod combined with the same parse method:
// Node.js, openai SDK v4.55+, with Zod
import OpenAI from 'openai'
import { zodResponseFormat } from 'openai/helpers/zod'
import { z } from 'zod'

const Person = z.object({
  name: z.string(),
  age: z.number().int(),
  email: z.string().nullable(),
})

const client = new OpenAI()

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'system', content: 'Extract person details from the text.' },
    { role: 'user', content: 'Alice is 32, alice@example.com' },
  ],
  response_format: zodResponseFormat(Person, 'person'),
})

const person = response.choices[0].message.parsed
console.log(person.name, person.age, person.email)

Schema rules that bite first-time users:
- additionalProperties must be false on every object; the API rejects schemas that omit this
- Every field must be in required; to make a field optional, add null to its type union
- Field names are limited to 80 characters
- Total schema size has a token budget; very large schemas (thousands of properties) are rejected
For a wider tour of JSON Schema itself, see our JSON Schema tutorial.
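The first two schema rules in this section can be checked locally before a request is sent. The following is a minimal sketch, not OpenAI's own validation: the helper name strict_schema_problems is hypothetical, and it only walks properties and items, ignoring $ref, $defs, and anyOf.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Return readable problems for two strict-mode rules.

    Checks that every object sets additionalProperties to false and that
    every property is listed in required. Sketch only: $ref, $defs, and
    anyOf are not followed.
    """
    problems = []
    schema_type = schema.get("type")
    is_object = schema_type == "object" or (
        isinstance(schema_type, list) and "object" in schema_type
    )
    if is_object:
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        properties = schema.get("properties", {})
        required = set(schema.get("required", []))
        for key in properties:
            if key not in required:
                problems.append(f"{path}.{key}: missing from required")
        for key, child in properties.items():
            problems.extend(strict_schema_problems(child, f"{path}.{key}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(strict_schema_problems(items, f"{path}[]"))
    return problems


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        # An optional field is expressed as a null union, yet still listed in required
        "email": {"type": ["string", "null"]},
    },
    "required": ["name", "email"],
    "additionalProperties": False,
}

print(strict_schema_problems(person_schema))  # no problems for this schema
```

Running a check like this in a unit test catches a forgotten additionalProperties before it surfaces as a 400 in production.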
Required prompt: why you must mention "JSON" in the system message
The most common stumble with json_object mode is this 400 error:
openai.BadRequestError: Error code: 400 - {
    'error': {
        'message': "'messages' must contain the word 'json' in some form to use 'response_format' of type 'json_object'.",
        'type': 'invalid_request_error',
        'code': None
    }
}

The fix is to add the word JSON anywhere in the messages array. A single mention in the system message is enough. The reason: when you force JSON output without mentioning JSON in the prompt, the model can get stuck generating whitespace until it reaches the token limit. Token-level constraints prevent it from emitting anything except a JSON document, but the prompt gives it no signal that it should produce one. Adding "respond in JSON" to the system message resolves the ambiguity.
Code that fails:
# Will raise: messages must contain the word 'json'
client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": "Tell me about Alice."}
    ],
)

Code that works:
client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Respond in JSON."},
        {"role": "user", "content": "Tell me about Alice."},
    ],
)

The check is case insensitive: both "JSON" and "json" pass. The rule does not apply to json_schema mode: the schema itself signals the intent, so no "JSON" mention is required in the prompt.
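One way to avoid the round trip is to mirror the rule locally before sending a json_object request. The sketch below is an approximation under stated assumptions: mentions_json is a hypothetical helper, it does a plain case-insensitive substring search, and the API's own check remains the source of truth for what counts as a match.

```python
def _text_of(content) -> str:
    """Flatten a message's content (a string or a list of content parts) to text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return ""


def mentions_json(messages: list[dict]) -> bool:
    """Return True if any message text contains "json", ignoring case."""
    return any("json" in _text_of(m.get("content")).lower() for m in messages)


failing = [{"role": "user", "content": "Tell me about Alice."}]
working = [
    {"role": "system", "content": "Respond in JSON."},
    {"role": "user", "content": "Tell me about Alice."},
]
print(mentions_json(failing), mentions_json(working))
```

Calling this guard just before a json_object request turns a remote 400 into a local error that is easier to trace.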
Supported models and version requirements
JSON mode support is split across two features and varies by snapshot. The matrix below covers the OpenAI chat completions models that matter for production work as of May 2026.
| Model | json_object | json_schema (strict) | Notes |
|---|---|---|---|
| gpt-3.5-turbo-0613 and earlier | No | No | Pre-JSON-mode snapshots |
| gpt-3.5-turbo-1106 and later | Yes | No | First json_object support (Nov 2023) |
| gpt-4-1106-preview, gpt-4-turbo | Yes | No | json_object only; no Structured Outputs |
| gpt-4o-2024-05-13 | Yes | No | Original gpt-4o release, before Structured Outputs |
| gpt-4o-mini-2024-07-18 | Yes | Yes | First gpt-4o-mini, supports strict json_schema |
| gpt-4o-2024-08-06 and later | Yes | Yes | First gpt-4o with Structured Outputs |
| gpt-4o (default alias) | Yes | Yes | Points to the latest gpt-4o snapshot |
| o1-mini, o1-preview | No | No | Reasoning models; no response_format support at launch |
For new code, the recommended defaults are gpt-4o-mini for high-volume extraction (cheap, fast, supports strict mode) and gpt-4o for complex reasoning over structured output. Pin to a specific snapshot (gpt-4o-mini-2024-07-18) for reproducibility — the unversioned alias points to the latest snapshot and changes silently over time.
If you must use a model that does not support strict mode, fall back to json_object plus Pydantic or Zod validation in your application code. The cost overhead of one extra parse-and-validate pass is negligible compared to the LLM call itself.
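The fallback decision can be made in code. The sketch below encodes only the gpt-4o and gpt-4o-mini rows of the matrix in this section; both helper names are hypothetical, and any model it does not recognize is routed to json_object, which is an assumption you should adjust for models that support neither mode.

```python
import re

# Earliest snapshots with strict json_schema support, per the matrix in this section.
MIN_STRICT_SNAPSHOT = {"gpt-4o": "2024-08-06", "gpt-4o-mini": "2024-07-18"}
_SNAPSHOT = re.compile(r"^(gpt-4o(?:-mini)?)-(\d{4}-\d{2}-\d{2})$")


def supports_strict_schema(model: str) -> bool:
    """Return True for gpt-4o / gpt-4o-mini aliases and sufficiently new snapshots."""
    if model in MIN_STRICT_SNAPSHOT:  # unversioned alias
        return True
    match = _SNAPSHOT.match(model)
    if not match:
        return False
    family, date = match.groups()
    # ISO dates compare correctly as plain strings
    return date >= MIN_STRICT_SNAPSHOT[family]


def pick_response_format(model: str, name: str, schema: dict) -> dict:
    """Use strict json_schema when the model supports it, else fall back to json_object."""
    if supports_strict_schema(model):
        return {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    return {"type": "json_object"}
```

When the fallback branch is taken, the caller still has to mention "JSON" in the messages and validate the parsed result in application code.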
Handling truncation and incomplete JSON when max_tokens is hit
Truncation is the JSON-mode failure that surprises people most often. The model is generating valid JSON, the parser is happy, and then suddenly the output stops mid-string with an unmatched brace. The cause is always the same: the response hit max_tokens before the model could close the document.
Detection. Every chat completion has a finish_reason on the choice. The values you care about:
- stop: the model finished naturally; the JSON is complete
- length: the response hit max_tokens and was cut off; the JSON is partial
- content_filter: safety filter triggered; check message.refusal
response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    max_tokens=512,
    messages=[...],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # Parsing a truncated document would fail, so stop before json.loads
    raise RuntimeError("output truncated: raise max_tokens or tighten schema")
data = json.loads(choice.message.content)

Prevention. Three levers. First, raise max_tokens. The often-quoted default of 16 belongs to the legacy completions endpoint, not chat completions, so a truncated chat response usually traces back to a cap set in your own code or by a wrapper library. Modern gpt-4o models support thousands of output tokens in a single response. Second, tighten the schema: every loose string field is a place the model can waste tokens writing essays. Add maxLength to strings and maxItems to arrays where the mode accepts them; strict mode supports only a subset of JSON Schema keywords, so check OpenAI's supported-keyword list first. Third, instruct the model to be terse: "use compact JSON, no extra prose" in the system message saves real tokens.
If a job genuinely needs more output than fits in one response, split the task at the application level — generate a list of identifiers first, then fan out one request per item. This pattern composes well with parallel API calls and avoids partial-JSON parsing entirely.
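The detection step in this section can be folded into one helper that refuses to parse anything incomplete. This is a sketch under assumptions: parse_choice and IncompleteResponse are hypothetical names, and the SimpleNamespace objects are stand-ins shaped like response.choices[0] so the example runs without network access.

```python
import json
from types import SimpleNamespace


class IncompleteResponse(Exception):
    """Raised when a choice cannot be treated as a complete JSON document."""


def parse_choice(choice):
    """Return parsed JSON for a chat completion choice, or raise IncompleteResponse.

    `choice` needs .finish_reason and .message with .content and, optionally, .refusal.
    """
    refusal = getattr(choice.message, "refusal", None)
    if refusal:
        raise IncompleteResponse(f"model refused: {refusal}")
    if choice.finish_reason == "length":
        raise IncompleteResponse("output truncated at max_tokens")
    if choice.finish_reason == "content_filter":
        raise IncompleteResponse("content filter triggered")
    return json.loads(choice.message.content)


# Stand-ins for real API objects, for illustration only
complete = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content='{"name": "Alice"}', refusal=None),
)
truncated = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content='{"name": "Al', refusal=None),
)

print(parse_choice(complete))
```

Raising a dedicated exception type lets the caller distinguish "retry with a larger budget" from a genuine parse bug.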
JSON mode vs function calling vs Structured Outputs: decision matrix
Three OpenAI features produce structured output: response_format: json_object, response_format: json_schema (Structured Outputs), and function calling (tools). They overlap but solve different problems. Our function calling with JSON Schema guide covers the tools API in detail; here is the comparison.
| Feature | json_object | json_schema (strict) | Function calling (tools) |
|---|---|---|---|
| Guarantees valid JSON parse | Yes | Yes | Yes |
| Guarantees schema adherence | No | Yes (100 percent) | Yes with strict tools |
| Model must choose to use it | No — always returns JSON | No — always returns JSON | Yes — model decides per turn |
| Multiple shapes per response | Yes (custom JSON) | One schema per call | Yes — multiple tools, multiple calls |
| Native pairing with code execution | No | No | Yes — designed for agent loops |
| Model support | gpt-3.5-turbo-1106+ | gpt-4o-2024-08-06+, gpt-4o-mini-2024-07-18+ | gpt-3.5-turbo-1106+, varies by feature |
| Best for | Open-ended JSON, older models | Single-shape extraction, classification | Agents, multi-step workflows, tool use |
Picking the right one. For one-shot extraction or classification with a known shape, use json_schema — it is the lowest friction and gives you type safety. For agent workflows where the model orchestrates multiple tools and decides whether to call any of them, use function calling — the conditional dispatch is the point. For free-form generation where you just want parseable JSON and the shape varies, use json_object. The three features compose: an agent call can include both tools and a json_schema response format for the final answer.
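To make the comparison concrete, the sketch below builds the two request payload shapes side by side from one shared schema. The schema, the tool name save_person, and its description are hypothetical; only the surrounding keys follow the chat completions request format.

```python
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

# Structured Outputs: the model always answers with one object matching the schema.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "person", "strict": True, "schema": person_schema},
}

# Function calling: the model decides per turn whether to call the tool at all.
tools = [
    {
        "type": "function",
        "function": {
            "name": "save_person",  # hypothetical tool name
            "description": "Store an extracted person record.",
            "strict": True,
            "parameters": person_schema,
        },
    }
]

# Either value is passed as a keyword argument to client.chat.completions.create(...):
# response_format=response_format for a guaranteed single shape, tools=tools for dispatch.
```

The same JSON Schema feeds both features; what changes is whether the model is obliged to produce it or merely allowed to.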
Cost and latency: token overhead of strict mode
JSON mode is not free. The cost comes in three forms.
Schema tokens. With json_schema, the schema you supply counts as input tokens on every request. A schema with 20 fields and verbose descriptions can run 500 to 1000 tokens — multiply that by every call and the cost adds up. Keep field description values short or omit them when the field name is self-explanatory. The model rarely benefits from prose descriptions if the field name already says what it is.
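The schema overhead is simple arithmetic, sketched below. The token count, call volume, and the price of $0.15 per million input tokens are assumed figures for illustration, not quoted rates; substitute your own measurements and current pricing.

```python
def schema_overhead_usd(
    schema_tokens: int, calls: int, usd_per_million_input_tokens: float
) -> float:
    """Extra input cost from resending the schema on every request."""
    return schema_tokens * calls * usd_per_million_input_tokens / 1_000_000


# Assumed inputs: a 750-token schema, one million calls, $0.15 per million input tokens
overhead = schema_overhead_usd(750, 1_000_000, 0.15)
print(f"${overhead:,.2f}")  # about $112.50 under these assumptions
```

Halving the schema by trimming descriptions halves this figure, which is why the description fields are the first place to look.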
First-call latency. The first time you send a new schema, OpenAI compiles it into the constrained-decoding state machine. The compilation step adds latency — typically a few hundred milliseconds on the first request. Subsequent requests with the same schema hit a warm cache and run at normal latency. In practice this matters most for cold-start serverless functions; if your traffic is steady, the cache stays warm.
Constrained-decoding overhead. Strict mode adds a small per-token cost. OpenAI does not publish exact numbers; the 1 to 5 percent slowdown sometimes quoted relative to unconstrained generation is a rough estimate, so measure on your own workload. The trade-off is usually worth it: skipping retry-on-parse-error loops more than makes up for the per-token slowdown.
Optimization tips.
- Reuse the same schema across requests; the compiled form is cached after the first call, so every schema change pays the compile latency again
- Use gpt-4o-mini for high-volume extraction; it is more than ten times cheaper than gpt-4o per token at list prices and supports the same strict mode
- Trim schema description fields once you confirm the field names work alone
- Batch related extractions when possible; one call with five items is cheaper than five calls with one item each (one schema parse, one warm cache)
Key terms
- response_format: The request parameter on OpenAI's chat completions API that switches output mode. Accepts { type: "text" } (default), { type: "json_object" }, or { type: "json_schema", json_schema: {...} }.
- json_object mode: The original JSON mode, shipped November 2023. Guarantees the output is parseable JSON but imposes no shape constraints. Requires the literal string "JSON" to appear in the messages array.
- Structured Outputs: The branded name for response_format: json_schema with strict: true. Constrains output to match a supplied JSON Schema with 100 percent adherence per OpenAI documentation. Requires gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18, or later.
- strict mode: The flag inside json_schema that enables token-level schema enforcement. With strict: true, every emitted token keeps the output schema-valid; with strict: false, the schema is a strong hint but not enforced.
- finish_reason: A field on every chat completion choice that explains why generation stopped. Values include stop (natural completion), length (hit max_tokens, so the output is truncated), content_filter (safety filter triggered), and tool_calls (model chose to call a function).
- refusal: A field on the message object that, when present, indicates the model declined to answer due to safety filters. With strict json_schema, refusal replaces content; always check it before parsing.
Frequently asked questions
What does response_format json_object actually guarantee?
response_format with type json_object guarantees one thing only: the model output will be syntactically valid JSON that parses without errors. It does not guarantee shape, field names, types, or required fields — the model can return any valid JSON document, including an empty object, an array, a string, or a deeply nested structure with fields you never asked for. Field names may drift between calls; optional fields may be omitted; numbers may arrive as strings, dates may arrive as ISO strings or epoch seconds depending on what the model decides in the moment. If you need shape guarantees, use the newer json_schema response_format with strict: true (the Structured Outputs feature shipped August 2024), which constrains the output to a specific JSON Schema with 100 percent adherence according to OpenAI documentation. Treat json_object as parser safety only — always validate the parsed object against a schema in your application code before using the fields downstream. Pydantic in Python and Zod in TypeScript are the common choices; both fail loudly when fields drift, which beats silent corruption in production data pipelines.
Why do I get an error saying 'messages must contain the word json'?
OpenAI requires that the literal string "JSON" appear somewhere in your messages array when you set response_format to json_object. This is a safety check enforced before the request reaches the model — without it, the API returns a 400 error immediately: "messages must contain the word 'json' in some form to use response_format of type 'json_object'". The simplest fix is to mention JSON in your system message: "You are a helpful assistant that responds in JSON only." Case matters less than presence — both "JSON" and "json" satisfy the check, and the word can appear anywhere in any message role (system, user, or assistant). The rule exists because json_object mode forces the model to produce JSON regardless of what the user asked; if the prompt does not mention JSON, the model can get stuck generating whitespace forever, never finding a valid completion path. The Structured Outputs json_schema mode does not have this requirement — the schema itself signals the intent to the model, so you can write user prompts that never mention JSON and the API still accepts them.
What is the difference between json_object and json_schema response_format?
json_object enforces only that the output is valid JSON — any shape, any field names, any types. json_schema (introduced August 2024 as Structured Outputs) constrains the output to match a specific JSON Schema you provide, with strict: true giving 100 percent adherence per OpenAI documentation. json_object works on most chat models including gpt-3.5-turbo-1106 and later; json_schema requires gpt-4o-2024-08-06 or newer, plus gpt-4o-mini-2024-07-18. json_object has no schema overhead — your prompt size is whatever you write. json_schema sends the schema as part of every request and the model performs constrained decoding against it, which adds latency on the first call as the schema is compiled into a state machine. Another practical difference: json_object requires the literal word "JSON" to appear in your messages or the API returns a 400 error, while json_schema needs no such prompt mention. For new code targeting modern models, use json_schema with strict: true — you get parser safety plus shape guarantees in one feature. Keep json_object for older model snapshots, gpt-3.5-turbo cost optimization, or genuinely open-ended generation where you only need valid JSON and the shape varies per call.
Does OpenAI JSON mode work with gpt-3.5-turbo?
The json_object response format works with gpt-3.5-turbo-1106 and later snapshots (released November 2023), and with all gpt-4-turbo, gpt-4o, and gpt-4o-mini snapshots. It does not work with the original gpt-3.5-turbo-0613 or earlier — you will get a 400 error: "response_format is not supported by this model". The newer json_schema response format (Structured Outputs) requires gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 and later, and does not support any gpt-3.5-turbo variant — that branch never received strict-mode support. If you are stuck on gpt-3.5-turbo for cost reasons, json_object plus application-side schema validation with Pydantic or Zod is your option, and the retry-on-validation-failure loop adds latency you would not pay with strict mode. If you can move to gpt-4o-mini (which is cheaper than gpt-3.5-turbo as of 2025 pricing) the calculation flips: json_schema with strict mode is the better choice because the per-token cost is lower, you get schema enforcement for free, and the model itself is more capable at extraction tasks than gpt-3.5-turbo ever was.
How do I prevent JSON output from being truncated?
Truncation happens when the model hits max_tokens before closing the JSON document: you get a partial response with unmatched brackets that cannot be parsed. Check the finish_reason on every response: if it is "length", the output was cut off; if it is "stop", you got the full document. The fix has three parts. First, raise max_tokens. The default of 16 that is often quoted applies to the legacy completions endpoint rather than chat completions, so look for a small cap set in your own code or by a wrapper library; most models support 4096 output tokens or more in a single response. Second, tighten the schema: every optional field, every loose string, every unbounded array is a place the model can waste tokens. Use maxLength on strings and maxItems on arrays where the mode accepts them, keeping in mind that strict mode supports only a subset of JSON Schema keywords. Third, ask for compact output explicitly in the prompt ("respond with minimal whitespace, no comments, no trailing prose"). If you still hit length limits, split the task: generate a list of IDs first, then expand each one in a separate call.
Can I use JSON mode with streaming responses?
Yes. Both json_object and json_schema modes support streaming via the stream: true parameter, and chunks arrive as standard server-sent events with delta.content carrying the next slice of the response. The catch: streamed chunks are not individually parseable as JSON. You receive partial strings ("{\"name\": \"al"...) and must accumulate them until the document closes before calling JSON.parse. For json_schema mode, OpenAI guarantees the final concatenated string is valid against the schema, but intermediate chunks are not, and a chunk boundary can fall anywhere, including the middle of a string value or a number. Several libraries handle incremental JSON streaming with partial parsing: the Vercel AI SDK exposes useObject and streamObject for typed partial extraction, while Instructor for Python ships partial streaming with Pydantic models. For a typical use case (a user-facing assistant that streams the visible answer), let the SDK accumulate chunks and parse at the end. For a streaming-first UI that needs to render partial fields as they arrive, use a partial-JSON parser that tolerates unclosed brackets and missing values until the stream completes.
Does Structured Outputs work with all gpt-4o snapshots?
No. Structured Outputs (json_schema with strict: true) requires gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 and later. The earlier gpt-4o-2024-05-13 snapshot — the original gpt-4o release from May 2024 — does not support strict mode. If you pin to a model version older than August 2024, you will get a 400 error when sending json_schema: "response_format json_schema not supported for this model". The same applies to gpt-4-turbo (no version supports strict json_schema) and gpt-3.5-turbo (no version, no plans). For new projects, use gpt-4o-mini-2024-07-18 or later as the default — it is cheap, fast, and supports Structured Outputs. If you must pin to an older snapshot for reproducibility or because your benchmarks depend on it, fall back to json_object plus Pydantic or Zod validation in your application code. The unversioned aliases (plain gpt-4o, plain gpt-4o-mini) always point to a snapshot that supports strict mode, so if reproducibility is not a hard requirement those aliases are the simplest path. OpenAI publishes a model support matrix on the Structured Outputs docs page that is the source of truth for which snapshots qualify.
What happens if the model can't satisfy the schema?
With strict: true Structured Outputs, the model is constrained at the token level to produce output matching your schema — it physically cannot emit a token that would violate the schema, so the returned JSON always parses against it when the response completes normally. What can happen instead is a refusal: if the request triggers OpenAI safety filters or the model decides it cannot answer, the API returns a refusal field on the message instead of the structured content. Always check message.refusal before parsing message.content — that field is the API's way of saying "here is why no schema-valid answer was produced". Without strict mode (strict: false or omitted), the model is steered toward the schema but not constrained, so violations are possible — extra fields, missing required fields, wrong types, numbers as strings — and you must validate with Pydantic, Zod, or your JSON Schema validator of choice in application code. The model can also still hit max_tokens mid-document, producing valid-against-schema-so-far output that gets truncated; check finish_reason on every response and treat "length" as a retry signal with a higher token budget.
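Treating "length" as a retry signal might look like the sketch below. The names complete_with_retry and fake_send and the budget values are hypothetical; send stands for any callable that performs the request with the given max_tokens and returns an object shaped like response.choices[0].

```python
import json
from types import SimpleNamespace


def complete_with_retry(send, budgets=(512, 2048, 8192)):
    """Call send(max_tokens) with growing budgets until the output is not truncated."""
    for max_tokens in budgets:
        choice = send(max_tokens)
        refusal = getattr(choice.message, "refusal", None)
        if refusal:
            raise RuntimeError(f"model refused: {refusal}")
        if choice.finish_reason == "length":
            continue  # truncated: try again with the next, larger budget
        return json.loads(choice.message.content)
    raise RuntimeError("output still truncated at the largest budget")


def fake_send(max_tokens):
    # Stand-in for a real API call: pretends the document needs 1000 tokens
    truncated = max_tokens < 1000
    return SimpleNamespace(
        finish_reason="length" if truncated else "stop",
        message=SimpleNamespace(
            content='{"ok": tr' if truncated else '{"ok": true}',
            refusal=None,
        ),
    )


result = complete_with_retry(fake_send)
print(result)
```

Each retry is billed as a full request, so a bounded list of budgets is safer than doubling without limit.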
Further reading and primary sources
- OpenAI Docs — Structured Outputs — Authoritative reference for response_format json_schema, strict mode, and the model support matrix
- OpenAI Docs — JSON Mode — The original json_object mode docs — the "JSON" prompt rule and supported snapshots
- OpenAI Cookbook — Structured Outputs examples — Working notebooks for Pydantic, Zod, and schema design patterns
- JSON Schema specification — The JSON Schema standard that Structured Outputs consumes — useful for understanding which keywords OpenAI supports
- Instructor library — Pydantic-first wrapper around OpenAI and other providers with retries, partial streaming, and validation hooks