OpenAI JSON Mode and Function Calling: Structured JSON Output

Q: What is OpenAI JSON mode?

OpenAI JSON mode is enabled by setting response_format: { type: "json_object" } in your chat completions request. It guarantees the model returns output that can be parsed as JSON — without markdown fences or prose wrapping. Without JSON mode the model may return JSON-like text surrounded by backtick fences or preceded by explanatory sentences, which breaks JSON.parse(). JSON mode requires that you mention "JSON" somewhere in the system message or user message; if you omit this the API returns an error. It works with GPT-4o, GPT-4-turbo, and GPT-3.5-turbo-1106 and later. Important: JSON mode guarantees valid JSON but does NOT guarantee your specific schema — the model may return any valid JSON object. Always validate output with Zod (Node.js) or Pydantic (Python) to confirm shape and types.

Q: What is the difference between JSON mode and Structured Outputs?

JSON mode (response_format: { type: "json_object" }) guarantees valid JSON in any shape; the model decides the structure. Structured Outputs (response_format: { type: "json_schema", json_schema: { ... } }) guarantees the response matches your JSON Schema exactly, including required fields, types, and enum values. Structured Outputs requires gpt-4o-2024-08-06 or later (gpt-4o-mini also supports it) and strict: true inside the json_schema object. In strict mode, every property must be listed in "required" and additionalProperties must be false at every level. Structured Outputs is the more reliable choice for production data extraction pipelines. Use JSON mode as a fallback for older models (GPT-3.5-turbo, GPT-4-turbo) that do not support Structured Outputs. For maximum reliability in production, keep Pydantic or Zod validation on top of Structured Outputs as an additional safety layer.
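
One way to sketch the fallback described above is a small helper that builds the response_format payload. The helper name build_response_format and the supports_structured_outputs flag are illustrative assumptions (the caller decides which models qualify); only the two payload shapes come from the text.

```python
from typing import Any


def build_response_format(
    schema_name: str,
    schema: dict[str, Any],
    supports_structured_outputs: bool,
) -> dict[str, Any]:
    """Return a response_format payload for a chat completions request."""
    if supports_structured_outputs:
        # Structured Outputs: the API constrains output to the schema.
        return {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "strict": True, "schema": schema},
        }
    # JSON mode fallback: valid JSON only, so the schema must also be
    # described in the prompt and checked after parsing.
    return {"type": "json_object"}


person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}

strict_format = build_response_format("person", person_schema, True)
fallback_format = build_response_format("person", person_schema, False)
```

With the fallback payload, remember that the prompt must still mention "JSON" and describe the expected shape.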

Q: How do function calling and JSON mode differ?

Function calling (also called tool use) is designed for executing actions: the model generates a JSON arguments object matching a declared function schema so your code can call a real function with those arguments. JSON mode is designed for data extraction: the model returns data as a JSON object. Function calling enables agentic workflows, meaning multi-step sequences where the model decides which tools to call, receives tool results, and continues reasoning. JSON mode is a one-shot extraction. Both generate JSON, but the intent differs. You can combine them: define a tool whose parameters schema describes the data you need and let the model call it; with strict: true on the function definition, the arguments follow that schema. Use function calling for agent actions (search, API calls, database writes) and JSON mode or Structured Outputs for pure data extraction.

Q: Should I validate OpenAI JSON output even with Structured Outputs?

Yes — always validate. While Structured Outputs guarantees schema adherence (correct types, required fields present, enum values respected), it cannot guarantee semantic correctness. The model may hallucinate values that match the declared type but are factually wrong: a city name that does not exist, a price that is off by an order of magnitude, a date in the wrong format, or a participant name that was never mentioned in the input. Zod safeParse() and Pydantic model_validate_json() catch type and shape errors at runtime, but your application also needs business logic validation: is the price within a plausible range? Is the date in the future? Does the referenced user ID exist in your database? Think of Structured Outputs as guaranteeing the envelope, not the letter inside.
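
As a rough illustration of the business logic layer, the sketch below checks the three questions raised above using only the standard library. The Order fields, the 10,000 price ceiling, and the known_user_ids set are hypothetical stand-ins for your own data model and database lookup.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    user_id: str
    price: float
    delivery_date: str  # expected as ISO 8601, e.g. "2030-01-31"


def business_errors(order: Order, known_user_ids: set[str], today: date) -> list[str]:
    """Return human-readable problems; an empty list means the order passed."""
    errors = []
    if not 0 < order.price <= 10_000:  # hypothetical plausible range
        errors.append(f"price out of range: {order.price}")
    try:
        delivery = date.fromisoformat(order.delivery_date)
    except ValueError:
        errors.append(f"date is not ISO 8601: {order.delivery_date!r}")
    else:
        if delivery <= today:
            errors.append(f"date is not in the future: {order.delivery_date}")
    if order.user_id not in known_user_ids:
        errors.append(f"unknown user id: {order.user_id}")
    return errors


known = {"u_1", "u_2"}
today = date(2026, 5, 19)
good = business_errors(Order("u_1", 49.99, "2026-06-01"), known, today)
bad = business_errors(Order("u_9", 499900.0, "June 1st"), known, today)
```

Here good is an empty list, while bad collects one message per failed check, all of which would pass a pure type check.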

Q: How many tokens does JSON mode add to my request?

JSON mode itself adds no prompt tokens, but the structured format typically increases output token count by 5–15% compared to prose responses, because JSON key names, quotes, brackets, and colons consume tokens that prose does not need. Function calling tool definitions add approximately 50–100 tokens per tool to the prompt (the schema is injected into the context). Structured Outputs json_schema definitions add roughly 10–20 tokens per field defined in the schema. For high-volume extraction workloads, use gpt-4o-mini, which is significantly cheaper per token than gpt-4o while still supporting Structured Outputs. The OpenAI Batch API provides a 50% cost reduction for asynchronous workloads with a 24-hour processing window — ideal for large dataset extraction where latency is not a concern.
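
To make the arithmetic concrete, here is a back-of-the-envelope sketch that applies the rough per-tool and per-field ranges quoted above. The function names are illustrative, and the figures are estimates rather than measured values; count tokens on your own requests before budgeting.

```python
def prompt_overhead_tokens(num_tools: int, num_schema_fields: int) -> tuple[int, int]:
    """Return a (low, high) estimate of extra prompt tokens per request."""
    low = num_tools * 50 + num_schema_fields * 10
    high = num_tools * 100 + num_schema_fields * 20
    return low, high


def batch_cost(sync_cost: float) -> float:
    """Apply the 50% Batch API discount to a synchronous cost estimate."""
    return sync_cost * 0.5


low, high = prompt_overhead_tokens(num_tools=3, num_schema_fields=8)
print(low, high)         # 230 460
print(batch_cost(12.0))  # 6.0
```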

Q: How do I handle OpenAI API errors when using JSON mode?

Check response.choices[0].finish_reason after each call. A value of "stop" means the model completed normally. A value of "length" means the model hit max_tokens before finishing; in JSON mode this produces truncated, unparseable JSON, so increase max_tokens or reduce the complexity of the requested schema. A value of "content_filter" means the output was blocked by content policy. Always wrap JSON.parse() / json.loads() in a try/catch or try/except block even in JSON mode, because a "length" finish bypasses the guarantee. For 429 (rate limit) and 500 (server error) responses, retry with exponential backoff: the official openai SDKs have built-in retry support (max_retries in Python, maxRetries in Node.js). For 400 errors, read error.message; a common cause is forgetting to mention "JSON" in the messages when JSON mode is enabled.
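
The finish_reason checks above can be folded into one guard function. This is a minimal sketch: IncompleteResponseError and parse_json_completion are names invented for illustration, and the function takes the content string and finish_reason as plain arguments so it runs without an API call.

```python
import json
from typing import Any, Optional


class IncompleteResponseError(Exception):
    """Raised when the completion did not finish normally."""


def parse_json_completion(content: Optional[str], finish_reason: str) -> Any:
    """Parse message content only if the completion finished cleanly."""
    if finish_reason == "length":
        raise IncompleteResponseError("hit max_tokens; JSON is likely truncated")
    if finish_reason == "content_filter":
        raise IncompleteResponseError("response was blocked by the content filter")
    try:
        return json.loads(content or "")
    except json.JSONDecodeError as exc:
        raise IncompleteResponseError(f"unparseable JSON: {exc}") from exc


ok = parse_json_completion('{"name": "Alice"}', "stop")

try:
    parse_json_completion('{"name": "Al', "length")
    truncated_detected = False
except IncompleteResponseError:
    truncated_detected = True
```

In real use you would pass response.choices[0].message.content and response.choices[0].finish_reason.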

Q: Can I use JSON mode with streaming?

Yes — stream: true is compatible with JSON mode and Structured Outputs. With streaming enabled, the model emits partial tokens as they are generated. Each chunk contains chunk.choices[0].delta.content (a partial string). You must accumulate all chunks into a buffer and call JSON.parse() only after the stream closes (finish_reason === "stop"). Attempting to parse partial JSON will fail. In Python: iterate over the stream, concatenate chunk.choices[0].delta.content, then call json.loads() at the end. Partial JSON streaming (parsing JSON incrementally as tokens arrive) requires a streaming JSON parser such as jsonstream or oboe.js — not recommended for most production use cases due to added complexity. Use streaming with JSON mode primarily when you need to show a loading indicator while the model processes, not for incremental JSON consumption.
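
The accumulate-then-parse pattern might look like the sketch below. The fake_chunk helper is a stand-in that mimics the chunk.choices[0].delta.content shape so the example runs offline; collect_streamed_json is an illustrative name, not an SDK function.

```python
import json
from types import SimpleNamespace
from typing import Any, Iterable


def collect_streamed_json(chunks: Iterable[Any]) -> Any:
    """Concatenate delta.content across chunks, then parse once at the end."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:  # defensive: skip chunks that carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.content:  # content may be None on some chunks
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    if finish_reason != "stop":
        raise ValueError(f"stream ended with finish_reason={finish_reason!r}")
    return json.loads("".join(parts))


def fake_chunk(content, finish_reason=None):
    """Stand-in for an SDK chunk, so the sketch runs without an API call."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


stream = [
    fake_chunk('{"name": "Al'),
    fake_chunk('ice", "age": 30}'),
    fake_chunk(None, "stop"),
]
data = collect_streamed_json(stream)
print(data)  # {'name': 'Alice', 'age': 30}
```

With the real SDK, the iterable would be the object returned by client.chat.completions.create(..., stream=True).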

Q: How do I use the OpenAI Batch API for JSON extraction?

The Batch API processes requests asynchronously at 50% of the synchronous price, with a 24-hour completion window. To use it: (1) Create a JSONL file where each line is a JSON object with custom_id (your identifier), method ("POST"), url ("/v1/chat/completions"), and body (the full chat completions request including response_format). (2) Upload the JSONL file with client.files.create(file=f, purpose="batch"). (3) Submit the batch with client.batches.create(input_file_id=file.id, endpoint="/v1/chat/completions", completion_window="24h"). (4) Poll client.batches.retrieve(batch.id) until status is "completed". (5) Download results from output_file_id. Each result line contains your custom_id and the full API response. Ideal for processing large datasets (thousands of documents) where latency is not a concern.

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 19, 2026

Getting reliable structured JSON from OpenAI GPT models requires understanding three distinct mechanisms: JSON mode, Structured Outputs, and function calling. Each offers different guarantees, model requirements, and use cases. This guide covers all three with complete Python and Node.js examples, plus validation patterns with Zod and Pydantic.

JSON Mode — Guaranteed Valid JSON

JSON mode guarantees the model returns parseable JSON. Enable it by setting response_format: { type: "json_object" }. You must also mention "JSON" in the system or user message — the API returns an error if you do not.

from openai import OpenAI
import json

client = OpenAI()  # uses OPENAI_API_KEY env var

# JSON mode — guarantees parseable JSON, not necessarily your schema
response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": (
                "You are a data extraction assistant. "
                "Always respond with valid JSON matching this schema: "
                '{"name": string, "age": number, "email": string}'
            )
        },
        {
            "role": "user",
            "content": "Extract: Alice Smith, 30 years old, alice@example.com"
        },
    ],
)

raw = response.choices[0].message.content
data = json.loads(raw)   # parseable JSON unless finish_reason is "length"
print(data)  # e.g. {'name': 'Alice Smith', 'age': 30, 'email': 'alice@example.com'}

Node.js equivalent:

import OpenAI from 'openai'

const client = new OpenAI()

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'Extract entities as JSON: {"name": string, "age": number, "email": string}',
    },
    { role: 'user', content: 'Alice Smith, 30 years old, alice@example.com' },
  ],
})

const data = JSON.parse(response.choices[0].message.content!)

Structured Outputs — Schema-Validated JSON

Structured Outputs guarantee the response matches your JSON Schema exactly. This requires gpt-4o-2024-08-06 or later (gpt-4o-mini also supports it). The OpenAI SDK integrates with Pydantic (Python) and Zod (Node.js) to parse directly into typed models.

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

# Structured Outputs with Pydantic model (SDK parses automatically)
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # structured outputs requires this model+
    response_format=CalendarEvent,
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are meeting on Friday March 15th."},
    ],
)

event = response.choices[0].message.parsed  # → CalendarEvent instance
print(event.name)          # e.g. "Meeting"
print(event.date)          # e.g. "2026-03-15" (str does not enforce a date format)
print(event.participants)  # e.g. ["Alice", "Bob"]

// Node.js Structured Outputs with Zod
import OpenAI from 'openai'
import { zodResponseFormat } from 'openai/helpers/zod'
import { z } from 'zod'

const client = new OpenAI()

const CalendarEventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
})

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  response_format: zodResponseFormat(CalendarEventSchema, 'calendar_event'),
  messages: [
    { role: 'system', content: 'Extract event information.' },
    { role: 'user', content: 'Alice and Bob meet on March 15.' },
  ],
})

const event = response.choices[0].message.parsed   // typed as z.infer<typeof CalendarEventSchema>

Function Calling (Tool Use)

Function calling lets the model generate a JSON arguments object to invoke a real function in your code. It is the foundation of agentic workflows where the model selects and calls tools across multiple steps.

import json
from openai import OpenAI

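client = OpenAI()

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Placeholder implementation; replace with a real weather lookup."""
    return {"city": city, "unit": unit, "temperature": 21}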

# Define tools (functions the model can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'London'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"],
                "additionalProperties": False
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",  # or "required" to force tool use
)

message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    # e.g. {"city": "Tokyo", "unit": "celsius"}; "unit" is optional and may be absent

    # Execute the function with the extracted args
    weather_result = get_weather(args["city"], args.get("unit", "celsius"))

    # Continue conversation with tool result
    messages = [
        {"role": "user", "content": "What's the weather in Tokyo?"},
        message,  # assistant's tool_call message
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(weather_result),
        },
    ]
    final_response = client.chat.completions.create(model="gpt-4o", messages=messages)
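
For multi-step agents, the single call above generalizes to a dispatch step that runs every tool call in the assistant message. The sketch below is one possible shape, not the SDK's own mechanism: TOOL_REGISTRY and run_tool_calls are illustrative names, get_weather returns canned data, and the SimpleNamespace object imitates an SDK tool call so the example runs offline.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def get_weather(city: str, unit: str = "celsius") -> dict[str, Any]:
    """Hypothetical tool implementation returning canned data."""
    return {"city": city, "unit": unit, "temperature": 21}


# Maps the tool name declared in the tools list to the function that runs it.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[Any]) -> list[dict[str, str]]:
    """Execute each tool call and build the matching role="tool" messages."""
    messages = []
    for call in tool_calls:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            try:
                result = handler(**json.loads(call.function.arguments))
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed or mismatched arguments are reported back to the model.
                result = {"error": str(exc)}
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}'),
)
tool_messages = run_tool_calls([fake_call])
```

The returned messages are appended after the assistant message before the next chat completions call, and the loop repeats until the model stops requesting tools.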

JSON Schema for Structured Outputs

You can pass a raw JSON Schema directly without Pydantic or Zod. Use strict: True to enforce the most reliable schema adherence — all properties must be in required and additionalProperties: false must appear at every level.

# Manual JSON Schema for structured outputs (no Pydantic)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_extraction",
            "strict": True,   # strict mode: model must follow schema exactly
            "schema": {
                "type": "object",
                "properties": {
                    "products": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "price": {"type": "number"},
                                "in_stock": {"type": "boolean"},
                                "category": {
                                    "type": "string",
                                    "enum": ["electronics", "clothing", "food"]
                                }
                            },
                            "required": ["name", "price", "in_stock", "category"],
                            "additionalProperties": False
                        }
                    },
                    "total_count": {"type": "integer"}
                },
                "required": ["products", "total_count"],
                "additionalProperties": False
            }
        }
    },
    messages=[
        {"role": "user", "content": "Extract products from: iPhone 15 ($999, in stock, electronics), Levi jeans ($79, out of stock, clothing)"}
    ]
)

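Note on strict: True: all schema properties must be listed in required, and additionalProperties: false must be set at every level. To model an optional field, keep it in required and allow null in its type, for example "type": ["string", "null"].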
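
Because a schema that breaks these two rules is rejected at request time, it can help to check it locally first. The following is a simplified sketch, assuming schemas that use only plain "object" and "array" types; strict_violations is an illustrative helper and does not handle anyOf, $defs, or union types.

```python
from typing import Any


def strict_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List places where a schema breaks the two strict-mode rules."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in properties.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
                "required": ["name"],
            },
        }
    },
    "required": ["products"],
    "additionalProperties": False,
}

for problem in strict_violations(schema):
    print(problem)
# $.products[]: not in required: ['price']
# $.products[]: additionalProperties must be false
```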

Validate LLM JSON Output with Zod / Pydantic

Always validate — models can still hallucinate values even with Structured Outputs. Zod's safeParse() returns a result object rather than throwing, making it easy to handle validation failures gracefully.

// Always validate: models can still hallucinate values even with structured outputs
import OpenAI from 'openai'
import { z } from 'zod'

const client = new OpenAI()

const ProductSchema = z.object({
  name: z.string().min(1),
  price: z.number().positive(),
  in_stock: z.boolean(),
  category: z.enum(['electronics', 'clothing', 'food']),
})

type Product = z.infer<typeof ProductSchema>

async function extractProducts(text: string): Promise<Product[]> {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: 'Extract products as JSON: {"products": [...]}' },
      { role: 'user', content: text },
    ],
  })

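  // JSON.parse throws on truncated output (finish_reason "length"), so guard it
  let raw: unknown
  try {
    raw = JSON.parse(response.choices[0].message.content ?? '')
  } catch {
    console.error('Response was not parseable JSON')
    return []
  }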
  const result = z.object({ products: z.array(ProductSchema) }).safeParse(raw)

  if (!result.success) {
    console.error('Validation failed:', result.error.issues)
    return []
  }
  return result.data.products
}

Batch JSON Extraction at Scale

The OpenAI Batch API processes requests asynchronously at 50% of the standard price with a 24-hour completion window. Ideal for large dataset extraction.

# OpenAI Batch API: 50% cost reduction for async processing
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Prepare batch of requests
requests = []
texts = ["text 1...", "text 2...", "text 3..."]

for i, text in enumerate(texts):
    requests.append({
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "response_format": {"type": "json_object"},
            "messages": [
                {"role": "system", "content": "Extract data as JSON."},
                {"role": "user", "content": text},
            ],
        }
    })

# Write JSONL file
batch_file = Path("batch_requests.jsonl")
batch_file.write_text("\n".join(json.dumps(r) for r in requests))

# Upload and submit
with open(batch_file, "rb") as f:
    uploaded = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")  # poll status with client.batches.retrieve(batch.id)
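
The script above stops at submission. A possible continuation for polling and reading results is sketched below. wait_for_batch and parse_batch_output are illustrative names, the client is passed in as a parameter, and the sample records only imitate the documented result-line shape (custom_id plus a response with status_code and body); the error payload in the second record is made up for the demo.

```python
import json
import time
from typing import Any

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client: Any, batch_id: str, poll_seconds: float = 60.0) -> Any:
    """Poll until the batch reaches a terminal status, then return it."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text: str) -> dict[str, Any]:
    """Map each custom_id to its parsed JSON content, or None on failure."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        parsed = None
        if response.get("status_code") == 200:
            content = response["body"]["choices"][0]["message"]["content"]
            try:
                parsed = json.loads(content)
            except (TypeError, json.JSONDecodeError):
                parsed = None  # truncated or empty content
        results[record["custom_id"]] = parsed
    return results


# Offline demo with hand-written result lines instead of a real output file.
sample = "\n".join([
    json.dumps({
        "custom_id": "request-0",
        "response": {
            "status_code": 200,
            "body": {"choices": [{"message": {"content": '{"name": "Alice"}'}}]},
        },
        "error": None,
    }),
    json.dumps({"custom_id": "request-1", "response": None, "error": {"code": "server_error"}}),
])
results = parse_batch_output(sample)
print(results)  # {'request-0': {'name': 'Alice'}, 'request-1': None}
```

Against the real API, the input text would come from something like client.files.content(batch.output_file_id).text once wait_for_batch returns a batch whose status is "completed"; treat that accessor as an assumption to confirm against your SDK version.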

Comparison: JSON Mode vs Structured Outputs vs Function Calling

Feature	JSON Mode	Structured Outputs	Function Calling
Guarantee	Valid JSON	Schema adherence	Arguments match schema
Models	GPT-3.5-turbo-1106+	GPT-4o-2024-08-06+	GPT-3.5+
Schema source	Prompt (free-form)	JSON Schema in API	JSON Schema in tools
Strict mode	No	Yes	Yes (with strict: true)
Best for	Simple extraction	Data extraction, forms	Actions, multi-step agents
SDK helper	None needed	`beta.chat.completions.parse()`	Tool loop pattern
Cost overhead	~5-15% more output tokens	~10-20 prompt tokens/field	~50-100 prompt tokens/tool

Definitions

JSON mode: An OpenAI chat completions setting (response_format: { type: "json_object" }) that guarantees the model's response can be parsed as JSON; does not guarantee schema conformance.
Structured Outputs: An OpenAI feature (response_format: { type: "json_schema" }) that guarantees the response conforms to a provided JSON Schema; available on gpt-4o-2024-08-06 and later, and on gpt-4o-mini.
function calling: An OpenAI capability where the model generates a JSON arguments object matching a declared function schema; enables agentic workflows where the model can invoke real functions.
strict mode: A Structured Outputs option (strict: true in the schema) that requires all properties to be listed in required and additionalProperties: false at every level; enables the most reliable schema adherence.
finish_reason: The field in a completion response indicating why the model stopped generating: "stop" (normal), "length" (hit max_tokens — JSON may be truncated), "tool_calls" (function calling), "content_filter" (content policy).

FAQ

What is OpenAI JSON mode?

JSON mode is enabled by setting response_format: { type: "json_object" }. It guarantees the model returns output that can be parsed as JSON — no markdown fences, no prose wrapping. You must mention "JSON" in the system or user message, or the API returns an error. It works with GPT-4o, GPT-4-turbo, and GPT-3.5-turbo-1106+. Important: JSON mode guarantees valid JSON but does NOT guarantee your specific schema — always validate with Zod or Pydantic to confirm the shape matches what you expect.

What is the difference between JSON mode and Structured Outputs?

JSON mode guarantees valid JSON in any shape; the model decides the structure. Structured Outputs guarantees the response matches your JSON Schema exactly, including required fields, types, and enum values. Structured Outputs requires gpt-4o-2024-08-06 or later (or gpt-4o-mini) and strict: true in the json_schema object. Use Structured Outputs for production data extraction; use JSON mode as a fallback for older models that do not support Structured Outputs.

How do function calling and JSON mode differ?

Function calling is designed for executing actions: the model generates a JSON arguments object so your code can call a real function. JSON mode is designed for data extraction: the model returns data as a JSON object. Function calling enables multi-step agentic workflows; JSON mode is one-shot extraction. You can combine them — define a tool whose parameters match the data shape you need, giving you function calling reliability with structured schema adherence.

Should I validate OpenAI JSON output even with Structured Outputs?

Yes — always validate. Structured Outputs guarantees schema adherence (correct types, required fields present, enum values respected) but cannot guarantee semantic correctness. The model may hallucinate values that match the declared type but are factually wrong: an incorrect city name, an off-by-magnitude price, a date in the wrong format. Use Zod's safeParse() or Pydantic's model_validate_json() for runtime type checks, and add business logic validation on top for semantic correctness.

How many tokens does JSON mode add to my request?

JSON mode itself adds no prompt tokens, but structured output typically increases output token count by 5–15% versus prose. Function calling tool definitions add ~50–100 tokens per tool to the prompt. Structured Outputs json_schema definitions add ~10–20 tokens per field defined. For high-volume workloads, use gpt-4o-mini (significantly cheaper) and the Batch API (50% cost reduction for async processing).

How do I handle OpenAI API errors when using JSON mode?

Check response.choices[0].finish_reason. A value of "length" means the model hit max_tokens before finishing; in JSON mode this produces truncated, unparseable JSON, so increase max_tokens. A value of "content_filter" means the output was blocked by content policy. Always wrap JSON.parse() / json.loads() in try/catch or try/except. For 429 and 500 errors, retry with exponential backoff; the official OpenAI SDKs have built-in retry support (max_retries in Python, maxRetries in Node.js).
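
The SDK's built-in retries are usually enough. For code paths that call the API without the SDK, exponential backoff could be sketched as below. RetryableError is a hypothetical stand-in for a 429 or 500 response, and with_backoff is an illustrative helper rather than part of any library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 or 500 response."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0) -> T:
    """Run call(), retrying on RetryableError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RetryableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay, plus up to 10% jitter.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay * 0.1))


attempts = {"count": 0}


def flaky() -> str:
    """Fails twice with a simulated rate limit, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("simulated 429")
    return "ok"


result = with_backoff(flaky, base_delay=0.01)
```

The jitter spreads out retries from many clients so they do not all hit the API again at the same instant.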

Can I use JSON mode with streaming?

Yes — stream: true works with JSON mode and Structured Outputs. Accumulate all chunks into a buffer and call JSON.parse() only after the stream closes (finish_reason === "stop"). Do not attempt to parse partial JSON from individual chunks — it will fail. Use streaming with JSON mode primarily to show a loading indicator while the model processes, not for incremental JSON consumption.

How do I use the OpenAI Batch API for JSON extraction?

Create a JSONL file where each line is a JSON object with custom_id, method, url, and body (the full chat completions request). Upload with client.files.create(purpose="batch"), submit with client.batches.create(), poll until status === "completed", then download results from output_file_id. The Batch API is 50% cheaper than synchronous calls with a 24-hour processing window — ideal for large dataset extraction where latency does not matter.