JSON in AI Prompts: Structured Outputs, Function Calling, and Response Parsing


JSON structured outputs from LLMs give you machine-readable responses with guaranteed field presence, eliminating regex hacks and brittle string parsing. OpenAI's response_format: { type: "json_schema" } with strict: true enforces a JSON Schema during generation, so the model cannot emit output that violates your schema (a response cut off by max_tokens can still be incomplete). Anthropic Claude returns JSON reliably when the system prompt includes a JSON block describing the expected output and the assistant turn is prefilled with "{"; no dedicated structured output API is required. This guide covers OpenAI function calling, JSON mode vs. structured output mode, Anthropic JSON prompting patterns, parsing LLM JSON safely, and tool use with JSON schemas. Whether you're extracting entities, generating typed API payloads, or building agentic pipelines, reliable structured JSON output is the foundation. Use Jsonic's JSON formatter to inspect and validate any LLM JSON response during development.

OpenAI JSON Mode vs Structured Outputs

OpenAI provides two mechanisms for getting JSON from a model: JSON mode and structured outputs. JSON mode is enabled by setting response_format: { type: "json_object" }. It instructs the model to emit syntactically valid JSON, but imposes no constraints on which fields appear, what types they carry, or whether required keys are present. The API also requires the word "JSON" to appear somewhere in the messages when JSON mode is on, which is why JSON mode prompts typically spell out the desired fields. Empirically, JSON mode achieves around 95% reliability for consistent field presence: good enough for prototyping, not good enough for production pipelines where downstream code accesses fields by name without null checks.
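Because JSON mode guarantees only syntax, code that consumes its output has to check field presence and types itself. The following is a minimal dependency-free sketch, assuming a flat object and a hypothetical FieldSpec map from field name to the expected typeof result; it is not part of any SDK.

```typescript
// Hypothetical spec format: field name -> expected `typeof` result
type FieldSpec = Record<string, 'string' | 'number' | 'boolean'>

// Returns the names of fields that are missing or have the wrong primitive type.
// An empty array means the JSON mode output has the shape we asked for.
function findFieldProblems(value: unknown, spec: FieldSpec): string[] {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return Object.keys(spec) // not a plain object: every expected field is a problem
  }
  const record = value as Record<string, unknown>
  return Object.entries(spec)
    .filter(([key, expected]) => typeof record[key] !== expected)
    .map(([key]) => key)
}

// Usage with the fields from the JSON mode request: name (string), score (number)
const problems = findFieldProblems(
  JSON.parse('{"name": "Inception", "score": "9"}'),
  { name: 'string', score: 'number' },
)
console.log(problems) // problems is ['score'] because the score came back as a string
```

If the list is non-empty, you can retry the request or fall back to a default, instead of letting an undefined field travel further into your code.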

Structured outputs, introduced by OpenAI in 2024, enforce a JSON Schema during generation via constrained decoding. Set response_format: { type: "json_schema", json_schema: { name: "...", schema: { ... }, strict: true } }. With strict: true, the model's token sampler is restricted so it can only emit tokens that keep the output on a valid path through your schema, guaranteeing schema compliance unless generation stops early (for example, at the max_tokens limit). If the model refuses a request for safety reasons, the message carries a refusal string instead of schema-conforming content; always check choices[0].message.refusal before parsing choices[0].message.content.

import OpenAI from 'openai'
const client = new OpenAI()

// JSON mode — valid JSON, no schema enforcement
const jsonModeResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    { role: 'system', content: 'Return JSON with fields: name (string), score (number).' },
    { role: 'user', content: 'Rate the movie Inception.' },
  ],
})
const loose = JSON.parse(jsonModeResponse.choices[0].message.content!)

// Structured outputs — schema-enforced, 100% compliant
const structuredResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'movie_rating',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string', description: 'Movie title' },
          score: { type: 'number', description: 'Rating 0-10' },
          summary: { type: 'string', description: 'One-sentence review' },
        },
        required: ['name', 'score', 'summary'],
        additionalProperties: false,
      },
    },
  },
  messages: [
    { role: 'user', content: 'Rate the movie Inception.' },
  ],
})

// Check for refusal before accessing content
const msg = structuredResponse.choices[0].message
if (msg.refusal) throw new Error(`Model refused: ${msg.refusal}`)
const strict = JSON.parse(msg.content!)
// strict.name, strict.score, strict.summary are guaranteed present
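The refusal check above does not cover truncation: if finish_reason is "length", the content can be an incomplete JSON prefix even with strict: true. Below is a minimal sketch of a helper that handles refusals, truncation, and empty content in one place. It is written against a hand-written subset of the chat completion choice shape (an assumption, so the sketch runs without the SDK); the SDK's choice objects have these fields.

```typescript
// Minimal subset of a chat completion choice; the real SDK type has more fields
interface ChoiceLike {
  finish_reason: string | null
  message: { content: string | null; refusal?: string | null }
}

// Returns the parsed JSON payload, or throws a descriptive error for
// refusals, truncated output, and empty content.
function readStructuredChoice(choice: ChoiceLike): unknown {
  const { message, finish_reason } = choice
  if (message.refusal) {
    throw new Error(`Model refused: ${message.refusal}`)
  }
  if (finish_reason === 'length') {
    // Generation stopped at max_tokens, so the JSON is probably cut off
    throw new Error('Response truncated at the token limit; raise max_tokens')
  }
  if (message.content == null || message.content === '') {
    throw new Error('Response has no content')
  }
  return JSON.parse(message.content)
}

// Usage: readStructuredChoice(structuredResponse.choices[0])
const ok = readStructuredChoice({
  finish_reason: 'stop',
  message: { content: '{"name":"Inception","score":9,"summary":"A layered heist."}', refusal: null },
}) as { name: string; score: number }
console.log(ok.score) // 9
```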

Use JSON mode when the model you need lacks structured output support, or when the schema falls outside what the constrained decoder accepts (for example, unsupported keywords, or deeply nested schemas that exceed OpenAI's nesting and property-count limits). Use structured outputs in all other production cases. See JSON Schema patterns for schema design strategies that work within OpenAI's supported subset.
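Strict mode expects every object property to be listed in required, additionalProperties: false on every object, and optional fields expressed as a union with null. The following is a minimal sketch of a converter that applies those rules recursively. It assumes that properties missing from the input's required array are meant to be optional, it only walks properties, items, and anyOf, and it does not detect fields that are already nullable or rewrite unsupported keywords.

```typescript
// Loose JSON Schema node: only the keywords this converter touches are typed
type Schema = {
  type?: string
  properties?: Record<string, Schema>
  required?: string[]
  items?: Schema
  anyOf?: Schema[]
  additionalProperties?: boolean
  [keyword: string]: unknown
}

// Returns a copy in which every object lists all of its properties as required,
// formerly optional properties accept null, and additionalProperties is false.
function toStrictSchema(schema: Schema): Schema {
  const out: Schema = { ...schema }
  if (schema.items) out.items = toStrictSchema(schema.items)
  if (schema.anyOf) out.anyOf = schema.anyOf.map((branch) => toStrictSchema(branch))
  if (schema.type === 'object') {
    const wasRequired = new Set(schema.required ?? [])
    const props: Record<string, Schema> = {}
    for (const [name, prop] of Object.entries(schema.properties ?? {})) {
      const strictProp = toStrictSchema(prop)
      // Optional fields become "value or null" so they can still appear in required
      props[name] = wasRequired.has(name) ? strictProp : { anyOf: [strictProp, { type: 'null' }] }
    }
    out.properties = props
    out.required = Object.keys(props)
    out.additionalProperties = false
  }
  return out
}

const strictMovie = toStrictSchema({
  type: 'object',
  properties: {
    name: { type: 'string' },
    tagline: { type: 'string', description: 'Optional marketing tagline' },
  },
  required: ['name'],
})
console.log(JSON.stringify(strictMovie.required)) // ["name","tagline"]
```

The model then returns "tagline": null when it has nothing to say, and your code treats null as "absent".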

OpenAI Function Calling with JSON Schemas

Function calling predates structured outputs and remains the standard mechanism for building agentic systems where the model selects which action to take. You define a tools array, each entry describing a callable function with a JSON Schema for its parameters. The model decides when to call a function, outputs a tool_calls array with structured JSON arguments, and you inject the result back as a tool role message for the next turn.

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name, e.g. "Portland"' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['city', 'units'],
        additionalProperties: false,
      },
    },
  },
]

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  tools,
  // Force a specific tool: tool_choice: { type: 'function', function: { name: 'get_weather' } }
  // Allow parallel calls: parallel_tool_calls: true (default)
  messages: [{ role: 'user', content: 'What is the weather in Portland in celsius?' }],
})

const toolCall = response.choices[0].message.tool_calls?.[0]
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments)
  // args.city === 'Portland', args.units === 'celsius'

  const weatherResult = await fetchWeather(args.city, args.units) // fetchWeather: your own implementation (not shown)

  // Inject result back for the next model turn
  const followUp = await client.chat.completions.create({
    model: 'gpt-4o',
    tools,
    messages: [
      { role: 'user', content: 'What is the weather in Portland in celsius?' },
      response.choices[0].message,  // assistant message with tool_calls
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherResult),
      },
    ],
  })
  console.log(followUp.choices[0].message.content)
}
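With several tools, the parse-and-dispatch step is worth factoring out. The following is a minimal sketch, assuming a hypothetical handlers map from function name to an async implementation. It never throws on bad arguments or unknown tools. Instead, it reports the error back to the model as the tool result, so the conversation can continue.

```typescript
// Minimal subset of an OpenAI function tool call (the SDK type has these fields and more)
interface ToolCallLike {
  id: string
  function: { name: string; arguments: string }
}

interface ToolMessage {
  role: 'tool'
  tool_call_id: string
  content: string
}

// Hypothetical registry: function name -> implementation that receives the parsed arguments
type ToolHandlers = Record<string, (args: Record<string, unknown>) => Promise<unknown>>

// Executes every tool call and returns one tool message per call, in the same order.
// Bad argument JSON, unknown tools, and handler errors are reported back as { error }.
async function runToolCalls(calls: ToolCallLike[], handlers: ToolHandlers): Promise<ToolMessage[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolMessage> => {
      const { name, arguments: rawArgs } = call.function
      let result: unknown
      if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
        result = { error: `Unknown tool: ${name}` }
      } else {
        try {
          result = await handlers[name](JSON.parse(rawArgs))
        } catch (err) {
          result = { error: String(err) }
        }
      }
      return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) }
    }),
  )
}
```

In the weather example, await runToolCalls(response.choices[0].message.tool_calls ?? [], { get_weather: (args) => fetchWeather(...) }) would produce the tool messages to append after the assistant message. This handles parallel tool calls as well as single ones.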

Token overhead is the main cost consideration: each function definition adds approximately 200 tokens to the prompt context. With 5 functions you add roughly 1,000 tokens per request. Minimize this by trimming function descriptions to one sentence, using short parameter names, and removing irrelevant functions from the tools array for turns where they cannot be called. For structured data extraction without tool execution, prefer structured outputs — function calling is best when the model genuinely needs to choose between actions.
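To get a feel for what a tools array costs, you can measure the serialized definitions. The following sketch uses the common rule of thumb of about four characters per token for English text. That is an approximation, not OpenAI's tokenizer, and the API formats tool definitions differently from JSON.stringify, so treat the numbers as relative. The sketch also shows per-turn filtering with a hypothetical allow-list.

```typescript
interface ToolDef {
  type: 'function'
  function: { name: string; description?: string; parameters?: object }
}

// Rough estimate only: ~4 characters per token is a rule of thumb for English text,
// and the API formats tool definitions differently from JSON.stringify
function estimateToolTokens(tools: ToolDef[]): number {
  return Math.ceil(JSON.stringify(tools).length / 4)
}

// Sends only the tools that can plausibly be called on this turn
function toolsForTurn(tools: ToolDef[], allowed: ReadonlySet<string>): ToolDef[] {
  return tools.filter((tool) => allowed.has(tool.function.name))
}

const allTools: ToolDef[] = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
    },
  },
  {
    type: 'function',
    function: {
      name: 'book_flight', // hypothetical second tool
      description: 'Book a flight between two airports',
      parameters: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' } } },
    },
  },
]

const weatherOnly = toolsForTurn(allTools, new Set(['get_weather']))
console.log(estimateToolTokens(allTools), estimateToolTokens(weatherOnly))
```

For exact counts, compare the usage.prompt_tokens reported by the API with and without the tools array.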

Anthropic Claude JSON Prompting

Claude does not have a response_format parameter equivalent to OpenAI's JSON mode. Instead, reliable JSON output is achieved through two complementary techniques: system prompt design and the prefill technique. The system prompt should include a concrete example of the expected JSON structure, field descriptions, and an explicit instruction such as “Respond only with valid JSON matching this schema, no explanation text.” Set temperature to 0 for extraction tasks; lower temperature reduces hallucinated field names by approximately 40%.

import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic()

// Technique 1: System prompt + prefill
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  temperature: 0,
  system: `You are a data extraction assistant. Always respond with valid JSON only.
Output schema:
{
  "sentiment": "positive" | "negative" | "neutral",
  "score": number between 0 and 1,
  "keywords": string[]
}
No explanation. No markdown fences. Only the JSON object.`,
  messages: [
    { role: 'user', content: 'Review: "The battery lasts all day and the camera is excellent."' },
    // Prefill: force Claude to start with {
    { role: 'assistant', content: '{' },
  ],
})

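// The response continues after the prefill, so prepend "{" to reconstruct the object.
// content[0] is a union of block types; narrow to a text block before reading .text
const first = response.content[0]
const raw = '{' + (first.type === 'text' ? first.text : '')
const parsed = JSON.parse(raw)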

// Technique 2: Tool use API (most reliable)
const toolResponse = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  tools: [
    {
      name: 'extract_sentiment',
      description: 'Extract sentiment analysis from review text',
      input_schema: {
        type: 'object',
        properties: {
          sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
          score: { type: 'number', description: 'Confidence score 0-1' },
          keywords: { type: 'array', items: { type: 'string' } },
        },
        required: ['sentiment', 'score', 'keywords'],
      },
    },
  ],
  tool_choice: { type: 'tool', name: 'extract_sentiment' },
  messages: [
    { role: 'user', content: 'Review: "The battery lasts all day and the camera is excellent."' },
  ],
})

// Tool input is already a parsed object — no JSON.parse needed
const toolUse = toolResponse.content.find(b => b.type === 'tool_use')
if (toolUse && toolUse.type === 'tool_use') {
  const { sentiment, score, keywords } = toolUse.input as {
    sentiment: string; score: number; keywords: string[]
  }
}
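With the prefill technique, the continuation can run past the closing brace (a trailing sentence or a stray fence), and then a plain JSON.parse fails. The following is a minimal sketch of a more forgiving reconstruction: prepend the prefill, then cut the text at the brace that closes the top-level object. It assumes the payload is a single JSON object. The scanner skips braces inside string literals, including escaped quotes.

```typescript
// Index just past the '}' that closes the object starting at `start`, or -1 if unbalanced
function endOfObject(text: string, start: number): number {
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
    } else if (ch === '"') {
      inString = true
    } else if (ch === '{') {
      depth++
    } else if (ch === '}') {
      depth--
      if (depth === 0) return i + 1
    }
  }
  return -1
}

// Rebuilds prefill + continuation and parses only the first complete JSON object
function parsePrefilledJson(prefill: string, continuation: string): unknown {
  const text = prefill + continuation
  const start = text.indexOf('{')
  if (start === -1) throw new Error('No JSON object found')
  const end = endOfObject(text, start)
  if (end === -1) throw new Error('JSON object is not closed; output may be truncated')
  return JSON.parse(text.slice(start, end))
}

const result = parsePrefilledJson(
  '{',
  '"sentiment": "positive", "score": 0.9, "keywords": ["battery {life}"]}\nLet me know if you need more.',
) as { score: number; keywords: string[] }
console.log(result.keywords[0]) // battery {life}
```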

The tool use API is the most reliable approach for Claude: the model is steered to produce input that matches the input_schema, the input arrives as an already-parsed object (no JSON.parse needed), and there is no prefill string manipulation. Standard tool use does not strictly guarantee schema compliance, though, so still validate the input before trusting it. The prefill technique works well for simpler schemas or when you want to avoid the tool use overhead. Never rely on asking Claude to “return JSON” in the user message alone without a system prompt; that yields inconsistent results.
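The `as` cast in the tool use example trusts the input blindly. The following is a dependency-free type guard that narrows it safely. It is a sketch: in practice, the zod schema in the next section does the same job with less code.

```typescript
interface SentimentInput {
  sentiment: 'positive' | 'negative' | 'neutral'
  score: number
  keywords: string[]
}

// Type guard for the extract_sentiment tool input: checks every field the schema declares
function isSentimentInput(input: unknown): input is SentimentInput {
  if (typeof input !== 'object' || input === null) return false
  const { sentiment, score, keywords } = input as Record<string, unknown>
  return (
    (sentiment === 'positive' || sentiment === 'negative' || sentiment === 'neutral') &&
    typeof score === 'number' &&
    score >= 0 &&
    score <= 1 &&
    Array.isArray(keywords) &&
    keywords.every((k) => typeof k === 'string')
  )
}

// Usage with the tool use example:
// if (toolUse?.type === 'tool_use' && isSentimentInput(toolUse.input)) { /* toolUse.input is typed */ }
```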

Safe LLM JSON Parsing Patterns

Even with structured output APIs, safe parsing is non-negotiable. Network truncation, token limit overruns, and model regressions can all produce invalid JSON. An uncaught exception from a bare JSON.parse(llmOutput) call can crash a Node.js server, and inside async code it surfaces as an unhandled promise rejection, which Node.js 15 and later also treat as fatal by default. The minimum viable safe pattern is a try/catch with schema validation after a successful parse. See the safe JSON parsing guide for the full TypeScript pattern including type narrowing.

import { z } from 'zod'
import { jsonrepair } from 'jsonrepair'

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(0).max(1),
  keywords: z.array(z.string()),
})

type Sentiment = z.infer<typeof SentimentSchema>

async function parseLlmJson(rawText: string): Promise<Sentiment> {
  // Step 1: try direct parse
  let parsed: unknown
  try {
    parsed = JSON.parse(rawText)
  } catch {
    // Step 2: attempt repair for common syntax errors
    try {
      parsed = JSON.parse(jsonrepair(rawText))
    } catch {
      throw new Error(`Unrecoverable JSON syntax error: ${rawText.slice(0, 200)}`)
    }
  }

  // Step 3: schema validation — parse errors give typed field-level messages
  const result = SentimentSchema.safeParse(parsed)
  if (!result.success) {
    throw new Error(`Schema mismatch: ${result.error.message}`)
  }

  return result.data
}

// Retry strategy for production
async function parseLlmJsonWithRetry(
  rawText: string,
  retryFn: (error: string) => Promise<string>,
  maxRetries = 2
): Promise<Sentiment> {
  let text = rawText
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await parseLlmJson(text)
    } catch (e) {
      if (attempt === maxRetries) throw e
      // Send the error back to the model to fix
      text = await retryFn(`Fix this JSON to match the schema: ${e}

Original: ${text}`)
    }
  }
  throw new Error('Max retries exceeded')
}

Common LLM JSON errors and their fixes: trailing commas (use jsonrepair), Markdown fences such as ```json around the payload (strip them with a regex before parsing), single-quoted strings (use jsonrepair), and truncated output (check OpenAI's finish_reason or Anthropic's stop_reason; if it is length or max_tokens respectively, increase max_tokens). Always log the raw LLM output before parsing in development so you can observe patterns in malformed responses. Validate against your schema immediately after parsing (see the JSON data validation guide), and never access nested fields from an unvalidated object.
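The fence-stripping step can be a small regex pass before JSON.parse. The following is a minimal sketch: it returns the contents of the first Markdown code fence (with or without a language tag) if the reply contains one, and the trimmed text otherwise. It assumes at most one fenced payload per reply.

```typescript
// Extracts the body of the first Markdown code fence, or returns the trimmed text unchanged
function stripMarkdownFences(text: string): string {
  const match = /`{3}[^\n]*\n([\s\S]*?)\n?`{3}/.exec(text)
  return match ? match[1].trim() : text.trim()
}

const fence = '`'.repeat(3)
const reply = `Here is the result:\n${fence}json\n{"score": 0.8}\n${fence}`
console.log(JSON.parse(stripMarkdownFences(reply)).score) // 0.8
```

Run it before the jsonrepair fallback, so that repair only has to deal with genuine syntax errors.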

Streaming JSON from LLMs

Streaming LLM responses reduces time-to-first-token and is essential for responsive UIs. When the response is JSON, streaming requires accumulating chunks before parsing: you cannot call JSON.parse on individual chunks because each chunk is a token fragment, not valid JSON. With stream: true, the OpenAI Node SDK returns an async iterable of chunks whose choices[0].delta.content holds the newly generated text; accumulate those deltas into a string buffer, then parse once after the stream closes. See the streaming JSON guide for parser-level incremental techniques.

import OpenAI from 'openai'
const client = new OpenAI()

// Accumulate streaming JSON response
async function streamJsonResponse(): Promise<unknown> {
  const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: 'Return JSON: { "items": [...], "total": number }',
      },
      { role: 'user', content: 'List 5 programming languages with their year created.' },
    ],
  })

  let buffer = ''
  let finishReason: string | null = null

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? ''
    buffer += delta
    finishReason = chunk.choices[0]?.finish_reason ?? finishReason

    // Optional: show partial display to the user
    // (buffer may not be valid JSON yet — display as raw text or use a streaming parser)
    process.stdout.write(delta)
  }

  if (finishReason === 'length') {
    throw new Error('Response truncated — increase max_tokens')
  }

  return JSON.parse(buffer)
}

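// Streaming JSON with partial object display using jsonparse
// jsonparse's module export is the Parser constructor itself (CommonJS)
import Parser from 'jsonparse'

async function streamWithIncrementalParse(onItem: (item: unknown) => void) {
  const parser = new Parser()
  parser.onValue = function (this: { stack: unknown[] }, value: unknown) {
    // Fires for every completed value. stack.length === 2 means the value sits in a container
    // nested directly inside the top-level object, e.g. each element of { "items": [ ... ] }
    if (this.stack.length === 2) onItem(value)
  }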

  const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    response_format: { type: 'json_object' },
    messages: [/* ... */],
  })

  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? ''
    if (text) parser.write(text)
  }
}

Detecting JSON completion during streaming: the finish_reason field is null on intermediate chunks and is set on the final chunk, to stop when the model completed naturally or to length when it hit the token limit. Only parse the buffer after receiving a stop finish_reason. For partial object display before the stream completes, track the nesting depth of { and } characters in the buffer (ignoring any that appear inside string literals) to detect when top-level array elements are complete, then parse individual items.
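The following is a sketch of the depth-tracking approach that also skips braces inside string literals; a naive count miscounts on values like "{fearless}". Feed it streamed text chunks, and it returns each object that completes inside an array nested in the top-level object. For a response shaped like { "items": [ {...}, {...} ] }, that means each array element. It assumes that the elements are objects and that text is only ever appended.

```typescript
class ArrayItemScanner {
  private depth = 0
  private inString = false
  private escaped = false
  private itemStart = -1
  private buffer = ''

  // Appends a chunk and returns any array elements that became complete
  push(chunk: string): unknown[] {
    const completed: unknown[] = []
    const offset = this.buffer.length
    this.buffer += chunk
    for (let i = offset; i < this.buffer.length; i++) {
      const ch = this.buffer[i]
      if (this.inString) {
        if (this.escaped) this.escaped = false
        else if (ch === '\\') this.escaped = true
        else if (ch === '"') this.inString = false
        continue
      }
      if (ch === '"') {
        this.inString = true
      } else if (ch === '{' || ch === '[') {
        this.depth++
        // Depth 1 = top-level object, 2 = the array, 3 = an element object
        if (this.depth === 3 && ch === '{') this.itemStart = i
      } else if (ch === '}' || ch === ']') {
        if (this.depth === 3 && ch === '}' && this.itemStart !== -1) {
          completed.push(JSON.parse(this.buffer.slice(this.itemStart, i + 1)))
          this.itemStart = -1
        }
        this.depth--
      }
    }
    return completed
  }
}

const scanner = new ArrayItemScanner()
const chunks = ['{"items": [{"name": "C", "ye', 'ar": 1972}, {"name": "Rust", "tag": "{fear', 'less}"}', '], "total": 2}']
const seen: unknown[] = []
for (const c of chunks) seen.push(...scanner.push(c))
console.log(seen.length) // 2
```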

JSON Schema Design for LLMs

Schema design directly affects both model compliance and output quality. The key principle: schemas for LLMs should be simpler and more explicit than schemas for data storage. LLMs use the description field as a prompt — it communicates intent when the field name alone is ambiguous. Keep schemas flat (maximum 3 nesting levels), use concrete enum values instead of open strings where possible, and treat every field as effectively required by expressing optionality via anyOf with null. See JSON Schema patterns for reference schema templates.

// ✅ Good schema for LLMs — flat, explicit descriptions, all fields required
const goodSchema = {
  type: 'object',
  properties: {
    sentiment: {
      type: 'string',
      enum: ['positive', 'negative', 'neutral', 'mixed'],
      description: 'Overall emotional tone of the text',
    },
    confidence: {
      type: 'number',
      description: 'Confidence score between 0.0 (low) and 1.0 (high)',
    },
    language: {
      type: 'string',
      description: 'ISO 639-1 language code, e.g. "en", "fr", "de"',
    },
    // Optional field expressed as anyOf with null — required by OpenAI structured outputs
    topic: {
      anyOf: [
        { type: 'string', description: 'Main topic if identifiable' },
        { type: 'null' },
      ],
    },
  },
  required: ['sentiment', 'confidence', 'language', 'topic'],
  additionalProperties: false,
}

// ❌ Problematic schema patterns for LLMs
const badSchema = {
  type: 'object',
  properties: {
    data: {
      // Too generic — model has no guidance on what to put here
      type: 'object',
      properties: {
        nested: {
          type: 'object',
          properties: {
            deepNested: {
              // 4 levels deep — increases hallucination risk
              type: 'object',
            },
          },
        },
      },
    },
    // oneOf with many variants — model struggles to pick correctly
    status: {
      oneOf: [
        { type: 'string', const: 'pending' },
        { type: 'string', const: 'processing' },
        { type: 'string', const: 'complete' },
        // ... 10 more variants — use enum instead
      ],
    },
  },
}

// ✅ Prefer enum over oneOf for fixed string sets
const fixedStatus = {
  type: 'string',
  enum: ['pending', 'processing', 'complete', 'failed', 'cancelled'],
  description: 'Current status of the job',
}

Additional schema design rules for LLMs: avoid pattern constraints (regex), since models rarely produce values that satisfy complex regex; instead, post-validate in application code. Do not rely on minimum/maximum with OpenAI structured outputs: support for these keywords has been limited and has changed over time, so check OpenAI's current list of supported keywords and validate ranges after parsing either way. Keep array items schemas simple; nested object arrays with many required fields cause the highest rate of schema violations. When in doubt, flatten the schema and use more fields at the top level.
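These rules are easy to check mechanically before a schema ships. The following is a minimal sketch of a linter. It assumes the thresholds this guide recommends: at most three nesting levels (counting the root object as level one), no pattern, and no oneOf. It reports warnings rather than rewriting the schema, and its checks are heuristics, not OpenAI's validator.

```typescript
// Loose JSON Schema node type for linting purposes
type SchemaNode = { [keyword: string]: unknown }

function lintSchemaForLlm(root: SchemaNode, maxDepth = 3): string[] {
  const warnings: string[] = []

  const visit = (node: SchemaNode, path: string, parentLevel: number): void => {
    // Only objects add a nesting level; the root object is level 1
    const level = node.type === 'object' ? parentLevel + 1 : parentLevel
    if (node.type === 'object' && level > maxDepth) {
      warnings.push(`${path}: object nested ${level} levels deep (limit ${maxDepth})`)
    }
    if ('pattern' in node) warnings.push(`${path}: regex pattern; validate in application code instead`)
    if ('oneOf' in node) warnings.push(`${path}: oneOf; prefer enum for fixed string sets`)

    const props = (node.properties ?? {}) as Record<string, SchemaNode>
    for (const [name, child] of Object.entries(props)) visit(child, `${path}.${name}`, level)
    if (node.items) visit(node.items as SchemaNode, `${path}[]`, level)
    for (const key of ['anyOf', 'oneOf']) {
      const branches = node[key]
      if (Array.isArray(branches)) {
        branches.forEach((b, i) => visit(b as SchemaNode, `${path}.${key}[${i}]`, level))
      }
    }
  }

  visit(root, '$', 0)
  return warnings
}

const warnings = lintSchemaForLlm({
  type: 'object',
  properties: {
    data: { type: 'object', properties: { nested: { type: 'object', properties: { deepNested: { type: 'object' } } } } },
    code: { type: 'string', pattern: '^[A-Z]{3}$' },
  },
})
console.log(warnings.length) // 2: the 4-level deepNested object and the pattern
```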

Tool Use and Multi-Step JSON Pipelines

Multi-step agentic pipelines chain tool calls where the JSON output of one step becomes the input to the next. The model works through its plan, calls tools as needed, and updates its state through the conversation history. Each tool result is injected as a tool role message (OpenAI) or a tool_result block (Anthropic), and the accumulated JSON state is carried forward in the message array. The critical design concern is JSON state management between turns: the conversation history grows with each tool call, consuming tokens that crowd out the model's effective context.

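import OpenAI from 'openai'
const client = new OpenAI()

// Multi-step pipeline: search → extract → summarize
// The tool definitions and implementations are app-specific and not shown;
// these declarations (including the SearchResult fields) are placeholders.
interface SearchResult {
  title: string
  url: string
  snippet: string
}
declare const searchTool: OpenAI.Chat.ChatCompletionTool
declare const extractTool: OpenAI.Chat.ChatCompletionTool
declare const summarizeTool: OpenAI.Chat.ChatCompletionTool
declare function search(query: string): Promise<SearchResult[]>
declare function extractFacts(text: string): Promise<string[]>
declare function summarize(facts: string[]): Promise<string>

interface PipelineState {
  query: string
  searchResults?: SearchResult[]
  extractedFacts?: string[]
  summary?: string
}

const MAX_TURNS = 10 // guards against the model looping on the same tool

async function runPipeline(query: string): Promise<PipelineState> {
  const state: PipelineState = { query }

  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    {
      role: 'system',
      content: 'You are a research assistant. Use tools to answer the query step by step.',
    },
    { role: 'user', content: `Research: ${query}` },
  ]

  // Agentic loop: continue until the model stops calling tools or the turn budget runs out
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      tools: [searchTool, extractTool, summarizeTool],
      messages,
    })

    const assistantMsg = response.choices[0].message
    messages.push(assistantMsg)

    if (!assistantMsg.tool_calls?.length) {
      // Model finished: no more tool calls
      return state
    }

    // Execute all tool calls concurrently; each result becomes one tool message
    const toolResults = await Promise.all(
      assistantMsg.tool_calls.map(async (call) => {
        let result: unknown
        try {
          const args = JSON.parse(call.function.arguments)
          switch (call.function.name) {
            case 'search':
              result = await search(args.query)
              state.searchResults = result as SearchResult[]
              break
            case 'extract_facts':
              result = await extractFacts(args.text)
              state.extractedFacts = result as string[]
              break
            case 'summarize':
              result = { summary: await summarize(args.facts) }
              state.summary = (result as { summary: string }).summary
              break
            default:
              result = { error: `Unknown tool: ${call.function.name}` }
          }
        } catch (err) {
          // Malformed arguments or a failing tool: report the error to the model instead of crashing
          result = { error: String(err) }
        }

        return {
          role: 'tool' as const,
          tool_call_id: call.id,
          content: JSON.stringify(result),
        }
      })
    )

    messages.push(...toolResults)
  }

  throw new Error(`Pipeline did not finish within ${MAX_TURNS} turns`)
}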

JSON state management best practices for multi-step pipelines: keep tool results compact — the model does not need the full raw result, only the fields it will use in subsequent steps. Summarize large intermediate results (e.g., truncate search result bodies to 200 characters) before injecting them as tool messages. Implement a maximum turn limit (e.g., 10 iterations) to prevent infinite loops caused by the model repeatedly calling the same tool. Log the full message array at each step during development — this is the most effective way to debug why a pipeline stalls or loops.
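The compaction advice can be applied generically before a result is serialized into a tool message. The following is a minimal sketch. It assumes that truncating every long string (200 characters by default, the figure suggested above) and capping arrays at a hypothetical 10 items is acceptable for your tools. Field names are preserved, so the model can still refer to them.

```typescript
// Recursively shortens long strings and long arrays in a tool result
function compactToolResult(value: unknown, maxString = 200, maxItems = 10): unknown {
  if (typeof value === 'string') {
    return value.length > maxString ? value.slice(0, maxString) + '...' : value
  }
  if (Array.isArray(value)) {
    return value.slice(0, maxItems).map((v) => compactToolResult(v, maxString, maxItems))
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, compactToolResult(v, maxString, maxItems)]),
    )
  }
  return value // numbers, booleans, null pass through unchanged
}

const rawResult = {
  results: Array.from({ length: 25 }, (_, i) => ({ title: `Result ${i}`, body: 'x'.repeat(1000) })),
}
const compact = compactToolResult(rawResult) as { results: { title: string; body: string }[] }
console.log(compact.results.length, compact.results[0].body.length) // 10 203
```

In the pipeline above, you would call it inside the tool-message construction: content: JSON.stringify(compactToolResult(result)).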

Definitions

Structured output
A mode of LLM response generation where the output is constrained to conform to a specific JSON Schema via constrained decoding. The model's token sampler only selects tokens that keep the output on a valid schema path, guaranteeing that required fields are present and types match. Supported by OpenAI via response_format: { type: "json_schema" } with strict: true.
Function calling
An LLM capability where the model outputs a structured tool_calls array instead of plain text when it determines that an external function should be invoked. Each tool call includes a function name and a JSON object of arguments conforming to the function's parameter schema. The caller executes the function and injects the result back into the conversation for the model's next turn.
Tool use
Anthropic's equivalent of OpenAI function calling. The model emits a tool_use content block containing a tool name and structured input object. The caller returns a tool_result block in the next user message. Tool use is the most reliable way to get structured JSON from Claude: the input arrives as an already-parsed object shaped by the input_schema, though standard tool use does not strictly guarantee schema compliance, so validate it before use.
JSON mode
An LLM response mode that guarantees the output is syntactically valid JSON but imposes no schema constraints on field names, types, or presence. Enabled with response_format: { type: "json_object" } in the OpenAI API. Achieves approximately 95% reliability for consistent field presence without schema enforcement.
Response format
The OpenAI API parameter that controls the format of the model's output. Valid values are text (default, plain string), json_object (JSON mode), and json_schema (structured outputs with schema enforcement). When set to json_schema, the json_schema sub-object takes a required name, the schema itself, and an optional strict: true to enable constrained decoding.
Prefill
An Anthropic prompting technique where you provide the beginning of the assistant's response as the last message in the messages array with role assistant. The model continues generating from that prefix. Prefilling with "{" forces Claude to produce a JSON object. The prefill string must be prepended back to the model's response text before calling JSON.parse.
Schema enforcement
The mechanism by which an LLM runtime restricts token sampling to only those tokens that keep the output on a valid path through a JSON Schema. Implemented via constrained decoding or grammar-based sampling. Schema enforcement guarantees structural validity but not semantic correctness — the model may still produce plausible-sounding but factually wrong field values.

Frequently asked questions

What is the difference between JSON mode and structured outputs in OpenAI?

JSON mode (response_format: { type: "json_object" }) tells the model to return valid JSON but imposes no schema: the fields, types, and structure are unconstrained, giving roughly 95% reliability for consistent field presence. Structured outputs (response_format: { type: "json_schema", json_schema: { ..., strict: true } }) enforce a JSON Schema on every response token during generation, guaranteeing schema compliance. With structured outputs the model cannot emit a token that would violate the schema, so fields declared as required are always present and types never mismatch, unless the response is cut off at the token limit or the model refuses. Use JSON mode for quick prototyping; use structured outputs in production where downstream code depends on specific fields.

How do I force Claude to return JSON?

A simple technique is to include a JSON block in your system prompt describing the required output format, then prefill the assistant turn with an opening brace: set the last message in the messages array to { "role": "assistant", "content": "{" }. Claude continues from that token, so the response starts as a JSON object; prepend the "{" before parsing. Set temperature to 0 to reduce hallucinated field names by approximately 40%. The most structured and reliable approach is the Anthropic tool use API: define a tool whose input_schema describes your desired JSON structure, set tool_choice to { "type": "tool", "name": "your_tool" }, and read the tool input from the response.

Why does my LLM sometimes return invalid JSON?

LLMs generate text token by token without a built-in JSON parser, so they can produce syntax errors such as trailing commas, unquoted keys, single-quoted strings, or truncated output when the response hits the token limit. JSON mode guarantees syntactically valid JSON as long as the output is not truncated, but it does not guarantee that the expected fields are present or correctly typed. Structured outputs (OpenAI) and tool use (Anthropic) avoid syntax errors in normal operation, although a response cut off at the token limit can still be incomplete. For models without structured output support, use a repair library like jsonrepair to fix common syntax errors before calling JSON.parse, and always wrap parsing in try/catch with a retry strategy.

How do I handle JSON parsing errors from LLM responses?

Wrap every JSON.parse call in a try/catch block: uncaught parse errors can crash Node.js servers, and in async code they surface as unhandled promise rejections. A robust pattern: try direct parse, fall back to jsonrepair for syntax recovery, then validate the parsed object with zod or ajv before accessing any field, because the model may have returned valid JSON that does not match your expected structure. For production systems, implement a retry loop that sends the invalid JSON back to the model with an error message asking it to fix the problem. See the full pattern in the safe JSON parsing guide.

Can I stream JSON from OpenAI and parse it incrementally?

Yes. Use the OpenAI streaming API (stream: true) and accumulate chunks into a string buffer. You cannot call JSON.parse on each chunk individually — each chunk is a fragment, not valid JSON. Instead, accumulate all chunks into a full string and parse once after the stream completes. For incremental display, use a streaming JSON parser library such as jsonparse or oboe.js that emits events as each nested value is completed. The OpenAI Node SDK provides a stream helper: iterate with for await (const chunk of stream), append chunk.choices[0]?.delta?.content to a buffer, then call JSON.parse(buffer) after the loop exits.

What JSON Schema features does OpenAI structured output support?

OpenAI structured outputs support a subset of JSON Schema draft 2020-12. Supported types are string, number, integer, boolean, object, array, and null. Supported keywords include type, properties, required, enum, items, anyOf, description, and $ref (including recursive schemas). Not supported: $schema, not, and if/then/else; oneOf is best replaced with enum or anyOf. Support for constraints such as pattern, format, minimum, and maximum has been limited and has changed over time, so check OpenAI's current documentation before relying on them. All object properties must be listed in required, and additionalProperties must be false. Optional fields use anyOf: [{ type: "string" }, { type: "null" }] rather than being absent from required.

How many tokens does function calling add to my prompt?

Each function definition added to the tools array consumes approximately 200 tokens in the prompt, depending on the complexity of the function name, description, and parameter schema. A tools array with 5 functions adds roughly 1,000 tokens. The function call result injected back into the conversation (as a tool message) adds the token count of the JSON result itself. To minimize overhead, keep function descriptions to one sentence, use short parameter names, and remove irrelevant functions for turns where they cannot be called. Token overhead is the primary reason to prefer structured outputs over function calling when you only need structured data, not actual tool execution.

How do I validate LLM JSON output against a schema?

After parsing the LLM response with JSON.parse, validate the resulting object with a schema validation library. With zod: const result = MySchema.safeParse(parsed); — if result.success is false, access result.error.issues for field-level error messages. With ajv: compile the JSON Schema once at startup and call validate(parsed) per response. Zod is preferred for TypeScript projects because it infers static types from the schema automatically, giving you a typed value without manual casting. Always validate before accessing nested fields — even with structured outputs enabled, validating after parsing gives you typed, narrowed values rather than unknown. See the JSON data validation guide for a comparison of zod, ajv, and other validation libraries.


Inspect and validate LLM JSON responses visually

Paste any LLM structured output into Jsonic to pretty-print, navigate nested fields, and spot schema mismatches instantly. Validate your JSON before it hits production parsing code.
