Generate JSON Schema from JSON: Tools, Inference Rules, and Tradeoffs
You have a JSON example. You want a JSON Schema that validates documents shaped like it. The shortcut is inference — feed the example to a tool and let it emit a schema. The tradeoff is that inferred schemas are mechanical: they capture structure but miss intent (which fields are truly required, which strings are enums, what numeric ranges are valid). Four tool categories handle this conversion: online converters, CLI tools (quicktype), language libraries (genson for Python, to-json-schema for Node.js), and AI-assisted generators. None of them produce a finished schema. Inferred schemas are always too permissive or too strict — treat them as a starting point, not the final schema.
Need to validate a JSON document against a schema right now? Use Jsonic's JSON Schema Validator — paste both the schema and the JSON to get instant pass/fail with the exact error path.
Inference rules: what tools can and can't infer from a single example
Every inferrer applies the same handful of rules to map JSON values into JSON Schema constructs. The rules are mechanical — they cannot guess at intent. The table below shows what is recoverable from a single example, what needs multiple examples, and what only a human can decide.
| JSON construct | Inferred schema | Recoverable from 1 example? | Needs human input |
|---|---|---|---|
| String `"hello"` | `{"type": "string"}` | Yes | `format`, `pattern`, `enum`, `minLength`/`maxLength` |
| Integer `42` | `{"type": "integer"}` | Yes | `minimum`, `maximum`, `multipleOf` |
| Float `3.14` | `{"type": "number"}` | Yes | `minimum`, `maximum` |
| Boolean `true` | `{"type": "boolean"}` | Yes | — |
| Null `null` | `{"type": "null"}` | Yes | Whether the field is nullable overall |
| Array `[1, 2, 3]` | `{"type": "array", "items": {"type": "integer"}}` | Yes (element type) | `minItems`, `maxItems`, `uniqueItems`, tuple validation |
| Object `{"a": 1}` | `properties` + (sometimes) `required` | Structure only | Which fields are truly required vs optional |
| Empty array `[]` | `{"type": "array"}` (no `items`) | No — element type unknown | Element schema |
| Heterogeneous array `[1, "a"]` | `{"items": {"type": ["integer", "string"]}}` | Yes (union of seen types) | Whether the union is intentional or a bug |
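The mapping in the table can be sketched as a small recursive inferrer. This is a toy illustration of the rules only, not any particular tool's implementation; real inferrers (genson, quicktype) add example merging, required-field tracking, and `$schema` headers.

```python
import json

def infer(value):
    """Map one JSON value to the narrowest JSON Schema that accepts it."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        if not value:  # empty array: element type is unknowable
            return {"type": "array"}
        element_schemas = [infer(v) for v in value]
        distinct = {json.dumps(s, sort_keys=True) for s in element_schemas}
        if len(distinct) == 1:
            return {"type": "array", "items": element_schemas[0]}
        # heterogeneous array: union of observed types (simplified —
        # assumes scalar element types)
        return {"type": "array",
                "items": {"type": sorted({s["type"] for s in element_schemas})}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer(v) for k, v in value.items()},
            # single example: every present field looks required
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")
```

Note how the object branch marks every field required: with one example there is no evidence that any field is optional, which is exactly the "too strict" failure mode described below.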
Two failure modes follow from this table. Too narrow: a field that happened to be a non-null string in the example becomes `{"type": "string"}`, even though the API allows null — production data fails validation. Too loose: a status field that should be one of three values becomes `{"type": "string"}` with no `enum` — invalid statuses pass validation silently. Both failures are unavoidable from inference alone.
Online tools: paste JSON, get a schema
Browser tools are the fastest path for one-off conversions. Paste the example, copy the generated schema, refine by hand. No install, no setup. Four reliable options:
- Jsonic's JSON Schema Validator — Jsonic doesn't ship a dedicated schema generator, but you can paste a candidate schema into the JSON Schema Validator to confirm it accepts your example. Pair with one of the generators below.
- jsonschema.net — paste JSON in the left pane, get the schema in the right. Supports drafts 4, 6, 7, and 2019-09. Lets you tick which fields are required in the UI before exporting, which is the closest any free tool gets to a guided refinement flow.
- quicktype.io — the web UI for the quicktype project. Pick "JSON Schema" as the output language, paste one or more examples, get a unioned schema. Same engine as the CLI, so output matches what you'd get locally.
- transform.tools/json-to-json-schema — minimal, fast, no settings. Outputs a draft-07 schema with every field marked required. Best for the literal shortest-path use case where you'll refine the output afterwards anyway.
Limitations of every online tool: single-example only (or you paste them sequentially and lose the merge), no enum or format detection, no way to express constraints that aren't in the input shape. Use them for the first 60% — then move to a refinement step.
CLI tool: quicktype
quicktype is the most capable single-binary generator. It accepts JSON, JSON Schema, GraphQL, or TypeScript and emits any of ~20 target languages plus JSON Schema itself. For schema generation it handles multiple input files, unions their types, and emits draft-06 by default.
```shell
# install
npm install -g quicktype

# single example → JSON Schema
quicktype -s json -l schema -o schema.json example.json

# multiple examples → unioned schema (recommended)
quicktype -s json -l schema -o schema.json \
  examples/user-1.json examples/user-2.json examples/user-3.json

# infer enums from repeated string values across examples
quicktype -s json -l schema --infer-enums -o schema.json examples/*.json

# infer string formats (date-time, uuid, integer-as-string)
quicktype -s json -l schema --infer-date-times --infer-uuids -o schema.json examples/*.json
```

Useful flags worth knowing: `--all-properties-optional` flips the default "everything required" stance; `--infer-maps` treats objects with uniform value types as `additionalProperties` maps instead of fixed property sets; `--top-level Name` sets the root schema title.
The same command also produces TypeScript, Go, Rust, Python, and other types — see the FAQ below on TypeScript generation. Keeping schema + types generated from the same source is the main reason teams pick quicktype over plain library inference.
Python library: genson
genson is a Python library purpose-built for multi-example inference. Its `SchemaBuilder` incrementally merges examples and produces a schema that correctly marks fields optional when they're missing from any example. This is the gold standard for programmatic generation when you have a corpus of real data.
```shell
# install
pip install genson
```

```python
# build from multiple examples
import json
from pathlib import Path

from genson import SchemaBuilder

builder = SchemaBuilder()
for path in Path("examples").glob("*.json"):
    with open(path) as f:
        builder.add_object(json.load(f))

# merge in an existing schema if you have one
builder.add_schema({"type": "object", "properties": {"id": {"type": "string"}}})

schema = builder.to_schema()
print(json.dumps(schema, indent=2))
```

`SchemaBuilder` defaults: draft-06, fields present in all examples become required, fields present in some examples are optional, mixed types become `"type": ["string", "null"]` unions. Pass `schema_uri=None` to omit the `$schema` header if you'll set it later.
genson also exposes a CLI: `genson examples/*.json > schema.json`. The library form is preferred when you need to merge with an existing schema or apply post-processing (e.g., walk the tree adding `format: date-time` wherever a property name ends in `_at`).
Node.js libraries: to-json-schema and json-schema-generator
Two competing Node libraries cover the JavaScript ecosystem. Both work from a single JSON value per call; merging multiple examples is your responsibility.
```javascript
// to-json-schema — small, opinionated
import toJsonSchema from 'to-json-schema'

const example = {
  id: 42,
  name: 'Alice',
  tags: ['admin', 'beta'],
  profile: { email: 'alice@example.com', verified: true },
}

const schema = toJsonSchema(example, {
  required: true,                  // mark every field required (default: false)
  arrays: { mode: 'first' },       // infer items from first array element only
  strings: { detectFormat: true }, // detect date-time, email, ipv4, etc.
})
console.log(JSON.stringify(schema, null, 2))
```

```javascript
// json-schema-generator — older, draft-04 only
import generator from 'json-schema-generator'

const schema = generator(example)
// Output: every field required, draft-04, no format detection
```

to-json-schema is the better-maintained option, and its `detectFormat` flag makes it the only Node inferrer that emits format hints automatically — it spots `date-time`, `date`, `email`, `ipv4`, `ipv6`, and `uri` strings. To merge multiple examples, generate a schema per example and combine them with a schema-merger library like `json-schema-merge-allof` with `allOf` wrapping — or just use quicktype's multi-file mode, which handles the merge correctly.
Refining the inferred schema: required, enum, minimum/maximum, patterns
Every generated schema needs a refinement pass. The five additions below cover ~95% of the gap between inference output and a production-grade schema.
- Trim `required` arrays to actually-required fields. The producer of the data knows which fields it guarantees. Inferrers either mark everything required (overfit) or nothing required (useless). Replace with the real set from API docs or contract tests.
- Add `enum` for closed sets. Status fields, role fields, country codes, environment names — anywhere the value comes from a known finite list. Example: `"status": {"type": "string", "enum": ["pending", "active", "archived"]}`.
- Add `format` for known string shapes. JSON Schema supports `date-time`, `date`, `time`, `email`, `uri`, `uuid`, `ipv4`, `ipv6`, `hostname`, and `regex`. Validators like Ajv enforce them when configured.
- Add numeric bounds: `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `multipleOf`. Quantities, percentages, ports, timeouts — they all have valid ranges that inference can never recover.
- Add `pattern` for structured strings. SKUs, slugs, codes with custom prefixes. Inferrers never emit regex patterns; you write them by hand.
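Applied to one hypothetical record (field names invented for illustration), a schema carrying all five refinements might look like:

```json
{
  "type": "object",
  "required": ["id", "status"],
  "properties": {
    "id":         {"type": "string", "format": "uuid"},
    "status":     {"type": "string", "enum": ["pending", "active", "archived"]},
    "retries":    {"type": "integer", "minimum": 0, "maximum": 10},
    "sku":        {"type": "string", "pattern": "^SKU-[0-9]{6}$"},
    "created_at": {"type": "string", "format": "date-time"}
  }
}
```

None of these constraints is recoverable from an example — each one encodes a fact about the producer that only a human (or the API docs) can supply.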
For repeated refinements across many schemas, write the post-processing as a script that walks the tree and applies rules (e.g., "any property name ending in _at gets format: date-time"). Treat this script as code you maintain — it encodes your team's domain conventions.
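A minimal sketch of such a rule script, implementing the `_at` → `format: date-time` convention mentioned above (extend the rule set with your own domain conventions):

```python
def apply_conventions(schema):
    """Recursively apply naming conventions to an inferred schema, in place.

    Rule implemented here: any string property whose name ends in `_at`
    gets `format: date-time`. Add further rules in the same loop.
    """
    for name, prop in schema.get("properties", {}).items():
        if name.endswith("_at") and prop.get("type") == "string":
            prop["format"] = "date-time"
        apply_conventions(prop)       # recurse into nested object schemas
    items = schema.get("items")
    if isinstance(items, dict):
        apply_conventions(items)      # recurse into array element schemas
    return schema
```

Run it over the inferrer's output before committing the schema; because the rules live in code, a review of this script is a review of the team's conventions.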
Inferring from multiple examples: handling optional fields and union types
Going from one example to many is the single largest quality jump available without human input. The merge logic that handles this is non-trivial — the table below shows the cases every multi-example inferrer (genson, quicktype) handles correctly.
| Pattern across examples | Inferrer output | What it means |
|---|---|---|
| Field present in all examples | Listed in required | Producer always emits this field |
| Field present in some, missing in others | In properties but NOT required | Optional field |
| Field is sometimes null, sometimes string | "type": ["string", "null"] | Nullable string |
| Field is sometimes int, sometimes float | "type": "number" (covers both) | Numeric, fractional allowed |
| Array elements have varying shapes | "items": {"anyOf": [...]} | Heterogeneous array — usually a bug worth catching |
| String field has small fixed value set (with `--infer-enums`) | `enum` with all observed values | Likely a closed set — verify completeness |
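The first four rows of the table can be sketched as a toy merge over flat example objects. This is an illustration of the merge rules only — genson and quicktype additionally handle nesting, arrays, and draft headers:

```python
def merge_examples(examples):
    """Infer required vs optional fields and type unions from flat dicts."""
    def json_type(v):
        if v is None:
            return "null"
        if isinstance(v, bool):  # check before int: bool subclasses int
            return "boolean"
        if isinstance(v, int):
            return "integer"
        if isinstance(v, float):
            return "number"
        if isinstance(v, str):
            return "string"
        raise TypeError(f"unsupported value: {v!r}")

    observed = {}                         # field -> set of observed types
    for ex in examples:
        for key, value in ex.items():
            observed.setdefault(key, set()).add(json_type(value))

    props = {}
    for key, types in observed.items():
        if types == {"integer", "number"}:
            props[key] = {"type": "number"}        # int + float widen to number
        elif len(types) == 1:
            props[key] = {"type": types.pop()}
        else:
            props[key] = {"type": sorted(types)}   # union, e.g. ["null", "string"]

    # required = fields present in every example
    required = sorted(set.intersection(*(set(ex) for ex in examples)))
    return {"type": "object", "properties": props, "required": required}
```

A field missing from even one example drops out of `required`, which is exactly why more (and more varied) examples improve the schema.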
How many examples is enough? Empirically: 5 examples catch ~70% of optional fields, 10 catch ~85%, 30 catch ~95%. Diminishing returns set in fast. The most valuable examples are not random — they're the ones covering known edge cases: a response with an empty array, one with all optional fields populated, one with null values, one that hit an error path. Curate the example set; don't just dump 1000 random production payloads.
For ongoing schema evolution, integrate inference into a test: every time a new example shape appears in production logs, run the inferrer and diff against the committed schema. A diff means the contract changed — review it intentionally.
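The diff step can be as simple as a property-level comparison. A sketch (not tied to any library) that reports added, removed, and retyped fields between the committed schema and a freshly inferred one:

```python
def schema_drift(committed, inferred, path="$"):
    """List property-level differences between two object schemas.

    A non-empty result means the contract changed and the diff
    deserves an intentional review.
    """
    diffs = []
    before = committed.get("properties", {})
    after = inferred.get("properties", {})
    for name in sorted(set(before) | set(after)):
        here = f"{path}.{name}"
        if name not in before:
            diffs.append(f"added: {here}")
        elif name not in after:
            diffs.append(f"removed: {here}")
        else:
            if before[name].get("type") != after[name].get("type"):
                diffs.append(f"type changed: {here} "
                             f"{before[name].get('type')} -> {after[name].get('type')}")
            diffs += schema_drift(before[name], after[name], here)
    return diffs
```

Wire it into a test that fails when `schema_drift(committed, inferred)` is non-empty, so a new production shape blocks CI until someone reviews the contract change.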
Key terms
- JSON Schema: A vocabulary for describing the structure of JSON documents. Defined by the JSON Schema project; the current draft is 2020-12. Used for validation, documentation, code generation, and API contracts. Files use the `.json` extension and contain a single JSON object with a `$schema` header identifying the draft.
- Inference: The process of generating a schema from one or more example documents by inspecting their structure and types. Inference is mechanical: it captures shape but cannot recover intent like "this field is optional in production even though it's present in this example."
- Draft version: JSON Schema has gone through several revisions: draft-04, draft-06, draft-07, 2019-09, and 2020-12. Each draft changes keyword names and semantics (e.g., `definitions` → `$defs`, `dependencies` split into `dependentRequired` and `dependentSchemas`). Always declare the draft via the `$schema` field so validators interpret your schema correctly.
- Required field: A field listed in an object's `required` array — the document is invalid if the field is missing. JSON Schema's `required` applies per-object, not globally, so different objects can require different fields.
- Enum: A constraint that limits a value to one of a fixed list: `{"enum": ["red", "green", "blue"]}`. Applies to any JSON type, not just strings. Inferrers don't produce enums from a single example; multi-example tools with explicit flags (quicktype's `--infer-enums`) emit them when a string field shows a small fixed set across the corpus.
- Type union: A schema that accepts multiple JSON types: `"type": ["string", "null"]` or `{"anyOf": [...]}`. Multi-example inferrers produce type unions when a field appears as different types across examples. Often legitimate (nullable strings); sometimes a bug worth investigating.
Frequently asked questions
Can I generate JSON Schema from a single JSON example?
Yes — every tool covered here works from a single example, but the result will be wrong in predictable ways. A single example tells the inferrer which fields exist, what JSON type each value has, and what the array element type looks like. It cannot tell the inferrer which fields are optional (every present field looks required), which numeric ranges are valid, which strings are enums, what string format applies (date vs UUID vs email), or whether nulls are allowed. The output is a "narrowest schema that accepts this exact example" — useful as a 60-70% starting point, but always too strict for real data. Plan to feed multiple examples, or refine by hand, before using the schema for validation.
What JSON Schema draft does quicktype output?
quicktype emits JSON Schema draft-06 by default as of the current release. The output uses the $schema keyword "http://json-schema.org/draft-06/schema#" and relies on definitions (not $defs) for shared subschemas. If you need a newer draft — 2019-09 or 2020-12 — you have to hand-edit the $schema URL and replace definitions with $defs, plus update any keyword that changed (e.g., dependencies split into dependentRequired and dependentSchemas in 2019-09). For most validation libraries (Ajv, jsonschema, networknt) draft-06 is fine and forward-compatible. Check the tool you target before assuming the latest draft works.
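The `definitions` → `$defs` part of that hand-edit is mechanical enough to script. A sketch that rewrites the container key, the `$ref` paths, and the `$schema` URL in place (the `dependencies` split and other keyword changes still need manual review):

```python
def upgrade_draft06(schema):
    """Rewrite a draft-06 schema's definitions/$ref/$schema for 2020-12."""
    if isinstance(schema, dict):
        if "definitions" in schema:
            schema["$defs"] = schema.pop("definitions")
        ref = schema.get("$ref")
        if isinstance(ref, str):
            schema["$ref"] = ref.replace("#/definitions/", "#/$defs/")
        if schema.get("$schema", "").startswith("http://json-schema.org/draft-06"):
            schema["$schema"] = "https://json-schema.org/draft/2020-12/schema"
        for value in schema.values():      # recurse into subschemas
            upgrade_draft06(value)
    elif isinstance(schema, list):
        for value in schema:
            upgrade_draft06(value)
    return schema
```

Run your validator against the upgraded schema afterwards — a mechanical rewrite can't catch semantic differences between drafts.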
How do I make all fields required in the generated schema?
Most inferrers default to one of two extremes: every present field becomes required (genson, quicktype) or no fields are required (to-json-schema with default options). To force "all required", use genson with its default behavior and a single example, or pass {"required": true} to to-json-schema. To force "all optional", post-process by deleting every required array in the output. The realistic answer is neither extreme is correct — you want explicit required arrays per object, populated from your domain knowledge of which fields the API actually guarantees. Treat the auto-generated required arrays as a draft and trim them to the fields your producer truly always emits.
Does the inferred schema include enum or pattern constraints?
No — pure inference from JSON examples cannot produce enum or pattern constraints from a single example, because every value would become a one-element enum, which is useless. Some tools (quicktype with --infer-enums) will scan multiple examples and emit an enum if a string field takes only a small fixed set of values across the corpus. None of the tools infer regex patterns from strings; pattern, format, minLength, and maxLength all have to be added by hand. The same applies to numeric bounds (minimum, maximum, multipleOf): inferrers know the type is "number" but never the valid range. If you need these constraints, plan a refinement pass after generation.
Should I commit the generated schema to my repo?
Yes — but commit the refined schema, not the raw inferrer output. The refined schema is a source-of-truth artifact: it defines the API contract, gets reviewed in PRs, and drives validation in production. Generating it on every build from a sample file is fragile because changes to the sample silently change the contract. The recommended workflow is: keep your sample JSON files alongside the schema, run the generator manually when you intentionally evolve the contract, hand-edit required, enum, format, and bounds, then commit the schema file. Treat schema diffs as semantic-versioning breaking changes for consumers who depend on the contract.
How do I generate JSON Schema from multiple example files?
genson and quicktype both support multi-example input directly. With genson (Python): create a SchemaBuilder, call builder.add_object() for each example, then call builder.to_schema(). Fields present in some examples but missing from others are correctly marked optional. With quicktype: pass multiple files or pipe a JSON array of examples — quicktype unions the inferred types across all examples and emits a "anyOf" or widened union where the types differ. Multi-example inference is the single biggest quality jump available without manual work: 5–10 realistic examples typically catch 80% of the optional/required distinctions a single example misses.
What's the difference between to-json-schema and genson?
Both infer a JSON Schema from JSON examples but differ in language, defaults, and merge strategy. to-json-schema is a Node.js library — small, synchronous, no external dependencies. By default it marks every field as not required and outputs draft-04. It accepts a single JSON value per call; merging multiple examples is your job. genson is a Python library with the SchemaBuilder pattern: you add multiple example objects and it incrementally merges them, correctly producing optional/required distinctions, union types, and array element unions. genson outputs draft-06 by default and is the better choice when you have 5+ examples. Use to-json-schema for quick Node-only scripts; use genson when schema quality matters.
Can I generate TypeScript types from the same JSON?
Yes — quicktype is built for exactly this use case. The same input JSON (or JSON Schema) can produce TypeScript interfaces, Go structs, Rust structs, Python dataclasses, Kotlin data classes, Swift structs, and ~15 other targets. The CLI is: quicktype -s json -o Types.ts example.json. If you already have a JSON Schema, use -s schema instead. The output includes parse/serialize helpers in some languages. For a TypeScript-only flow you can also use libraries like json-schema-to-typescript (which goes Schema → TS) or ts-json-schema-generator (which goes the other way: TS → Schema). Decide which artifact is your source of truth — the schema or the types — and generate the other from it.
Further reading and primary sources
- JSON Schema — Official specification — The authoritative spec, draft history, and reference vocabularies
- quicktype.io — Multi-target generator: JSON → JSON Schema, TypeScript, Go, Rust, and ~17 others
- genson on GitHub — Python library for multi-example schema inference with SchemaBuilder
- JSON Schema Tool — Browser-based generator with required-field toggles and multi-draft output
- Understanding JSON Schema — Long-form book covering every keyword with worked examples