JSON to Pydantic: Parse and Validate JSON with Python Models
Pydantic is the most widely used Python data validation library, downloaded over 100 million times per week on PyPI. It parses JSON directly into type-annotated Python classes with automatic coercion and validation. Pydantic v2, released in 2023, uses a Rust core (pydantic-core) that makes parsing 5–50× faster than v1. The key method is model_validate_json(json_string), which parses JSON and validates in one step without calling json.loads() separately. A Pydantic BaseModel subclass defines fields as annotated class attributes: name: str, age: int, tags: list[str]. Optional[str] or str | None marks nullable fields; Field(default=...) sets defaults and metadata. Pydantic v2 also generates JSON Schema from your models with Model.model_json_schema(), making it compatible with FastAPI JSON, LLM structured outputs, and OpenAPI documentation. This guide covers model definition, JSON parsing, nested models, validation errors, and JSON Schema export.
Need to inspect or pretty-print a JSON payload before parsing it with Pydantic? Jsonic's formatter validates and formats JSON instantly.
Open JSON Formatter
Define a Pydantic Model
Subclassing BaseModel and annotating fields with Python types is the complete recipe for a Pydantic model — no decorator, no schema registration, no metaclass boilerplate. Pydantic reads the annotations at class creation time and builds a compiled validator for each field. The 5 basic scalar types cover the vast majority of JSON fields: str, int, float, bool, and None. Container types follow standard Python generics: list[str] for a JSON array of strings, dict[str, int] for a JSON object with integer values.
Optional[str] is identical to str | None — both tell Pydantic the field may be null in JSON or absent entirely if a default is provided. The Field() function adds metadata: default, default_factory, description, examples, min_length, max_length, ge (greater-than-or-equal), and le. These constraints are enforced at parse time and also appear in the generated JSON Schema. model_config = ConfigDict(strict=True) switches to strict mode: Pydantic will no longer coerce "123" to 123 — the field must already be the correct type. The following User model demonstrates all 5 patterns in one class:
from pydantic import BaseModel, Field, ConfigDict
from typing import Optional
class User(BaseModel):
model_config = ConfigDict(strict=False) # coercion on (default)
id: int # required int
name: str # required string
email: str # required string
bio: Optional[str] = None # nullable, defaults to None
tags: list[str] = Field(default_factory=list, description="User labels")
# All 5 fields present
user = User(id=1, name="Alice", email="alice@example.com", bio="Engineer", tags=["admin"])
print(user.id) # 1
print(user.tags) # ['admin']
# bio and tags omitted — use defaults
user2 = User(id=2, name="Bob", email="bob@example.com")
print(user2.bio) # None
print(user2.tags) # []
Pydantic coerces compatible types by default: a JSON string "42" in an int field is accepted and converted to 42. With ConfigDict(strict=True), that same input raises a ValidationError. For production APIs receiving untrusted JSON, strict mode is the safer choice — it forces upstream producers to send correctly typed values rather than relying on silent coercion. Use validate JSON Schema in Python if you need to validate JSON against an external schema before constructing a Pydantic model.
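The coercion difference can be sketched with two small models; LaxEvent and StrictEvent are illustrative names, not part of the examples above:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxEvent(BaseModel):
    count: int  # lax mode (the default): "42" is coerced to 42

class StrictEvent(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int  # strict mode: a string is rejected even if numeric

print(LaxEvent.model_validate({"count": "42"}).count)  # 42

try:
    StrictEvent.model_validate({"count": "42"})
except ValidationError as e:
    # strict mode refuses to coerce the string to an int
    print(e.errors()[0]["type"])
```

Strict mode can also be enabled per call with `model_validate(data, strict=True)` instead of baking it into the model config.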
Parse JSON to a Pydantic Model
Pydantic v2 provides 2 parsing entry points: model_validate_json(json_string) for raw JSON strings and model_validate(dict_obj) for Python dicts. Prefer model_validate_json when the input is a JSON string — it is ~20% faster than calling model_validate(json.loads(s)) because the Rust parser inside pydantic-core handles both decoding and validation in a single pass, skipping the Python-level json module entirely.
For a JSON array of objects at the top level (not nested under a key), use TypeAdapter. TypeAdapter(list[User]).validate_json(json_str) parses the array and validates each element as a User instance in one call. The parse JSON in Python guide covers the standard json.loads() approach for comparison. The following example shows all 3 patterns:
import json
from pydantic import BaseModel, TypeAdapter
class User(BaseModel):
id: int
name: str
email: str
# ── Pattern 1: model_validate_json (preferred for JSON strings) ──────────
json_str = '{"id": 1, "name": "Alice", "email": "alice@example.com"}'
user = User.model_validate_json(json_str)
print(user.name) # Alice
# ── Pattern 2: model_validate (for Python dicts) ─────────────────────────
data_dict = {"id": 2, "name": "Bob", "email": "bob@example.com"}
user2 = User.model_validate(data_dict)
# ── Pattern 3: validate a JSON array with TypeAdapter ────────────────────
json_array = '[{"id":1,"name":"Alice","email":"a@x.com"},{"id":2,"name":"Bob","email":"b@x.com"}]'
adapter = TypeAdapter(list[User])
users = adapter.validate_json(json_array)
print(len(users)) # 2
print(users[0].name) # Alice
model_validate_json raises ValidationError for both invalid JSON syntax and type mismatches — a single except ValidationError block handles both. model_validate raises ValidationError only for type mismatches (the dict is already parsed). For HTTP APIs built with FastAPI JSON, you never call these methods directly — FastAPI calls them automatically for request body parameters annotated with a Pydantic model type.
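A minimal sketch of the single except ValidationError pattern, using a hypothetical safe_parse helper to show that syntax failures and type failures land in the same handler:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str

def safe_parse(raw: str):
    """Return a User on success, or the list of error dicts on failure."""
    try:
        return User.model_validate_json(raw)
    except ValidationError as e:
        return e.errors()

print(safe_parse('{"id": 1, "name":'))          # malformed JSON syntax
print(safe_parse('{"id": "oops", "name": 7}'))  # valid JSON, wrong types
print(safe_parse('{"id": 1, "name": "Alice"}')) # parses cleanly
```

Both failing inputs return error lists from the same except block; there is no need to catch json.JSONDecodeError separately.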
Nested Models and Lists
Pydantic handles nested JSON objects by using one BaseModel subclass as the type annotation of another model's field. When Pydantic encounters a nested dict in the JSON, it recursively validates it against the nested model's schema. This works to any depth — nested models can themselves contain nested models. items: list[Item] handles a JSON array of objects: Pydantic validates each element against the Item model and returns a Python list of Item instances.
The following example shows an Order model containing a nested Customer object and a list[LineItem]. Note the 3 levels of nesting: 1 order → 1 customer + N line items → each line item has a product name and quantity. model.model_dump() serializes the entire nested structure back to a Python dict; model.model_dump_json() returns a JSON string directly.
from pydantic import BaseModel
from typing import Optional
class Customer(BaseModel):
id: int
name: str
email: Optional[str] = None
class LineItem(BaseModel):
product: str
quantity: int
unit_price: float
class Order(BaseModel):
order_id: str
customer: Customer # nested model
items: list[LineItem] # array of nested models
discount: float = 0.0
json_str = '''
{
"order_id": "ORD-001",
"customer": {"id": 42, "name": "Alice", "email": "alice@example.com"},
"items": [
{"product": "Widget", "quantity": 3, "unit_price": 9.99},
{"product": "Gadget", "quantity": 1, "unit_price": 24.99}
],
"discount": 0.1
}
'''
order = Order.model_validate_json(json_str)
print(order.customer.name) # Alice
print(order.items[0].product) # Widget
print(len(order.items)) # 2
# Serialize back to dict
d = order.model_dump()
print(type(d['customer'])) # <class 'dict'>
# Serialize to JSON string
json_out = order.model_dump_json(indent=2)
print(json_out)
model_dump() accepts keyword arguments to control output: exclude_none=True drops fields with None values, exclude_unset=True drops fields that were not explicitly set (useful for PATCH payloads), and mode='json' converts all values to JSON-safe Python types (e.g., datetime → ISO string). For large APIs with many nested models, use JSON Schema generation to document and validate the full contract.
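The model_dump() keyword arguments above can be sketched like this; the Event model is illustrative, not part of the Order example:

```python
from datetime import datetime, timezone
from typing import Optional
from pydantic import BaseModel

class Event(BaseModel):
    name: str
    created_at: datetime
    note: Optional[str] = None

event = Event(name="deploy", created_at=datetime(2024, 1, 15, tzinfo=timezone.utc))

print(event.model_dump())                    # created_at stays a datetime object
print(event.model_dump(mode="json"))         # created_at becomes an ISO 8601 string
print(event.model_dump(exclude_none=True))   # 'note' (None) is dropped
print(event.model_dump(exclude_unset=True))  # only fields passed to the constructor
```

The same keywords work on model_dump_json(), which is the right choice when the result is sent straight over the wire.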
Handling Validation Errors
When the JSON syntax is invalid or when fields fail type/constraint checks, Pydantic raises ValidationError. A single except ValidationError block catches all validation failures — there is no separate exception for syntax errors vs. type errors. Pydantic collects all errors before raising, not just the first one: a JSON object with 3 invalid fields raises 1 ValidationError containing 3 error entries.
e.errors() returns a list of dicts, each with 4 keys: loc (tuple of field names forming the path, e.g. ('customer', 'email')), msg (human-readable message), type (machine-readable error code like 'string_type' or 'int_parsing'), and input (the actual bad value). Use e.error_count() to get the total number of errors without iterating the list.
from pydantic import BaseModel, ValidationError, Field
class Product(BaseModel):
name: str
price: float = Field(gt=0) # must be > 0
stock: int = Field(ge=0) # must be >= 0
# ── Catch ValidationError ─────────────────────────────────────────────────
bad_json = '{"name": 123, "price": -5.0, "stock": "not-a-number"}'
try:
product = Product.model_validate_json(bad_json)
except ValidationError as e:
print(f"{e.error_count()} errors") # 3 errors
for err in e.errors():
print(err['loc'], err['type'], err['input'])
# ('name',) string_type 123
# ('price',) greater_than -5.0
# ('stock',) int_parsing 'not-a-number'
# ── API error response pattern ────────────────────────────────────────────
def parse_product(json_str: str) -> dict:
try:
return Product.model_validate_json(json_str).model_dump()
except ValidationError as e:
return {
"error": "Validation failed",
"details": [
{"field": ".".join(str(x) for x in err["loc"]),
"message": err["msg"]}
for err in e.errors()
]
}
# ── model_construct: skip validation (trusted data only) ─────────────────
trusted = Product.model_construct(name="Widget", price=9.99, stock=100)
print(trusted.name) # Widget — no validation ran
model_construct() builds a model instance without running any validation. Use it only with trusted data (e.g., values read directly from a database after a schema migration). Never use model_construct() with user-supplied input — invalid values will silently pass through and cause runtime errors elsewhere. For structured error responses in HTTP APIs, map e.errors() to your API's error format (field path + message) and return a 422 Unprocessable Entity status code, which is the HTTP standard for validation failures.
Generate JSON Schema from Pydantic
User.model_json_schema() returns a Python dict representing a JSON Schema Draft 2020-12 compatible schema for the model. The schema includes all field types, constraints from Field(), descriptions, and examples. This single method drives 3 major use cases: OpenAPI documentation in FastAPI, structured output enforcement in LLM APIs, and client-side or external validation via JSON Schema validators.
json.dumps(User.model_json_schema(), indent=2) prints the schema in readable form. For LLM structured outputs, pass the schema dict to OpenAI's response_format={"type":"json_schema","json_schema":{"schema":...}}. For recursive models (a TreeNode that contains a list[TreeNode]), call Model.model_rebuild() before calling model_json_schema() to resolve forward references. The JSON Schema guide explains Draft 2020-12 keywords in depth.
import json
from pydantic import BaseModel, Field
from typing import Optional
class Address(BaseModel):
street: str = Field(description="Street address line")
city: str
country: str = Field(default="US", examples=["US", "CA", "GB"])
class User(BaseModel):
id: int = Field(description="Unique user identifier", ge=1)
name: str = Field(min_length=1, max_length=100)
email: str = Field(description="Primary email address")
address: Optional[Address] = None
tags: list[str] = Field(default_factory=list, max_length=10)
# ── Generate JSON Schema ──────────────────────────────────────────────────
schema = User.model_json_schema()
print(json.dumps(schema, indent=2))
# {
# "title": "User",
# "type": "object",
# "properties": {
# "id": {"type": "integer", "minimum": 1, "description": "Unique user identifier"},
# "name": {"type": "string", "minLength": 1, "maxLength": 100},
# ...
# },
# "$defs": {"Address": {"type": "object", "properties": {...}}}
# }
# ── LLM structured output (OpenAI) ───────────────────────────────────────
# import openai
# response = openai.chat.completions.create(
# model="gpt-4o",
# messages=[{"role": "user", "content": "Return a user JSON"}],
# response_format={"type": "json_schema", "json_schema": {"schema": schema}}
# )
# ── Recursive model: call model_rebuild() first ───────────────────────────
class TreeNode(BaseModel):
value: int
children: list['TreeNode'] = []
TreeNode.model_rebuild() # resolve forward reference
tree_schema = TreeNode.model_json_schema()
print(tree_schema['title']) # TreeNode
FastAPI automatically calls model_json_schema() on every Pydantic model used as a request body or response type, merging the result into the OpenAPI spec served at /openapi.json. You get interactive API docs at /docs (Swagger UI) and /redoc (ReDoc) with no extra configuration. Use Jsonic's JSON formatter to inspect and validate the generated schema output during development.
Key terms
- BaseModel
- The Pydantic base class that all data models subclass; it provides model_validate(), model_validate_json(), model_dump(), and model_json_schema() as class and instance methods.
- model_validate_json()
- A class method that accepts a raw JSON string, parses it using pydantic-core's Rust JSON parser, and validates the result against the model's field types and constraints in a single step — no intermediate json.loads() call is needed.
- ValidationError
- The exception Pydantic raises when one or more fields fail type or constraint checks during parsing; it collects all errors before raising and exposes them via .errors() as a list of dicts with loc, msg, type, and input keys.
- Field()
- A Pydantic function used as the default value of a model attribute to attach metadata and constraints (e.g., default, default_factory, description, ge, le, min_length) that are enforced at parse time and exported to JSON Schema.
- TypeAdapter
- A Pydantic v2 utility class that applies validation and serialization to arbitrary Python types (including generics like list[MyModel] or dict[str, int]) without requiring a BaseModel wrapper — used primarily to validate top-level JSON arrays.
- model_json_schema()
- A class method that introspects the model's field annotations and Field() metadata to produce a JSON Schema Draft 2020-12 compatible Python dict, which is used by FastAPI for OpenAPI docs and by LLM APIs for structured output enforcement.
- pydantic-core
- The Rust-based validation engine introduced in Pydantic v2 that compiles model validators at class creation time, enabling 5–50× faster JSON parsing and validation compared to the pure-Python implementation in Pydantic v1.
Frequently asked questions
What is the difference between model_validate and model_validate_json in Pydantic v2?
model_validate(obj) accepts a Python dict (already parsed). model_validate_json(json_str) accepts a JSON string and parses it internally using the faster Rust JSON parser — no need to call json.loads() first. model_validate_json is ~20% faster for JSON string input because it skips the Python-level json.loads() call and goes directly through pydantic-core's built-in Rust parser. Use model_validate when you already have a Python dict (e.g., from a database ORM or another library). Use model_validate_json when you receive a raw JSON string from an HTTP request body, a file, or a message queue. Both methods raise ValidationError if the data does not conform to the model's field types and constraints. When building with FastAPI, the framework calls these methods automatically — you declare the Pydantic type and FastAPI handles parsing.
How do I make a Pydantic field optional with a default value?
Use field_name: str | None = None for an optional field defaulting to None, or field_name: str = "default" for a non-null default. For complex defaults (lists, dicts), use Field(default_factory=list) to avoid the mutable default gotcha — Python shares the same list object across all instances if you write items: list = [] directly. For metadata alongside the default, combine both: field_name: str | None = Field(default=None, description="User bio", max_length=500). In Pydantic v2, Optional[str] is equivalent to str | None (both allow the value to be None), but you must still supply a default if you want the field to be truly optional at parse time — otherwise Pydantic requires the key to be present in the JSON even if the value can be null.
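The default_factory pattern can be sketched as follows; Profile is an illustrative model name:

```python
from pydantic import BaseModel, Field

class Profile(BaseModel):
    name: str
    # default_factory builds a fresh list for each instance
    tags: list[str] = Field(default_factory=list)

a = Profile(name="a")
b = Profile(name="b")
a.tags.append("admin")

print(a.tags)  # ['admin']
print(b.tags)  # [] — each instance got its own list
```

default_factory also accepts any zero-argument callable, so dict, set, or a lambda returning a prepopulated structure all work the same way.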
How does Pydantic v2 differ from Pydantic v1?
Pydantic v2 uses pydantic-core (Rust) making it 5–50× faster than v1. Key breaking changes: model_validate() replaces parse_obj(), model_validate_json() replaces parse_raw(), model_dump() replaces dict(), and model_dump_json() replaces json(). ConfigDict replaces the inner Config class — for example, class Config: orm_mode = True becomes model_config = ConfigDict(from_attributes=True). Validators use the @field_validator and @model_validator decorators with different signatures. Most v1 code works in v2's compatibility shim (from pydantic.v1 import BaseModel), but migrating to the native v2 API is recommended for the full performance gain. See validate JSON Schema in Python for complementary validation patterns.
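A minimal sketch of the v1 Config to v2 ConfigDict change, using a plain UserRow class as a hypothetical stand-in for an ORM row:

```python
from pydantic import BaseModel, ConfigDict

class UserRow:
    """Stand-in for an ORM object that exposes data as attributes."""
    def __init__(self):
        self.id = 1
        self.name = "Alice"

class UserOut(BaseModel):
    # v1 spelling: class Config: orm_mode = True
    model_config = ConfigDict(from_attributes=True)
    id: int
    name: str

# v1 spelling: UserOut.from_orm(row)
user = UserOut.model_validate(UserRow())
print(user.name)  # Alice
```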
Can Pydantic generate JSON Schema from a model?
Yes. MyModel.model_json_schema() returns a JSON Schema Draft 2020-12 compatible dict. FastAPI uses this automatically for OpenAPI docs — every request and response model generates a schema entry visible in the /docs UI. LLM APIs (OpenAI structured outputs, Anthropic tool use) consume this schema directly. Use json.dumps(MyModel.model_json_schema(), indent=2) to print it. Field(description="...", examples=[...]) annotations appear in the schema output. For recursive models (a tree node that contains a list of tree nodes), call Model.model_rebuild() after the class definition to resolve forward references before calling model_json_schema(). Inspect the generated schema with Jsonic's JSON formatter to verify it matches expectations.
How do I validate a JSON array with Pydantic?
Use TypeAdapter: from pydantic import TypeAdapter; adapter = TypeAdapter(list[MyModel]); items = adapter.validate_json('[...]') — this parses a top-level JSON array directly without wrapping it in an object. TypeAdapter also works with validate_python() for Python lists. Alternatively, define a wrapper model with items: list[MyModel] = Field(...) if the array is nested under a key like {"users": [...]}. TypeAdapter supports the same Field() constraints as BaseModel fields, so TypeAdapter(list[Annotated[str, Field(min_length=1)]]) validates each element individually. For parsing JSON in Python more broadly, json.loads() is the stdlib starting point before applying Pydantic validation.
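A sketch of per-element constraints with Annotated; NonEmptyStr is an illustrative type alias:

```python
from typing import Annotated
from pydantic import Field, TypeAdapter, ValidationError

NonEmptyStr = Annotated[str, Field(min_length=1)]
adapter = TypeAdapter(list[NonEmptyStr])

print(adapter.validate_json('["a", "b"]'))  # ['a', 'b']

try:
    adapter.validate_json('["a", ""]')  # second element violates min_length
except ValidationError as e:
    print(e.errors()[0]["loc"])  # (1,)
```

The loc tuple reports the array index of the failing element, so error messages can point at exactly which item in the payload was rejected.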
What happens when Pydantic encounters extra fields in the JSON?
By default, extra fields are ignored — model_config = ConfigDict(extra='ignore') is the default behavior. Set extra='forbid' to raise ValidationError for any unexpected field in the JSON input; this is useful for strict API contracts where you want to reject malformed or overly broad payloads. Set extra='allow' to accept and store extra fields in model_extra (accessible as a dict via model.model_extra). FastAPI uses extra='ignore' by default on request body models. The extra setting applies to the model it is defined on — child models inherit their own config independently, so nested models must each set their own extra policy if strict enforcement is required throughout the hierarchy.
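A sketch of extra='forbid' and extra='allow' side by side; Strict and Flexible are illustrative model names:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Strict(BaseModel):
    model_config = ConfigDict(extra="forbid")
    id: int

class Flexible(BaseModel):
    model_config = ConfigDict(extra="allow")
    id: int

payload = '{"id": 1, "debug": true}'

try:
    Strict.model_validate_json(payload)
except ValidationError as e:
    print(e.errors()[0]["type"])  # extra_forbidden

flex = Flexible.model_validate_json(payload)
print(flex.model_extra)  # {'debug': True}
```

With the default extra='ignore', the same payload parses without error and the debug key is silently discarded.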
Ready to validate JSON with Pydantic?
Use Jsonic's JSON Formatter to inspect and validate JSON payloads before or after Pydantic parsing. You can also diff two JSON responses to compare model outputs across schema versions.
Open JSON Formatter