Python JSON: json.loads, json.dumps, Pydantic & orjson Performance
Python's built-in json module parses JSON strings with json.loads() and serializes Python objects with json.dumps() — it handles str, int, float, bool, None, list, and dict natively, but raises TypeError for datetime, Decimal, and custom objects without a custom encoder. orjson, the fastest Python JSON library, benchmarks at 10× the throughput of the standard json module — it serializes datetime objects to ISO 8601 strings natively, handles numpy arrays, and processes a 10 MB JSON file in ~8 ms vs ~80 ms for the standard library. This guide covers json.loads() and json.dumps() with all options, custom JSONEncoder and object_hook for custom types, Pydantic v2 model_validate() and model_dump() for typed JSON, orjson for high-performance serialization, and handling JSON in FastAPI and Django REST Framework.
json.loads() and json.dumps(): Standard Library JSON
json.loads() and json.dumps() are the primary interface to Python's built-in JSON support. Both functions expose options that control parsing and serialization behavior; knowing these options prevents the most common pitfalls with Unicode, precision, and non-standard JSON formats.
import json
from decimal import Decimal
from collections import OrderedDict
# ── json.loads() options ──────────────────────────────────────────
# Basic parsing: returns dict, list, str, int, float, bool, or None
data = json.loads('{"name": "Alice", "age": 30, "active": true}')
# {"name": "Alice", "age": 30, "active": True}
# parse_float: preserve Decimal precision (default: float)
data = json.loads('{"price": 19.99}', parse_float=Decimal)
# {"price": Decimal("19.99")}, no floating-point rounding
# parse_int: custom integer conversion (default: int)
data = json.loads('{"id": 12345}', parse_int=str)
# {"id": "12345"}, useful when large ints exceed JS Number.MAX_SAFE_INTEGER
# object_pairs_hook: receives each object's (key, value) pairs in document order
def as_ordered(pairs):
    return OrderedDict(pairs)
data = json.loads('{"b": 2, "a": 1}', object_pairs_hook=as_ordered)
# OrderedDict([("b", 2), ("a", 1)])
# Common pitfalls: these are NOT valid JSON:
# json.loads("{'key': 'value'}") → JSONDecodeError (single quotes)
# json.loads('{"key": "value",}') → JSONDecodeError (trailing comma)
# json.loads('{"key": undefined}') → JSONDecodeError (undefined not valid)
# Caution: json.loads('NaN') does NOT raise. NaN, Infinity, and -Infinity are
# accepted by default and become floats; pass parse_constant= to reject them.
try:
    json.loads("{'bad': 'json'}")
except json.JSONDecodeError as e:
    print(f"Parse error at line {e.lineno}, col {e.colno}: {e.msg}")
# Parse error at line 1, col 2: Expecting property name enclosed in double quotes
# ── json.dumps() options ──────────────────────────────────────────
obj = {"name": "Alice", "score": 9.5, "tags": ["admin", "user"]}
# indent: pretty-print with indentation (nested lists are expanded too)
print(json.dumps(obj, indent=2))
# {
#   "name": "Alice",
#   "score": 9.5,
#   "tags": [
#     "admin",
#     "user"
#   ]
# }
# sort_keys: deterministic output (useful for hashing, testing, diffing)
print(json.dumps({"b": 2, "a": 1}, sort_keys=True))
# {"a": 1, "b": 2}
# ensure_ascii=False: preserve Unicode as-is (default True escapes to \uXXXX)
print(json.dumps({"city": "北京"}, ensure_ascii=False))
# {"city": "北京"}
print(json.dumps({"city": "北京"}, ensure_ascii=True))
# {"city": "\u5317\u4eac"}
# separators: compact output (no spaces after : and ,)
print(json.dumps(obj, separators=(",", ":")))
# {"name":"Alice","score":9.5,"tags":["admin","user"]}
# json.dump(): write to a file object
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(obj, f, indent=2, ensure_ascii=False)
# json.load(): read from a file object
with open("output.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
A critical distinction: json.loads() accepts a str, bytes, or bytearray (Python 3.6+), while json.load() accepts a file-like object with a .read() method. Trailing commas and single-quoted strings are the most common causes of JSONDecodeError when parsing JSON from non-standard sources (JavaScript object literals, YAML-derived configs, or hand-edited files). Use the json5 or pyjson5 package if you need to parse relaxed JSON with comments and trailing commas.
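One pitfall worth seeing in code: the standard parser accepts the non-standard literals NaN, Infinity, and -Infinity by default. The following is a minimal sketch of rejecting them through the parse_constant parameter of json.loads(). The helper names reject_constant and loads_strict are illustrative and not part of the json module.

```python
import json

def reject_constant(name: str):
    """Called by the decoder for 'NaN', 'Infinity', and '-Infinity'."""
    raise ValueError(f"Non-standard JSON constant: {name}")

def loads_strict(text: str):
    """Parse JSON but refuse NaN/Infinity literals (hypothetical helper)."""
    return json.loads(text, parse_constant=reject_constant)

print(json.loads("NaN"))            # nan (accepted by default)
print(loads_strict('{"ok": 1.5}'))  # {'ok': 1.5}
try:
    loads_strict('{"ratio": NaN}')
except ValueError as e:
    print(e)                        # Non-standard JSON constant: NaN
```

This is useful when the parsed data is later re-serialized for consumers that follow the JSON specification strictly, since they would reject a bare NaN.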
Custom JSONEncoder for Non-Serializable Types
Subclassing json.JSONEncoder and overriding default() is the standard approach for serializing types that the built-in encoder cannot handle. The default() method is called for each object that is not natively serializable — return a JSON-serializable value or call super().default(obj) to raise TypeError for truly unsupported types.
import json
import uuid
from datetime import datetime, date, timezone
from decimal import Decimal
from enum import Enum
from dataclasses import dataclass, asdict, is_dataclass
class Color(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
@dataclass
class Product:
    id: uuid.UUID
    name: str
    price: Decimal
    created_at: datetime
    color: Color
class CustomEncoder(json.JSONEncoder):
    """Handles datetime, date, Decimal, UUID, Enum, and dataclasses."""
    def default(self, obj):
        # Check datetime before date: datetime is a subclass of date
        if isinstance(obj, datetime):
            # Assumption: naive datetimes are treated as UTC
            if obj.tzinfo is None:
                obj = obj.replace(tzinfo=timezone.utc)
            return obj.isoformat()
        if isinstance(obj, date):
            return obj.isoformat()  # "2026-05-20"
        if isinstance(obj, Decimal):
            return str(obj)  # preserve precision as string
        if isinstance(obj, uuid.UUID):
            return str(obj)  # "550e8400-e29b-41d4-a716-446655440000"
        if isinstance(obj, Enum):
            return obj.value  # use the enum's value, not its name
        if is_dataclass(obj) and not isinstance(obj, type):
            return asdict(obj)  # convert dataclass instance to dict recursively
        # Let the base class raise TypeError for truly unsupported types
        return super().default(obj)
# Usage: pass the cls= parameter to json.dumps
product = Product(
    id=uuid.UUID("550e8400-e29b-41d4-a716-446655440000"),
    name="Widget",
    price=Decimal("19.99"),
    created_at=datetime(2026, 5, 20, 10, 0, 0, tzinfo=timezone.utc),
    color=Color.RED,
)
result = json.dumps(product, cls=CustomEncoder, indent=2)
# {
#   "id": "550e8400-e29b-41d4-a716-446655440000",
#   "name": "Widget",
#   "price": "19.99",
#   "created_at": "2026-05-20T10:00:00+00:00",
#   "color": "red"
# }
# Alternative: use a default= function (simpler for one-off cases)
def json_default(obj):
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
json.dumps({"price": Decimal("9.99")}, default=json_default)
# '{"price": "9.99"}'
The default() method is only called for objects that are not natively serializable; it is not called for dict, list, str, int, float, bool, or None. To intercept all objects (including dicts), override encode() or iterencode() instead, but this is rarely needed. For recursive structures containing non-serializable types inside dicts and lists, the default() method handles them correctly because the encoder recurses into containers before calling default() on leaf values.
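To make the dispatch rule concrete, here is a small sketch of an encoder that records which objects reach default(). TracingEncoder and its seen attribute are hypothetical names used only for this illustration. It shows that native containers and scalars never reach default(), while unsupported values nested inside them do.

```python
import json
from decimal import Decimal

class TracingEncoder(json.JSONEncoder):
    """Hypothetical encoder that records every object passed to default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, obj):
        self.seen.append(type(obj).__name__)
        if isinstance(obj, Decimal):
            return str(obj)
        if isinstance(obj, set):
            return sorted(obj)  # sets have no JSON equivalent; emit a sorted list
        return super().default(obj)

encoder = TracingEncoder()
payload = {"items": [{"price": Decimal("9.99")}], "tags": {"b", "a"}, "n": 1}
text = encoder.encode(payload)
print(text)          # {"items": [{"price": "9.99"}], "tags": ["a", "b"], "n": 1}
print(encoder.seen)  # ['Decimal', 'set']
```

The dict, the list, and the integer are absent from seen, which is why a default() override cannot be used to change how built-in types are written.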
object_hook: Custom JSON Deserialization
object_hook is a callback passed to json.loads() that is called for every JSON object (curly-brace pair) in the document. It receives the decoded dict and returns whatever value should be used in its place — enabling typed deserialization without a schema library. object_pairs_hook is the lower-level version that receives a list of (key, value) pairs, giving you control over key ordering and duplicate key handling.
import json
from datetime import datetime
from decimal import Decimal
from dataclasses import dataclass
from typing import Optional
@dataclass
class User:
    id: int
    name: str
    email: str
    created_at: Optional[datetime] = None
# ── object_hook: convert dicts to typed objects ───────────────────
def user_hook(d: dict):
    """Reconstruct a User if the dict has the expected shape."""
    if {"id", "name", "email"}.issubset(d.keys()):
        created_at = None
        if d.get("created_at"):
            created_at = datetime.fromisoformat(d["created_at"])
        return User(
            id=d["id"],
            name=d["name"],
            email=d["email"],
            created_at=created_at,
        )
    return d  # return the dict unchanged for non-User objects
json_str = '{"id": 1, "name": "Alice", "email": "alice@example.com", "created_at": "2026-05-20T10:00:00"}'
user = json.loads(json_str, object_hook=user_hook)
# User(id=1, name='Alice', email='alice@example.com', created_at=datetime.datetime(2026, 5, 20, 10, 0))
# ── Type tagging pattern ──────────────────────────────────────────
# Embed a __type__ key in serialized JSON to enable self-describing deserialization
REGISTRY = {}
def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls
@register
@dataclass
class Point:
    x: float
    y: float
def tagged_hook(d: dict):
    cls = REGISTRY.get(d.get("__type__"))
    if cls is None:
        return d  # missing or unknown tag: leave the dict untouched
    return cls(**{k: v for k, v in d.items() if k != "__type__"})
serialized = '{"__type__": "Point", "x": 1.5, "y": 2.5}'
point = json.loads(serialized, object_hook=tagged_hook)
# Point(x=1.5, y=2.5)
# ── object_pairs_hook for duplicate key detection ─────────────────
def strict_dict(pairs):
    """Raise ValueError if any key appears more than once."""
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        duplicates = {k for k in keys if keys.count(k) > 1}
        raise ValueError(f"Duplicate JSON keys: {duplicates}")
    return dict(pairs)
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=strict_dict)
except ValueError as e:
    print(e)  # Duplicate JSON keys: {'a'}
# Note: Python 3.7+ dicts preserve insertion order by default,
# so object_pairs_hook is no longer needed just for ordering.
# Use it only for custom key handling or duplicate detection.
A key limitation of object_hook: it is called bottom-up, so nested objects are converted before their parent. This means when the hook is called for an outer object, any nested object values have already been converted by an earlier hook call. If you need to inspect the full tree context during conversion, post-process the parsed dict tree instead of using object_hook. For complex typed deserialization with validation, Pydantic v2 (covered in the next section) is the more robust choice.
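The bottom-up call order is easy to observe directly. The sketch below uses a hypothetical recording_hook that logs the keys of each object as the decoder hands it over; the sample document is made up for the illustration.

```python
import json

calls = []

def recording_hook(d: dict):
    """Record the keys of each object in the order the decoder completes it."""
    calls.append(sorted(d.keys()))
    return d

doc = '{"user": {"name": "Alice", "address": {"city": "Paris"}}, "ok": true}'
json.loads(doc, object_hook=recording_hook)
print(calls)
# [['city'], ['address', 'name'], ['ok', 'user']]
```

The innermost object is reported first and the document root last, so a hook never sees an unconverted child and never knows its parent.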
Pydantic v2: Typed JSON with model_validate and model_dump
Pydantic v2 provides schema-based JSON parsing with runtime type validation, coercion, and serialization. model_validate() accepts a dict and validates/coerces it into a typed model; model_validate_json() accepts a raw JSON string and is faster than calling json.loads() first because it uses pydantic-core's Rust JSON parser internally.
from pydantic import BaseModel, Field, field_serializer, model_validator
from pydantic import ConfigDict
from typing import Optional, List
from datetime import datetime
from decimal import Decimal
# ── Basic model definition ────────────────────────────────────────
class Address(BaseModel):
    street: str
    city: str
    country: str = "US"
class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    address: Optional[Address] = None
    created_at: datetime
    tags: List[str] = []
# model_validate(): from dict (after json.loads or from DB)
user = User.model_validate({
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "created_at": "2026-05-20T10:00:00Z",
    "tags": ["admin"],
})
# created_at is coerced from string → datetime automatically
# model_validate_json(): from JSON string (faster: avoids Python json.loads)
user = User.model_validate_json(
    '{"id":1,"name":"Alice","email":"alice@example.com","created_at":"2026-05-20T10:00:00Z"}'
)
# model_dump(): to dict
d = user.model_dump()
# {"id": 1, "name": "Alice", ..., "created_at": datetime(2026, 5, 20, ...)}
# model_dump_json(): to JSON string (uses Rust serializer)
s = user.model_dump_json()
# '{"id":1,"name":"Alice",...,"created_at":"2026-05-20T10:00:00Z"}'
# model_dump with mode="json": serialize to JSON-compatible types (no datetime objects)
d = user.model_dump(mode="json")
# {"id": 1, ..., "created_at": "2026-05-20T10:00:00Z"} (str, not datetime)
# ── Field aliases: JSON key ≠ Python attribute name ──────────────
class ApiResponse(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    user_id: int = Field(alias="userId")
    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")
resp = ApiResponse.model_validate({"userId": 1, "firstName": "Alice", "lastName": "Smith"})
print(resp.user_id)  # 1 (Python attribute name)
print(resp.model_dump(by_alias=True))
# {"userId": 1, "firstName": "Alice", "lastName": "Smith"}
# ── Custom field serializer ──────────────────────────────────────
class Order(BaseModel):
    id: int
    total: Decimal
    @field_serializer("total")
    def serialize_total(self, value: Decimal) -> str:
        return str(value)  # serialize Decimal as string, not float
order = Order(id=1, total=Decimal("99.99"))
print(order.model_dump_json())
# {"id":1,"total":"99.99"}
# ── model_validator for cross-field validation ────────────────────
class DateRange(BaseModel):
    start: datetime
    end: datetime
    @model_validator(mode="after")
    def check_dates(self):
        if self.end <= self.start:
            raise ValueError("end must be after start")
        return self
Pydantic v2's model_validate_json() is the preferred entry point when parsing JSON strings: it avoids the Python-level json.loads() call and runs the entire parse-and-validate pipeline in Rust. For FastAPI, Pydantic models are used automatically for request body parsing and response serialization; declare the model as the function parameter type and FastAPI validates the request body against it and serializes the response through it. See JSON data validation for Pydantic schema patterns beyond basic models.
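Validation failures surface as pydantic.ValidationError, and handling them is part of any real parsing path. The following is a minimal sketch, assuming Pydantic v2 is installed; SignupRequest and parse_signup are hypothetical names introduced for this example.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class SignupRequest(BaseModel):
    # Hypothetical model used only for this illustration
    name: str = Field(min_length=1)
    age: Optional[int] = Field(default=None, ge=0)

def parse_signup(raw: str):
    """Return (model, None) on success or (None, errors) on failure."""
    try:
        return SignupRequest.model_validate_json(raw), None
    except ValidationError as exc:
        # errors() yields one dict per failure, including 'loc', 'msg', and 'type'
        return None, [(err["loc"], err["type"]) for err in exc.errors()]

model, errors = parse_signup('{"name": "Alice", "age": 30}')
print(model)   # name='Alice' age=30

bad, errors = parse_signup('{"name": "", "age": -1}')
print(errors)  # one (location, error type) pair per failing field
```

Both failing fields are reported in a single exception, so a caller can return every problem to the client at once rather than one per request.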
orjson: High-Performance Python JSON
orjson is a near drop-in replacement for the standard json module with two API differences: orjson.dumps() returns bytes, not str, and formatting is controlled through option= flags rather than keyword arguments such as indent or sort_keys. Returning bytes is intentional: JSON is transmitted as UTF-8 encoded bytes, and skipping the bytes-to-str conversion saves ~15% of serialization time. orjson natively handles types that require custom encoders in the standard library: datetime, date, time, UUID, dataclasses, and (with OPT_SERIALIZE_NUMPY) numpy arrays.
import orjson
import numpy as np
from datetime import datetime, timezone
from uuid import UUID
from dataclasses import dataclass
from decimal import Decimal
# ── Basic usage ───────────────────────────────────────────────────
# dumps() returns bytes (not str)
data = {"name": "Alice", "score": 9.5}
b = orjson.dumps(data)
# b'{"name":"Alice","score":9.5}' (bytes)
# Decode to str if needed
s = orjson.dumps(data).decode("utf-8")
# loads() accepts str or bytes
parsed = orjson.loads(b'{"name":"Alice"}')
parsed = orjson.loads('{"name":"Alice"}')  # str also works
# ── Native type support (no custom encoder needed) ────────────────
@dataclass
class Event:
    id: UUID
    name: str
    occurred_at: datetime
event = Event(
    id=UUID("550e8400-e29b-41d4-a716-446655440000"),
    name="PageView",
    occurred_at=datetime(2026, 5, 20, 10, 0, 0, tzinfo=timezone.utc),
)
orjson.dumps(event)
# b'{"id":"550e8400-e29b-41d4-a716-446655440000","name":"PageView","occurred_at":"2026-05-20T10:00:00+00:00"}'
# numpy arrays
arr = np.array([1.0, 2.0, 3.0])
orjson.dumps({"values": arr}, option=orjson.OPT_SERIALIZE_NUMPY)
# b'{"values":[1.0,2.0,3.0]}'
# ── orjson options ────────────────────────────────────────────────
# OPT_INDENT_2: pretty-print with 2-space indent
orjson.dumps(data, option=orjson.OPT_INDENT_2)
# OPT_SORT_KEYS: sort object keys alphabetically
orjson.dumps({"b": 2, "a": 1}, option=orjson.OPT_SORT_KEYS)
# b'{"a":1,"b":2}'
# Combine options with bitwise OR
orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
# default=: fallback for types orjson cannot serialize natively (e.g. Decimal).
# With OPT_PASSTHROUGH_DATETIME, datetime objects are also routed to this
# function instead of being serialized natively.
def custom_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError
orjson.dumps({"price": Decimal("9.99")}, default=custom_default)
# b'{"price":9.99}'
# OPT_NON_STR_KEYS: allow non-string dict keys (int, float, bool, Enum)
orjson.dumps({1: "one", 2: "two"}, option=orjson.OPT_NON_STR_KEYS)
# b'{"1":"one","2":"two"}'
# ── Performance comparison ────────────────────────────────────────
# json.dumps(large_dict)      ~80 ms for 10 MB payload
# orjson.dumps(large_dict)    ~8 ms for 10 MB payload (10x faster)
# json.loads(large_str)       ~40 ms for 10 MB payload
# orjson.loads(large_bytes)   ~13 ms for 10 MB payload (3x faster)
# ── Error handling ────────────────────────────────────────────────
try:
    orjson.loads("{'bad': json}")
except orjson.JSONDecodeError as e:
    # orjson.JSONDecodeError is a subclass of json.JSONDecodeError
    print(e)
orjson's JSONDecodeError is a subclass of the standard json.JSONDecodeError, so existing except json.JSONDecodeError blocks catch orjson errors without modification. The default function in orjson works like JSONEncoder.default(): it is called for types orjson cannot serialize natively. orjson does not support Decimal natively; pass a default= function that converts Decimal to float, or to str for precision-sensitive data. See the JSON parsing performance guide for detailed bechmarks across Python JSON libraries.
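Because orjson is a third-party dependency, code that must also run where it is not installed often wraps it behind a small fallback. The sketch below shows one way to do that; dumps_bytes and loads_any are hypothetical helper names, and the standard-library branch is configured to approximate orjson's compact, non-escaped output.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None

def dumps_bytes(obj) -> bytes:
    """Serialize to UTF-8 bytes, preferring orjson when it is installed."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Compact separators and raw Unicode mirror orjson's default output
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def loads_any(data):
    """Parse str or bytes with whichever library is available."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)

raw = dumps_bytes({"city": "北京", "n": 1})
print(raw.decode("utf-8"))  # {"city":"北京","n":1}
print(loads_any(raw))       # {'city': '北京', 'n': 1}
```

Keeping the return type as bytes in both branches means callers never have to care which library did the work.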
JSON in FastAPI: Automatic Pydantic Serialization
FastAPI uses Pydantic models for automatic request body parsing and response serialization. Declaring a Pydantic model as a function parameter type causes FastAPI to validate the request body against that model and return a 422 Unprocessable Entity response (with a detailed JSON error body) if validation fails. The response is validated and serialized through the Pydantic model given as the return type annotation or as response_model.
from fastapi import FastAPI
from fastapi.responses import JSONResponse, ORJSONResponse
from pydantic import BaseModel, Field
from typing import Optional, Union, Literal
from datetime import datetime, timezone
app = FastAPI()
# Note: db_create_user, db_get_user, and db_list_users are placeholder async
# helpers for your data layer; they are not defined in this snippet.
class CreateUserRequest(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: Optional[int] = Field(None, ge=0, le=150)
class UserResponse(BaseModel):
    id: int
    name: str
    email: str
    created_at: datetime
# ── Route with Pydantic request body + response_model ────────────
@app.post("/users", response_model=UserResponse, status_code=201)
async def create_user(body: CreateUserRequest):
    # body is already validated: name, email, age are typed Python values
    user = await db_create_user(body.name, body.email, body.age)
    return user  # FastAPI validates and serializes this via response_model
# ── Union types in response: the Literal status field tells variants apart ──
class SuccessResponse(BaseModel):
    status: Literal["success"] = "success"
    data: UserResponse
class ErrorResponse(BaseModel):
    status: Literal["error"] = "error"
    message: str
    code: str
@app.get("/users/{user_id}", response_model=Union[SuccessResponse, ErrorResponse])
async def get_user(user_id: int):
    user = await db_get_user(user_id)
    if not user:
        return ErrorResponse(message=f"User {user_id} not found", code="NOT_FOUND")
    return SuccessResponse(data=UserResponse.model_validate(user))
# ── JSONResponse for manual JSON control ─────────────────────────
@app.get("/health")
async def health():
    # JSONResponse accepts a dict and calls json.dumps() internally
    # datetime.utcnow() is deprecated since Python 3.12; use an aware datetime
    return JSONResponse(content={"status": "ok", "timestamp": datetime.now(timezone.utc).isoformat()})
# ── ORJSONResponse: 10x faster serialization via orjson ──────────
@app.get("/users", response_class=ORJSONResponse)
async def list_users():
    users = await db_list_users()
    # ORJSONResponse uses orjson.dumps(), which handles datetime natively
    return [u.__dict__ for u in users]
# ── Custom default response class for the entire app ─────────────
app_orjson = FastAPI(default_response_class=ORJSONResponse)
# ── 422 validation error response shape ─────────────────────────
# FastAPI returns this JSON for invalid request bodies:
# {
#   "detail": [
#     {
#       "type": "string_too_short",
#       "loc": ["body", "name"],
#       "msg": "String should have at least 1 character",
#       "input": "",
#       "ctx": {"min_length": 1}
#     }
#   ]
# }
# ── Django REST Framework: serializer-based JSON ──────────────────
# from rest_framework import serializers
# class UserSerializer(serializers.Serializer):
#     id = serializers.IntegerField(read_only=True)
#     name = serializers.CharField(max_length=100)
#     email = serializers.EmailField()
#
# # In a view:
# serializer = UserSerializer(data=request.data)
# serializer.is_valid(raise_exception=True)  # returns 400 on failure
# user = serializer.validated_data  # typed dict
# return Response(UserSerializer(user).data)  # serialized JSON dict
FastAPI's ORJSONResponse is a drop-in replacement for JSONResponse that uses orjson for serialization. Enable it per-route via response_class=ORJSONResponse or globally via default_response_class=ORJSONResponse in the FastAPI() constructor. For APIs that return large response bodies (user lists, analytics data), switching to ORJSONResponse can reduce response generation time by 60-80%. See the JSON API design guide for response envelope conventions and error shape standards.
Streaming Large JSON Files with ijson
ijson parses JSON incrementally, yielding Python objects as the parser encounters them — memory usage stays constant regardless of file size. For a 1 GB JSON array of records, json.load() requires ~3-4 GB of RAM (the raw file plus the Python object tree), while ijson.items() uses ~10 MB regardless of file size.
import ijson
import json
from pathlib import Path
def process(record):
    """Placeholder for your own per-record handling."""
# ── ijson.items(): stream array elements one at a time ───────────
# Input: large_file.json = [{"id": 1, ...}, {"id": 2, ...}, ...]
with open("large_file.json", "rb") as f:
    # "item" prefix selects each top-level array element
    for record in ijson.items(f, "item"):
        process(record)  # each record is a fully constructed dict
# ── Nested path: "data.records.item" ─────────────────────────────
# Input: {"data": {"records": [{"id": 1}, {"id": 2}]}}
with open("nested.json", "rb") as f:
    for record in ijson.items(f, "data.records.item"):
        process(record)
# ── ijson.parse(): lower-level event stream ──────────────────────
# Yields (prefix, event, value) tuples for each JSON token
with open("data.json", "rb") as f:
    for prefix, event, value in ijson.parse(f):
        if prefix == "item.id" and event == "number":
            print(f"Found ID: {value}")
# ── Comparison for a 1 GB JSON array ─────────────────────────────
# json.load():     ~3-4 GB RAM, ~80 seconds
# ijson.items():   ~10 MB RAM, ~120 seconds (slower but memory-safe)
# simdjson parse:  ~40 seconds (SIMD-accelerated), but the raw file is
#                  read into memory first (see below)
# ── simdjson-python: SIMD-accelerated parsing ─────────────────────
# pip install pysimdjson
import simdjson
parser = simdjson.Parser()
with open("large_file.json", "rb") as f:
    # parse() loads the whole file but is 3-5x faster than json.load()
    doc = parser.parse(f.read())
    for record in doc:
        process(dict(record))
# ── Writing large JSON files incrementally ───────────────────────
def write_large_json_array(records, output_path: Path):
    """Write a large iterable as a JSON array without loading all into memory."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("[\n")
        first = True
        for record in records:
            if not first:
                f.write(",\n")
            f.write(json.dumps(record, ensure_ascii=False))
            first = False
        f.write("\n]")
# ── JSON Lines (JSONL): one JSON object per line ──────────────────
# More streaming-friendly than a single large JSON array
def write_jsonl(records, output_path: Path):
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
def read_jsonl(input_path: Path):
    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
JSON Lines (JSONL / ndjson), meaning one JSON object per line with no wrapping array, is the preferred format for large datasets because it enables line-by-line streaming with standard file tools, supports parallel processing (split by line count), and allows appending without rewriting the file. For write performance, orjson's dumps() + decode() is 3-5× faster than json.dumps() when writing millions of records. See the JSON parsing performance guide for ijson vs simdjson benchmarks.
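The append-without-rewrite property of JSON Lines can be sketched with the standard library alone. The helpers append_jsonl and iter_jsonl and the file name events.jsonl are illustrative choices, and a temporary directory is used so the example leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path

def append_jsonl(path: Path, record: dict) -> None:
    """Append one record as a single line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def iter_jsonl(path: Path):
    """Yield one parsed record per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.jsonl"  # hypothetical file name
    for i in range(3):
        append_jsonl(log_path, {"id": i, "name": f"event-{i}"})
    total = sum(1 for _ in iter_jsonl(log_path))
    last = list(iter_jsonl(log_path))[-1]

print(total)  # 3
print(last)   # {'id': 2, 'name': 'event-2'}
```

Doing the same with a single JSON array would require rewriting the closing bracket on every append, which is the practical reason log-style data tends to use JSONL.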
Key Terms
- json.loads
- The primary function for parsing JSON in Python's standard library. Accepts a str, bytes, or bytearray containing a JSON document and returns the corresponding Python object: JSON objects become dict, arrays become list, strings become str, numbers become int or float, true/false become bool, and null becomes None. Optional parameters include object_hook (callable applied to each decoded object), parse_float (constructor for float values, e.g., Decimal), parse_int (constructor for integer values), and object_pairs_hook (callable applied to each decoded object as a list of pairs). Raises json.JSONDecodeError for malformed input.
- JSONDecodeError
- An exception raised by json.loads() and json.load() when the input is not valid JSON. It is a subclass of ValueError, so existing except ValueError blocks catch it. The exception object exposes five attributes for diagnosing the error position: msg (a human-readable description of the syntax error), doc (the original input document), pos (the character index where parsing failed), lineno (the line number), and colno (the column number). Common triggers include trailing commas, single-quoted strings, comments, undefined values, and unquoted property names, all of which are valid in JavaScript but not in the JSON specification.
- JSONEncoder
- The base class in Python's json module that controls how Python objects are serialized to JSON strings. Subclass it and override the default(self, obj) method to handle non-serializable types; the method receives the object that the encoder cannot serialize natively and must return a JSON-serializable value or call super().default(obj) to raise TypeError. Pass the subclass to json.dumps() via the cls= parameter. The encoder natively handles dict, list, tuple, str, int, float, bool, and None; it does not handle datetime, Decimal, UUID, set, or custom class instances without default() being overridden.
- object_hook
- A callable passed to json.loads() (via the object_hook= parameter) that transforms each decoded JSON object (curly-brace pair) before it is returned. The callable receives the decoded dict and returns whatever value should replace it in the output, allowing typed deserialization without a schema library. object_hook is called bottom-up: nested objects are processed before their parents. For lower-level control over key-value pairs (including ordering and duplicate key handling), use object_pairs_hook= instead, which receives a list of (key, value) tuples. When both are provided, object_pairs_hook takes precedence and object_hook is ignored.
- model_validate
- The Pydantic v2 class method (replacing v1's parse_obj()) that constructs and validates a model instance from a Python dict, another model instance, or any object. It performs type coercion (e.g., converting an ISO 8601 string to a datetime object), field validation (including Field() constraints like min_length, ge, le), and raises pydantic.ValidationError with a detailed error list if validation fails. The companion method model_validate_json() accepts a raw JSON string and is faster because it bypasses Python's json.loads() in favor of pydantic-core's Rust JSON parser. Pydantic v2's validation is 5-10× faster than v1 due to pydantic-core.
- orjson
- A Python JSON library implemented in Rust that provides loads() and dumps() as drop-in replacements for the standard library's equivalents, with three key differences: (1) orjson.dumps() returns bytes instead of str; (2) it natively serializes datetime, date, time, UUID, numpy arrays, and dataclasses without a custom encoder; (3) it is approximately 10× faster than the standard library for serialization and 3× faster for parsing. orjson does not support Decimal natively; use the default= function parameter to handle it. Install via pip install orjson. orjson.JSONDecodeError is a subclass of json.JSONDecodeError for compatibility.
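The position attributes of JSONDecodeError are easiest to understand on a multi-line document. This is a small sketch using only the standard library; describe_error is a hypothetical helper and the malformed input is made up for the example.

```python
import json

def describe_error(text: str):
    """Return the diagnostic attributes of JSONDecodeError, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return {"msg": e.msg, "pos": e.pos, "lineno": e.lineno, "colno": e.colno}
    return None

# The value for "age" is missing on the third line
bad = '{\n  "name": "Alice",\n  "age": }'
info = describe_error(bad)
print(info["lineno"], info["colno"], info["pos"])  # 3 10 30
print(bad[info["pos"]])                            # }
```

pos indexes into the original string, while lineno and colno are 1-based and better suited to error messages shown to people editing the file.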
FAQ
How do I parse JSON in Python?
Use json.loads(text) to parse a JSON string into a Python dict or list. Import the module first: import json. For a file, use json.load(open_file_handle) instead of json.loads(). Always wrap the call in try/except json.JSONDecodeError when parsing untrusted input — JSONDecodeError is a subclass of ValueError and provides the error position via .lineno and .colno. To preserve Decimal precision for financial data, pass parse_float=Decimal: json.loads(text, parse_float=Decimal). For high-performance parsing, use orjson.loads(text) which is 3× faster. For typed, validated parsing, use Pydantic's MyModel.model_validate_json(text).
How do I serialize a Python object to JSON?
Use json.dumps(obj) to serialize a Python object to a JSON string. It handles str, int, float, bool, None, list, tuple, and dict natively. For non-serializable types (datetime, Decimal, UUID, custom classes), either subclass json.JSONEncoder and pass cls=MyEncoder, or use a default= function. Use indent=2 for pretty-printing, sort_keys=True for deterministic output, and ensure_ascii=False to preserve Unicode. For high-performance serialization, use orjson.dumps(obj) which returns bytes and is 10× faster — it also handles datetime and UUID natively without a custom encoder.
How do I handle datetime objects in Python JSON serialization?
The standard json module raises TypeError for datetime objects. There are three main approaches. First, subclass json.JSONEncoder, override default() to call obj.isoformat() for datetime instances, and pass cls=MyEncoder to json.dumps(). Second, use orjson.dumps(obj) which serializes datetime to ISO 8601 strings natively — no encoder needed, and it handles timezone-aware datetimes correctly. Third, use Pydantic's model_dump_json() which handles datetime fields automatically per the model schema. For deserialization, use datetime.fromisoformat(s) (Python 3.7+) or an object_hook function that detects ISO 8601 strings by pattern and converts them back to datetime objects.
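A round trip with only the standard library might look like the sketch below. It simplifies the pattern-detection idea by assuming the datetime always lives under a known key, created_at; encode_dt and decode_dt are illustrative names.

```python
import json
from datetime import datetime, timezone

def encode_dt(obj):
    """default= hook: emit datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_dt(d: dict):
    """object_hook: convert the (assumed) 'created_at' key back to a datetime."""
    if isinstance(d.get("created_at"), str):
        d["created_at"] = datetime.fromisoformat(d["created_at"])
    return d

original = {"id": 1, "created_at": datetime(2026, 5, 20, 10, 0, tzinfo=timezone.utc)}
text = json.dumps(original, default=encode_dt)
print(text)  # {"id": 1, "created_at": "2026-05-20T10:00:00+00:00"}

restored = json.loads(text, object_hook=decode_dt)
print(restored == original)  # True
```

Keying on a field name avoids accidentally converting ordinary strings that happen to look like timestamps, at the cost of needing to know the schema in advance.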
What is the difference between json.loads() and json.load() in Python?
json.loads(s) parses a str, bytes, or bytearray that contains JSON. json.load(fp) reads from a file-like object (any object with a .read() method) and parses its content. Use json.loads() when you already have JSON data in memory, for example from response.text in an HTTP call or a database column. Use json.load() when reading from a file; it is a convenience wrapper that calls fp.read() and parses the result, so it still reads the whole file into memory and does not stream. The write counterparts follow the same pattern: json.dumps(obj) returns a string, and json.dump(obj, fp) writes to a file-like object.
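The equivalence between the string and file variants can be checked directly. In this sketch io.StringIO stands in for a real file so the example stays self-contained.

```python
import io
import json

text = '{"name": "Alice", "tags": ["admin", "user"]}'

# json.loads() parses data that is already in memory
from_string = json.loads(text)

# json.load() takes any object with .read(); io.StringIO plays the file here
from_file = json.load(io.StringIO(text))
print(from_string == from_file)  # True

# The write side mirrors this: dumps() returns a str, dump() writes to a file object
buffer = io.StringIO()
json.dump(from_file, buffer)
print(buffer.getvalue() == json.dumps(from_file))  # True
```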
How do I use Pydantic with JSON in Python?
In Pydantic v2, use MyModel.model_validate(dict_data) to parse a dict into a validated model instance, or MyModel.model_validate_json(json_string) to parse directly from a JSON string (faster, uses Rust parser internally). To serialize to a dict, call instance.model_dump(); to serialize to a JSON string, call instance.model_dump_json(). Use Field(alias="jsonKey") to map JSON keys that differ from Python attribute names, and model_config = ConfigDict(populate_by_name=True) to accept both the alias and the Python name. Pydantic v2 uses pydantic-core (Rust) for validation, making model_validate() 5-10× faster than Pydantic v1's parse_obj().
Why is orjson faster than Python's built-in json module?
orjson is implemented in Rust using the Serde JSON library, which compiles to native machine code and uses SIMD instructions for string scanning and number parsing. The standard library json module is implemented in Python (with a partial C accelerator), carrying overhead from Python object creation, reference counting, and Python-level calls for hooks such as default=. orjson benchmarks at approximately 10× the throughput for serialization and 3× faster for parsing. Additional advantages: orjson serializes datetime, UUID, numpy arrays, and dataclasses natively without Python-level encoder dispatch; it returns bytes instead of str (so no separate UTF-8 encode step is needed before writing to a socket or file); and it uses a more efficient floating-point serialization algorithm (Ryu).
How do I handle Decimal numbers in Python JSON?
Python's json module converts JSON numbers with a fractional part or exponent to float by default (integers become int), and float cannot represent most decimal fractions exactly (e.g., 0.1 + 0.2 != 0.3). To preserve precision during parsing, use json.loads(text, parse_float=Decimal), which routes every non-integer JSON number through the Decimal constructor; integers still go through parse_int. For serialization, subclass JSONEncoder and return str(obj) for Decimal instances, or convert to float if precision loss is acceptable. For financial applications, the safest pattern is: parse with parse_float=Decimal, store as Decimal throughout, and serialize as a JSON string (not a JSON number) to guarantee round-trip precision. orjson does not natively support Decimal, so pass a default= function that returns str(obj) for Decimal values.
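The parse, compute, and serialize pattern can be sketched as follows. The invoice payload and the decimal_default helper are made up for the illustration.

```python
import json
from decimal import Decimal

def decimal_default(obj):
    """default= hook: write Decimal values as JSON strings to keep every digit."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# parse_float only affects non-integer numbers; "qty" stays an int
invoice = json.loads('{"qty": 3, "unit_price": 0.10}', parse_float=Decimal)
print(invoice)  # {'qty': 3, 'unit_price': Decimal('0.10')}

total = invoice["qty"] * invoice["unit_price"]
print(total)    # 0.30

print(json.dumps({"total": total}, default=decimal_default))  # {"total": "0.30"}

# The same arithmetic with floats does not compare equal
print(3 * 0.10 == 0.30)  # False
```

Note that the trailing zero in 0.10 survives the whole round trip, which matters when the number of decimal places carries meaning, as it does for currency amounts.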
How do I parse large JSON files in Python without loading into memory?
Use ijson, an incremental JSON parser that processes files as a stream. The pattern for a top-level JSON array is: for record in ijson.items(open_file, "item"): which yields one dict per array element without loading the entire file. Memory usage stays at ~10 MB regardless of file size. For nested structures, use dot notation in the prefix: ijson.items(f, "data.records.item"). When the file fits in memory and speed matters more, simdjson-python (pysimdjson) provides SIMD-accelerated parsing (~3× faster than ijson), but it reads the whole document into memory rather than streaming it. For write performance on large datasets, use JSON Lines (JSONL) format, one JSON object per line, which supports line-by-line streaming with standard file tools and allows appending without rewriting the file.