Python JSON: json.loads, json.dumps, Pydantic & orjson Performance

Python's built-in json module parses JSON strings with json.loads() and serializes Python objects with json.dumps(). It handles str, int, float, bool, None, list, tuple, and dict natively, but raises TypeError for datetime, Decimal, and custom objects unless you supply a custom encoder or default function. orjson, one of the fastest Python JSON libraries, benchmarks at roughly 10× the serialization throughput of the standard json module: it serializes datetime objects to ISO 8601 strings natively, handles numpy arrays when OPT_SERIALIZE_NUMPY is set, and serializes a 10 MB payload in ~8 ms vs ~80 ms for the standard library. This guide covers json.loads() and json.dumps() with all options, custom JSONEncoder and object_hook for custom types, Pydantic v2 model_validate() and model_dump() for typed JSON, orjson for high-performance serialization, handling JSON in FastAPI and Django REST Framework, and streaming large files with ijson.

json.loads() and json.dumps(): Standard Library JSON

json.loads() and json.dumps() are the primary interface to Python's built-in JSON support. Both functions expose options that control parsing and serialization behavior; knowing these options prevents the most common pitfalls with Unicode, precision, and non-standard JSON formats.

import json
from decimal import Decimal
from collections import OrderedDict

# ── json.loads() options ──────────────────────────────────────────

# Basic parsing — returns dict, list, str, int, float, bool, or None
data = json.loads('{"name": "Alice", "age": 30, "active": true}')
# {"name": "Alice", "age": 30, "active": True}

# parse_float — preserve Decimal precision (default: float)
data = json.loads('{"price": 19.99}', parse_float=Decimal)
# {"price": Decimal("19.99")}  — no floating-point rounding

# parse_int — custom integer conversion (default: int)
data = json.loads('{"id": 12345}', parse_int=str)
# {"id": "12345"}  — useful when large ints exceed JS Number.MAX_SAFE_INTEGER

# object_pairs_hook: receives each decoded object's (key, value) pairs in document order
def as_ordered(pairs):
    return OrderedDict(pairs)

data = json.loads('{"b": 2, "a": 1}', object_pairs_hook=as_ordered)
# OrderedDict([("b", 2), ("a", 1)])  — preserves insertion order from JSON

# Common pitfalls — these are NOT valid JSON:
# json.loads("{'key': 'value'}")     → JSONDecodeError (single quotes)
# json.loads('{"key": "value",}')    → JSONDecodeError (trailing comma)
# json.loads('{"key": undefined}')   → JSONDecodeError (undefined not valid)
# json.loads('NaN')                  → returns float('nan'): invalid JSON, but Python accepts
#                                      NaN/Infinity by default; use parse_constant to reject them

try:
    json.loads("{'bad': 'json'}")
except json.JSONDecodeError as e:
    print(f"Parse error at line {e.lineno}, col {e.colno}: {e.msg}")
    # Parse error at line 1, col 2: Expecting property name enclosed in double quotes

# ── json.dumps() options ──────────────────────────────────────────

obj = {"name": "Alice", "score": 9.5, "tags": ["admin", "user"]}

# indent — pretty-print with indentation
print(json.dumps(obj, indent=2))
# {
#   "name": "Alice",
#   "score": 9.5,
#   "tags": [
#     "admin",
#     "user"
#   ]
# }

# sort_keys — deterministic output (useful for hashing, testing, diffing)
print(json.dumps({"b": 2, "a": 1}, sort_keys=True))
# {"a": 1, "b": 2}

# ensure_ascii=False — preserve Unicode as-is (default True escapes to \uXXXX)
print(json.dumps({"city": "北京"}, ensure_ascii=False))
# {"city": "北京"}
print(json.dumps({"city": "北京"}, ensure_ascii=True))
# {"city": "\u5317\u4eac"}

# separators — compact output (no spaces after : and ,)
print(json.dumps(obj, separators=(",", ":")))
# {"name":"Alice","score":9.5,"tags":["admin","user"]}

# json.dump() — write directly to a file (no intermediate string)
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(obj, f, indent=2, ensure_ascii=False)

# json.load() — read from a file object
with open("output.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

A critical distinction: json.loads() accepts a str, bytes, or bytearray (Python 3.6+), while json.load() accepts a file-like object with a .read() method. Trailing commas and single-quoted strings are the most common causes of JSONDecodeError when parsing JSON from non-standard sources (JavaScript object literals, YAML-derived configs, or hand-edited files). Use the json5 or pyjson5 package if you need to parse relaxed JSON with comments and trailing commas.
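To make the input-type distinction concrete, the sketch below feeds the same document to json.loads() as str, bytes, and bytearray, and to json.load() through an in-memory file object. The payload and the describe_error helper are illustrative names, not part of the json module.

```python
import io
import json

payload = '{"name": "Alice", "tags": ["admin", "user"]}'

# json.loads() takes the document itself: str, bytes, or bytearray
from_str = json.loads(payload)
from_bytes = json.loads(payload.encode("utf-8"))
from_bytearray = json.loads(bytearray(payload, "utf-8"))

# json.load() takes an object with .read(); StringIO stands in for an open file
from_file = json.load(io.StringIO(payload))

def describe_error(text):
    """Return a short description of why text is not valid JSON, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, col {exc.colno}: {exc.msg}"
    return None

# Relaxed syntax that JavaScript tolerates is rejected by the standard parser
trailing_comma = describe_error('{"key": "value",}')
single_quotes = describe_error("{'key': 'value'}")
```

All four calls produce the same dict, and both relaxed inputs yield an error description rather than a parsed value. The exact wording of msg can differ between Python versions, so it is safer to branch on the position attributes than on the message text.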

Custom JSONEncoder for Non-Serializable Types

Subclassing json.JSONEncoder and overriding default() is the standard approach for serializing types that the built-in encoder cannot handle. The default() method is called for each object that is not natively serializable — return a JSON-serializable value or call super().default(obj) to raise TypeError for truly unsupported types.

import json
import uuid
from datetime import datetime, date, timezone
from decimal import Decimal
from enum import Enum
from dataclasses import dataclass, asdict

class Color(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

@dataclass
class Product:
    id: uuid.UUID
    name: str
    price: Decimal
    created_at: datetime
    color: Color

class CustomEncoder(json.JSONEncoder):
    """Handles datetime, date, Decimal, UUID, Enum, and dataclasses."""

    def default(self, obj):
        if isinstance(obj, datetime):        # check before date: datetime subclasses date
            # Naive datetimes are assumed to be UTC here; adjust if yours are local time
            if obj.tzinfo is None:
                obj = obj.replace(tzinfo=timezone.utc)
            return obj.isoformat()
        if isinstance(obj, date):
            return obj.isoformat()           # "2026-05-20"
        if isinstance(obj, Decimal):
            return str(obj)                  # preserve precision as string
        if isinstance(obj, uuid.UUID):
            return str(obj)                  # "550e8400-e29b-41d4-a716-446655440000"
        if isinstance(obj, Enum):
            return obj.value                 # use the enum's value, not its name
        if hasattr(obj, '__dataclass_fields__'):
            return asdict(obj)               # convert dataclass to dict recursively
        # Let the base class raise TypeError for truly unsupported types
        return super().default(obj)

# Usage — pass cls= parameter to json.dumps
product = Product(
    id=uuid.UUID("550e8400-e29b-41d4-a716-446655440000"),
    name="Widget",
    price=Decimal("19.99"),
    created_at=datetime(2026, 5, 20, 10, 0, 0, tzinfo=timezone.utc),
    color=Color.RED,
)

result = json.dumps(product, cls=CustomEncoder, indent=2)
# {
#   "id": "550e8400-e29b-41d4-a716-446655440000",
#   "name": "Widget",
#   "price": "19.99",
#   "created_at": "2026-05-20T10:00:00+00:00",
#   "color": "red"
# }

# Alternative: use a default= function (simpler for one-off cases)
def json_default(obj):
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json.dumps({"price": Decimal("9.99")}, default=json_default)
# '{"price": "9.99"}'

The default() method is only called for objects that are not natively serializable — it is not called for dict, list, str, int, float, bool, or None. To intercept all objects (including dicts), override encode() or iterencode() instead, but this is rarely needed. For recursive structures containing non-serializable types inside dicts and lists, the default() method handles them correctly because the encoder recurses into containers before calling default() on leaf values.
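A small experiment can illustrate which objects actually reach the fallback. The sketch below records every type passed to a default= function while encoding a nested structure; the fallback function and the order data are made up for illustration.

```python
import json
from datetime import date
from decimal import Decimal

calls = []  # type names that reached the fallback

def fallback(obj):
    calls.append(type(obj).__name__)
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, date):
        return obj.isoformat()
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # JSON has no set type; emit a sorted list
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

order = {
    "id": 7,
    "lines": [{"sku": "A1", "price": Decimal("19.99")}],
    "shipped": date(2026, 5, 20),
    "labels": {"gift", "fragile"},
}

encoded = json.dumps(order, default=fallback, sort_keys=True)
```

After the call, calls contains only Decimal, date, and set: the dicts, the list, the int, and the strings were encoded natively and never reached fallback, even though the Decimal sits two levels deep.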

object_hook: Custom JSON Deserialization

object_hook is a callback passed to json.loads() that is called for every JSON object (curly-brace pair) in the document. It receives the decoded dict and returns whatever value should be used in its place — enabling typed deserialization without a schema library. object_pairs_hook is the lower-level version that receives a list of (key, value) pairs, giving you control over key ordering and duplicate key handling.

import json
from datetime import datetime
from decimal import Decimal
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: int
    name: str
    email: str
    created_at: Optional[datetime] = None

# ── object_hook: convert dicts to typed objects ───────────────────

def user_hook(d: dict):
    """Reconstruct a User if the dict has the expected shape."""
    if {"id", "name", "email"}.issubset(d.keys()):
        created_at = None
        if "created_at" in d and d["created_at"]:
            created_at = datetime.fromisoformat(d["created_at"])
        return User(
            id=d["id"],
            name=d["name"],
            email=d["email"],
            created_at=created_at,
        )
    return d  # return the dict unchanged for non-User objects

json_str = '{"id": 1, "name": "Alice", "email": "alice@example.com", "created_at": "2026-05-20T10:00:00"}'
user = json.loads(json_str, object_hook=user_hook)
# User(id=1, name='Alice', email='alice@example.com', created_at=datetime(2026, 5, 20, 10, 0))

# ── Type tagging pattern ──────────────────────────────────────────
# Embed a __type__ key in serialized JSON to enable self-describing deserialization

REGISTRY = {}

def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls

@register
@dataclass
class Point:
    x: float
    y: float

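def tagged_hook(d: dict):
    type_name = d.get("__type__")
    if isinstance(type_name, str) and type_name in REGISTRY:
        cls = REGISTRY[type_name]
        # Pass every key except the tag itself to the constructor
        return cls(**{k: v for k, v in d.items() if k != "__type__"})
    return d  # unknown or missing tag: return the dict untouched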

serialized = '{"__type__": "Point", "x": 1.5, "y": 2.5}'
point = json.loads(serialized, object_hook=tagged_hook)
# Point(x=1.5, y=2.5)

# ── object_pairs_hook for duplicate key detection ─────────────────

def strict_dict(pairs):
    """Raise ValueError if any key appears more than once."""
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        duplicates = {k for k in keys if keys.count(k) > 1}
        raise ValueError(f"Duplicate JSON keys: {duplicates}")
    return dict(pairs)

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=strict_dict)
except ValueError as e:
    print(e)  # Duplicate JSON keys: {'a'}

# Note: Python 3.7+ dicts preserve insertion order by default,
# so object_pairs_hook is no longer needed just for ordering.
# Use it only for custom key handling or duplicate detection.

A key limitation of object_hook: it is called bottom-up, so nested objects are converted before their parent. This means when the hook is called for an outer object, any nested object values have already been converted by an earlier hook call. If you need to inspect the full tree context during conversion, post-process the parsed dict tree instead of using object_hook. For complex typed deserialization with validation, Pydantic v2 (Section 4) is the more robust choice.
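The bottom-up order is easy to observe directly. In this sketch, tracing_hook records the keys of every object it receives, and walk is one possible post-processing helper that visits an already-parsed tree parents-first with path context; both names are illustrative.

```python
import json

visited = []

def tracing_hook(obj: dict) -> dict:
    # Record the keys of each object in the order the hook sees them
    visited.append(sorted(obj))
    return obj

doc = '{"order": {"customer": {"name": "Alice"}, "total": 5}}'
tree = json.loads(doc, object_hook=tracing_hook)
# visited is now [['name'], ['customer', 'total'], ['order']]: innermost first

def walk(node, path=()):
    """Yield (path, dict) for every object in a parsed tree, parents first."""
    if isinstance(node, dict):
        yield path, node
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))

paths = [path for path, _ in walk(tree)]
# paths is [(), ('order',), ('order', 'customer')]: outermost first
```

The hook never learns where in the document an object lives, whereas walk carries the path, which is what makes context-dependent conversion possible after parsing.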

Pydantic v2: Typed JSON with model_validate and model_dump

Pydantic v2 provides schema-based JSON parsing with runtime type validation, coercion, and serialization. model_validate() accepts a dict and validates/coerces it into a typed model; model_validate_json() accepts a raw JSON string and is faster than calling json.loads() first because it uses pydantic-core's Rust JSON parser internally.

from pydantic import BaseModel, Field, field_serializer, model_validator
from pydantic import ConfigDict
from typing import Optional, List
from datetime import datetime
from decimal import Decimal
import json

# ── Basic model definition ────────────────────────────────────────

class Address(BaseModel):
    street: str
    city: str
    country: str = "US"

class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    address: Optional[Address] = None
    created_at: datetime
    tags: List[str] = []

# model_validate() — from dict (after json.loads or from DB)
user = User.model_validate({
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "created_at": "2026-05-20T10:00:00Z",
    "tags": ["admin"],
})
# created_at is coerced from string → datetime automatically

# model_validate_json() — from JSON string (faster: avoids Python json.loads)
user = User.model_validate_json(
    '{"id":1,"name":"Alice","email":"alice@example.com","created_at":"2026-05-20T10:00:00Z"}'
)

# model_dump() — to dict
d = user.model_dump()
# {"id": 1, "name": "Alice", ..., "created_at": datetime(2026, 5, 20, ...)}

# model_dump_json() — to JSON string (uses Rust serializer)
s = user.model_dump_json()
# '{"id":1,"name":"Alice",...,"created_at":"2026-05-20T10:00:00Z"}'

# model_dump with mode="json" — serialize to JSON-compatible types (no datetime objects)
d = user.model_dump(mode="json")
# {"id": 1, ..., "created_at": "2026-05-20T10:00:00Z"}  — str, not datetime

# ── Field aliases — JSON key ≠ Python attribute name ─────────────

class ApiResponse(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    user_id: int = Field(alias="userId")
    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")

resp = ApiResponse.model_validate({"userId": 1, "firstName": "Alice", "lastName": "Smith"})
print(resp.user_id)   # 1  — Python attribute name
print(resp.model_dump(by_alias=True))
# {"userId": 1, "firstName": "Alice", "lastName": "Smith"}

# ── Custom field serializer ──────────────────────────────────────

class Order(BaseModel):
    id: int
    total: Decimal

    @field_serializer("total")
    def serialize_total(self, value: Decimal) -> str:
        return str(value)  # serialize Decimal as string, not float

order = Order(id=1, total=Decimal("99.99"))
print(order.model_dump_json())
# {"id":1,"total":"99.99"}

# ── model_validator for cross-field validation ────────────────────

class DateRange(BaseModel):
    start: datetime
    end: datetime

    @model_validator(mode="after")
    def check_dates(self):
        if self.end <= self.start:
            raise ValueError("end must be after start")
        return self

Pydantic v2's model_validate_json() is the preferred entry point when parsing JSON strings: it avoids the Python-level json.loads() call and runs the entire parse-and-validate pipeline in Rust. For FastAPI, Pydantic models are used automatically for request body parsing and response serialization: declare the model as the function parameter type and FastAPI validates the request body and serializes the response for you. See JSON data validation for Pydantic schema patterns beyond basic models.
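Validation failures are as important as the happy path. The following is a minimal sketch, assuming Pydantic v2 is installed, of turning a ValidationError into a compact list of (field, error type) pairs; Signup and parse_signup are hypothetical names used only for this example.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

class Signup(BaseModel):  # hypothetical model for illustration
    id: int
    email: str
    age: Optional[int] = None

def parse_signup(raw: str):
    """Return (model, []) on success or (None, [(field, error_type), ...])."""
    try:
        return Signup.model_validate_json(raw), []
    except ValidationError as exc:
        problems = [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
        return None, problems

good, _ = parse_signup('{"id": 42, "email": "alice@example.com"}')
bad, problems = parse_signup('{"id": "abc"}')
# problems lists one entry per failing field: id could not be parsed as an
# integer, and the required email field is missing
```

Each entry returned by exc.errors() also carries a human-readable msg and the offending input, which is the same structure FastAPI exposes in its 422 response body.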

orjson: High-Performance Python JSON

orjson is a near drop-in replacement for the standard json module with one key API difference: orjson.dumps() returns bytes, not str. This is intentional: JSON on the wire is UTF-8 encoded bytes, and skipping the bytes-to-str conversion saves ~15% of serialization time. orjson natively handles types that require custom encoders in the standard library: datetime, date, time, UUID, and dataclasses, plus numpy arrays when OPT_SERIALIZE_NUMPY is passed.

import orjson
import numpy as np
from datetime import datetime, timezone
from uuid import UUID
from dataclasses import dataclass
from decimal import Decimal

# ── Basic usage ───────────────────────────────────────────────────

# dumps() returns bytes (not str)
data = {"name": "Alice", "score": 9.5}
b = orjson.dumps(data)
# b'{"name":"Alice","score":9.5}'  — bytes

# Decode to str if needed
s = orjson.dumps(data).decode("utf-8")

# loads() accepts str or bytes
parsed = orjson.loads(b'{"name":"Alice"}')
parsed = orjson.loads('{"name":"Alice"}')  # str also works

# ── Native type support (no custom encoder needed) ────────────────

@dataclass
class Event:
    id: UUID
    name: str
    occurred_at: datetime

event = Event(
    id=UUID("550e8400-e29b-41d4-a716-446655440000"),
    name="PageView",
    occurred_at=datetime(2026, 5, 20, 10, 0, 0, tzinfo=timezone.utc),
)

orjson.dumps(event)
# b'{"id":"550e8400-e29b-41d4-a716-446655440000","name":"PageView","occurred_at":"2026-05-20T10:00:00+00:00"}'

# numpy arrays
arr = np.array([1.0, 2.0, 3.0])
orjson.dumps({"values": arr}, option=orjson.OPT_SERIALIZE_NUMPY)
# b'{"values":[1.0,2.0,3.0]}'

# ── orjson options ────────────────────────────────────────────────

# OPT_INDENT_2 — pretty-print with 2-space indent
orjson.dumps(data, option=orjson.OPT_INDENT_2)

# OPT_SORT_KEYS — sort object keys alphabetically
orjson.dumps({"b": 2, "a": 1}, option=orjson.OPT_SORT_KEYS)
# b'{"a":1,"b":2}'

# Combine options with bitwise OR
orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)

# default=: called for types orjson cannot serialize natively (e.g. Decimal).
# (OPT_PASSTHROUGH_DATETIME additionally routes datetime objects to default.)
def custom_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError

orjson.dumps({"price": Decimal("9.99")}, default=custom_default)
# b'{"price":9.99}'

# OPT_NON_STR_KEYS — allow non-string dict keys (int, float, bool, Enum)
orjson.dumps({1: "one", 2: "two"}, option=orjson.OPT_NON_STR_KEYS)
# b'{"1":"one","2":"two"}'

# ── Performance comparison ────────────────────────────────────────
# json.dumps(large_dict)    ~80 ms for 10 MB payload
# orjson.dumps(large_dict)  ~8 ms  for 10 MB payload  (10x faster)
# json.loads(large_str)     ~40 ms for 10 MB payload
# orjson.loads(large_bytes) ~13 ms for 10 MB payload  (3x faster)

# ── Error handling ────────────────────────────────────────────────
try:
    orjson.loads("{'bad': json}")
except orjson.JSONDecodeError as e:
    # orjson.JSONDecodeError is a subclass of json.JSONDecodeError
    print(e)

orjson's JSONDecodeError is a subclass of the standard json.JSONDecodeError, so existing except json.JSONDecodeError blocks catch orjson errors without modification. The default function in orjson works like JSONEncoder.default(): it is called for types orjson cannot serialize natively. orjson does not support Decimal natively; pass a default function that returns float(obj) for Decimal values, or str(obj) for precision-sensitive data. See the JSON parsing performance guide for detailed benchmarks across Python JSON libraries.
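One way to adopt orjson incrementally is to hide it behind two small helpers that fall back to the standard library when orjson is not installed. This is a sketch under that assumption; dumps_bytes, safe_loads, and decimal_default are illustrative names, not orjson APIs.

```python
import json
from decimal import Decimal

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None

def decimal_default(obj):
    # Called only for types the serializer cannot handle natively
    if isinstance(obj, Decimal):
        return str(obj)  # string keeps full precision
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def dumps_bytes(obj) -> bytes:
    """Serialize to compact UTF-8 bytes with whichever library is available."""
    if orjson is not None:
        return orjson.dumps(obj, default=decimal_default)
    text = json.dumps(
        obj, default=decimal_default, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")

def safe_loads(raw):
    """Parse JSON, returning None for malformed input."""
    loads = orjson.loads if orjson is not None else json.loads
    try:
        return loads(raw)
    except json.JSONDecodeError:  # also catches orjson.JSONDecodeError
        return None

encoded = dumps_bytes({"price": Decimal("9.99"), "qty": 2})
```

Because the except clause names json.JSONDecodeError, the same handler covers both code paths, which is the compatibility property described above.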

JSON in FastAPI: Automatic Pydantic Serialization

FastAPI uses Pydantic models for automatic request body parsing and response serialization. Declaring a Pydantic model as a function parameter type causes FastAPI to parse the request body as JSON, validate it against the model, and return a 422 Unprocessable Entity response (with a detailed JSON error body) if validation fails. Responses are validated and serialized through the model declared in response_model or in the return type annotation.

from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse, ORJSONResponse
from pydantic import BaseModel, Field
from typing import Optional, List
from datetime import datetime

app = FastAPI()

class CreateUserRequest(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: Optional[int] = Field(None, ge=0, le=150)

class UserResponse(BaseModel):
    id: int
    name: str
    email: str
    created_at: datetime

# ── Route with Pydantic request body + response_model ────────────

@app.post("/users", response_model=UserResponse, status_code=201)
async def create_user(body: CreateUserRequest):
    # body is already validated: name, email, age are typed Python values
    # db_create_user is a placeholder for your own persistence function
    user = await db_create_user(body.name, body.email, body.age)
    return user  # FastAPI validates and serializes it through response_model

# ── Union types in response: Literal status fields distinguish the variants ──

from typing import Union, Literal

class SuccessResponse(BaseModel):
    status: Literal["success"] = "success"
    data: UserResponse

class ErrorResponse(BaseModel):
    status: Literal["error"] = "error"
    message: str
    code: str

@app.get("/users/{user_id}", response_model=Union[SuccessResponse, ErrorResponse])
async def get_user(user_id: int):
    user = await db_get_user(user_id)
    if not user:
        return ErrorResponse(message=f"User {user_id} not found", code="NOT_FOUND")
    return SuccessResponse(data=UserResponse.model_validate(user))

# ── JSONResponse for manual JSON control ─────────────────────────

from datetime import timezone

@app.get("/health")
async def health():
    # JSONResponse accepts a dict and calls json.dumps() internally
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    return JSONResponse(content={"status": "ok", "timestamp": datetime.now(timezone.utc).isoformat()})

# ── ORJSONResponse — 10x faster serialization via orjson ─────────

@app.get("/users", response_class=ORJSONResponse)
async def list_users():
    users = await db_list_users()
    # ORJSONResponse uses orjson.dumps() — handles datetime natively
    return [u.__dict__ for u in users]

# ── Custom default response class for the entire app ─────────────

app_orjson = FastAPI(default_response_class=ORJSONResponse)

# ── 422 validation error response shape ─────────────────────────
# FastAPI returns this JSON for invalid request bodies:
# {
#   "detail": [
#     {
#       "type": "string_too_short",
#       "loc": ["body", "name"],
#       "msg": "String should have at least 1 character",
#       "input": "",
#       "ctx": {"min_length": 1}
#     }
#   ]
# }

# ── Django REST Framework — serializer-based JSON ─────────────────
# from rest_framework import serializers
# class UserSerializer(serializers.Serializer):
#     id = serializers.IntegerField(read_only=True)
#     name = serializers.CharField(max_length=100)
#     email = serializers.EmailField()
#
# # In a view:
# serializer = UserSerializer(data=request.data)
# serializer.is_valid(raise_exception=True)   # returns 400 on failure
# user = serializer.validated_data            # typed dict
# return Response(UserSerializer(user).data)  # serialized JSON dict

FastAPI's ORJSONResponse is a drop-in replacement for JSONResponse that uses orjson for serialization — enable it per-route via response_class=ORJSONResponse or globally via default_response_class=ORJSONResponse in the FastAPI() constructor. For APIs that return large response bodies (user lists, analytics data), switching to ORJSONResponse can reduce response generation time by 60-80%. See the JSON API design guide for response envelope conventions and error shape standards.

Streaming Large JSON Files with ijson

ijson parses JSON incrementally, yielding Python objects as the parser encounters them — memory usage stays constant regardless of file size. For a 1 GB JSON array of records, json.load() requires ~3-4 GB of RAM (the raw file plus the Python object tree), while ijson.items() uses ~10 MB regardless of file size.

import ijson
import json
from pathlib import Path

# ── ijson.items() — stream array elements one at a time ──────────

# Input: large_file.json = [{"id": 1, ...}, {"id": 2, ...}, ...]
with open("large_file.json", "rb") as f:
    # "item" prefix selects each top-level array element
    for record in ijson.items(f, "item"):
        process(record)  # each record is a fully constructed dict

# ── Nested path — "data.records.item" ────────────────────────────
# Input: {"data": {"records": [{"id": 1}, {"id": 2}]}}
with open("nested.json", "rb") as f:
    for record in ijson.items(f, "data.records.item"):
        process(record)

# ── ijson.parse() — lower-level event stream ─────────────────────
# Yields (prefix, event, value) tuples for each JSON token
with open("data.json", "rb") as f:
    for prefix, event, value in ijson.parse(f):
        if prefix == "item.id" and event == "number":
            print(f"Found ID: {value}")

# ── Memory comparison for 1 GB JSON array ────────────────────────
# json.load():          ~3-4 GB RAM, ~80 seconds
# ijson.items():        ~10 MB RAM,  ~120 seconds (slower but memory-safe)
# simdjson iterator:    ~10 MB RAM,  ~40 seconds  (SIMD-accelerated)

# ── simdjson-python — SIMD-accelerated parsing ────────────────────
# pip install pysimdjson
import simdjson

parser = simdjson.Parser()
with open("large_file.json", "rb") as f:
    # parse() loads the whole file but is 3-5x faster than json.load()
    doc = parser.parse(f.read())
    for record in doc:
        process(dict(record))

# ── Writing large JSON files in chunks ───────────────────────────

def write_large_json_array(records, output_path: Path):
    """Write a large iterable as a JSON array without loading all into memory."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("[\n")
        first = True
        for record in records:
            if not first:
                f.write(",\n")
            f.write(json.dumps(record, ensure_ascii=False))
            first = False
        f.write("\n]")

# ── JSON Lines (JSONL) — one JSON object per line ─────────────────
# More streaming-friendly than a single large JSON array

def write_jsonl(records, output_path: Path):
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(input_path: Path):
    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

JSON Lines (JSONL / ndjson) — one JSON object per line, no wrapping array — is the preferred format for large datasets because it enables line-by-line streaming with standard file tools, supports parallel processing (split by line count), and allows appending without rewriting the file. For write performance, orjson's dumps() + decode() is 3-5× faster than json.dumps() when writing millions of records. See the JSON parsing performance guide for ijson vs simdjson benchmarks.
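The append-friendly property can be sketched with the standard library alone. The helpers below (append_jsonl and iter_jsonl, both illustrative names) add records one line at a time and stream them back; a temporary directory and a made-up file name keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

def append_jsonl(path: Path, record: dict) -> None:
    # Mode "a" adds one line at the end; existing lines are never rewritten
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def iter_jsonl(path: Path):
    # Only one line is held in memory at a time
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.jsonl"  # hypothetical file name
    for i in range(3):
        append_jsonl(log_path, {"id": i, "city": "北京"})
    append_jsonl(log_path, {"id": 3, "city": "Paris"})  # later append

    ids = [record["id"] for record in iter_jsonl(log_path)]
    line_count = len(log_path.read_text(encoding="utf-8").splitlines())
```

Doing the same with a single JSON array would require rewriting the closing bracket on every append, which is why log-style and export-style datasets tend to favor JSONL.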

Key Terms

json.loads
The primary function for parsing JSON in Python's standard library. Accepts a str, bytes, or bytearray containing a JSON document and returns the corresponding Python object: JSON objects become dict, arrays become list, strings become str, numbers become int or float, true/false become bool, and null becomes None. Optional parameters include object_hook (callable applied to each decoded object), parse_float (constructor for float values, e.g., Decimal), parse_int (constructor for integer values), and object_pairs_hook (callable applied to each decoded object as a list of pairs). Raises json.JSONDecodeError for malformed input.
JSONDecodeError
An exception raised by json.loads() and json.load() when the input is not valid JSON. It is a subclass of ValueError, so existing except ValueError blocks catch it. The exception object exposes five attributes for diagnosing the error: msg (a human-readable description of the syntax error), doc (the original input document), pos (the character index where parsing failed), lineno (the line number), and colno (the column number). Common triggers include trailing commas, single-quoted strings, comments, undefined values, and unquoted property names, all of which are valid in JavaScript but not in the JSON specification.
JSONEncoder
The base class in Python's json module that controls how Python objects are serialized to JSON strings. Subclass it and override the default(self, obj) method to handle non-serializable types; the method receives the object that the encoder cannot serialize natively and must return a JSON-serializable value or call super().default(obj) to raise TypeError. Pass the subclass to json.dumps() via the cls= parameter. The encoder natively handles dict, list, tuple, str, int, float, bool, and None; it does not handle datetime, Decimal, UUID, set, or custom class instances without default() being overridden.
object_hook
A callable passed to json.loads() (via the object_hook= parameter) that transforms each decoded JSON object (curly-brace pair) before it is returned. The callable receives the decoded dict and returns whatever value should replace it in the output — allowing typed deserialization without a schema library. object_hook is called bottom-up: nested objects are processed before their parents. For lower-level control over key-value pairs (including ordering and duplicate key handling), use object_pairs_hook= instead, which receives a list of (key, value) tuples. When both are provided, object_pairs_hook takes precedence and object_hook is ignored.
model_validate
The Pydantic v2 class method (replacing v1's parse_obj()) that constructs and validates a model instance from a Python dict, another model instance, or (with from_attributes=True) the attributes of an arbitrary object. It performs type coercion (e.g., converting an ISO 8601 string to a datetime object), field validation (including Field() constraints like min_length, ge, le), and raises pydantic.ValidationError with a detailed error list if validation fails. The companion method model_validate_json() accepts a raw JSON string and is faster because it bypasses Python's json.loads() in favor of pydantic-core's Rust JSON parser. Pydantic v2's validation is 5-10× faster than v1 due to pydantic-core.
orjson
A Python JSON library implemented in Rust that provides loads() and dumps() as near drop-in replacements for the standard library's equivalents, with three key differences: (1) orjson.dumps() returns bytes instead of str; (2) it natively serializes datetime, date, time, UUID, and dataclasses without a custom encoder, plus numpy arrays when OPT_SERIALIZE_NUMPY is set; (3) it is approximately 10× faster than the standard library for serialization and 3× faster for parsing. orjson does not support Decimal natively; use the default= function parameter to handle it. Install via pip install orjson. orjson.JSONDecodeError is a subclass of json.JSONDecodeError for compatibility.
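As an illustration of the JSONDecodeError attributes defined above, the sketch below builds a compiler-style report that points a caret at the failing character. explain_json_error is a hypothetical helper, not part of the json module.

```python
import json

def explain_json_error(text: str):
    """Return a multi-line report for invalid JSON, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        lines = exc.doc.splitlines()
        # lineno and colno are 1-based; pos is a 0-based index into doc
        bad_line = lines[exc.lineno - 1] if exc.lineno <= len(lines) else ""
        caret = " " * (exc.colno - 1) + "^"
        header = f"{exc.msg} (line {exc.lineno}, column {exc.colno}, char {exc.pos})"
        return f"{header}\n{bad_line}\n{caret}"
    return None

# The literal "tru" on the third line is not a valid JSON value
report = explain_json_error('{\n  "a": 1,\n  "b": tru\n}')
```

Printing report shows the message, the offending source line, and a caret under the t of tru, which is usually enough to locate a mistake in a hand-edited file.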

FAQ

How do I parse JSON in Python?

Use json.loads(text) to parse a JSON string into a Python dict or list. Import the module first: import json. For a file, use json.load(open_file_handle) instead of json.loads(). Always wrap the call in try/except json.JSONDecodeError when parsing untrusted input — JSONDecodeError is a subclass of ValueError and provides the error position via .lineno and .colno. To preserve Decimal precision for financial data, pass parse_float=Decimal: json.loads(text, parse_float=Decimal). For high-performance parsing, use orjson.loads(text) which is 3× faster. For typed, validated parsing, use Pydantic's MyModel.model_validate_json(text).

How do I serialize a Python object to JSON?

Use json.dumps(obj) to serialize a Python object to a JSON string. It handles str, int, float, bool, None, list, tuple, and dict natively. For non-serializable types (datetime, Decimal, UUID, custom classes), either subclass json.JSONEncoder and pass cls=MyEncoder, or use a default= function. Use indent=2 for pretty-printing, sort_keys=True for deterministic output, and ensure_ascii=False to preserve Unicode. For high-performance serialization, use orjson.dumps(obj) which returns bytes and is 10× faster — it also handles datetime and UUID natively without a custom encoder.

How do I handle datetime objects in Python JSON serialization?

The standard json module raises TypeError for datetime objects. There are three main approaches. First, subclass json.JSONEncoder, override default() to call obj.isoformat() for datetime instances, and pass cls=MyEncoder to json.dumps(). Second, use orjson.dumps(obj), which serializes datetime to ISO 8601 strings natively: no encoder is needed, and it handles timezone-aware datetimes correctly. Third, use Pydantic's model_dump_json(), which handles datetime fields automatically per the model schema. For deserialization, use datetime.fromisoformat(s) (Python 3.7+; a trailing "Z" suffix is only accepted from Python 3.11) or an object_hook function that detects ISO 8601 strings by pattern and converts them back to datetime objects.
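A standard-library round trip might look like the following sketch. The regular expression is deliberately narrow (it only matches what datetime.isoformat() emits), and encode_datetime, decode_datetime, and ISO_DATETIME are illustrative names.

```python
import json
import re
from datetime import datetime, timezone

# Matches isoformat() output: optional microseconds and optional UTC offset
ISO_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{6})?([+-]\d{2}:\d{2})?$"
)

def encode_datetime(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_datetime(d: dict) -> dict:
    # Only string values directly inside an object are checked, not list items
    return {
        key: datetime.fromisoformat(value)
        if isinstance(value, str) and ISO_DATETIME.match(value)
        else value
        for key, value in d.items()
    }

event = {"name": "deploy", "at": datetime(2026, 5, 20, 10, 0, tzinfo=timezone.utc)}
text = json.dumps(event, default=encode_datetime)
restored = json.loads(text, object_hook=decode_datetime)
```

Pattern-based detection is a heuristic: any string that happens to look like a timestamp is converted, and a string that matches the pattern but is not a real date raises ValueError. When the schema is known, an explicit field list or a Pydantic model is the more predictable choice.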

What is the difference between json.loads() and json.load() in Python?

json.loads(s) parses a str, bytes, or bytearray that contains JSON. json.load(fp) reads from a file-like object (any object with a .read() method) and parses its content. Use json.loads() when you already have JSON data in memory, for example from response.text in an HTTP call or a database column. Use json.load() when reading from a file; it is a convenience wrapper that calls fp.read() and parses the result, so the whole file is still read into memory (use a streaming parser such as ijson for files too large for that). The write counterparts follow the same pattern: json.dumps(obj) returns a string, and json.dump(obj, fp) writes to a file-like object.

How do I use Pydantic with JSON in Python?

In Pydantic v2, use MyModel.model_validate(dict_data) to parse a dict into a validated model instance, or MyModel.model_validate_json(json_string) to parse directly from a JSON string (faster, uses Rust parser internally). To serialize to a dict, call instance.model_dump(); to serialize to a JSON string, call instance.model_dump_json(). Use Field(alias="jsonKey") to map JSON keys that differ from Python attribute names, and model_config = ConfigDict(populate_by_name=True) to accept both the alias and the Python name. Pydantic v2 uses pydantic-core (Rust) for validation, making model_validate() 5-10× faster than Pydantic v1's parse_obj().

Why is orjson faster than Python's built-in json module?

orjson is implemented in Rust using the Serde JSON library, which compiles to native machine code and uses SIMD instructions for string scanning and number parsing. The standard library json module is implemented in Python (with a partial C accelerator), carrying overhead from Python object creation, reference counting, and the Global Interpreter Lock. orjson benchmarks at approximately 10× the throughput for serialization and 3× faster for parsing. Additional advantages: orjson serializes datetime, UUID, numpy arrays, and dataclasses natively without Python-level encoder dispatch; it returns bytes instead of str (skipping a UTF-8 encode step); and it uses a more efficient floating-point serialization algorithm (Ryu).

How do I handle Decimal numbers in Python JSON?

Python's json module converts JSON numbers with a fractional part or exponent to float by default (integers become int), and binary floats cannot represent most decimal fractions exactly (e.g., 0.1 + 0.2 != 0.3). To preserve precision during parsing, use json.loads(text, parse_float=Decimal), which routes every non-integer JSON number through the Decimal constructor; integers still go through parse_int and remain int. For serialization, subclass JSONEncoder and return str(obj) for Decimal instances, or convert to float if precision loss is acceptable. For financial applications, the safest pattern is: parse with parse_float=Decimal, store as Decimal throughout, and serialize as a JSON string (not a JSON number) to guarantee round-trip precision. orjson does not natively support Decimal; pass a default= function that returns str(obj) for Decimal values.
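The difference is easiest to see side by side. This sketch parses the same made-up document with and without parse_float=Decimal, then serializes the Decimal result as a string; decimal_as_string is an illustrative helper name.

```python
import json
from decimal import Decimal

text = '{"price": 0.1, "tax": 0.2, "qty": 3}'

as_float = json.loads(text)
as_decimal = json.loads(text, parse_float=Decimal)

float_total = as_float["price"] + as_float["tax"]        # binary float arithmetic
decimal_total = as_decimal["price"] + as_decimal["tax"]  # exact decimal arithmetic

def decimal_as_string(obj):
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

out = json.dumps(
    {"total": decimal_total, "qty": as_decimal["qty"]},
    default=decimal_as_string,
)
```

Here float_total is not equal to 0.3 while decimal_total equals Decimal("0.3"), and qty stays a plain int in both cases because parse_float only applies to non-integer numbers.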

How do I parse large JSON files in Python without loading into memory?

Use ijson, an incremental JSON parser that processes files as a stream. The pattern for a top-level JSON array is: for record in ijson.items(open_file, "item"): which yields one dict per array element without loading the entire file. Memory usage stays at ~10 MB regardless of file size. For nested structures, use dot notation in the prefix: ijson.items(f, "data.records.item"). For even faster streaming, pysimdjson (imported as simdjson) provides SIMD-accelerated parsing with an iterator interface (~3× faster than ijson, ~10 MB memory). For write performance on large datasets, use JSON Lines (JSONL) format, one JSON object per line, which supports line-by-line streaming with standard file tools and allows appending without rewriting the file.

Further reading and primary sources