Convert JSON to a Pandas DataFrame in Python
Converting JSON to a pandas DataFrame is a one-liner for flat arrays: pd.read_json(json_str) or pd.DataFrame(list_of_dicts). For nested JSON, pd.json_normalize() flattens the structure into columns automatically — a "city" key nested inside "address" becomes the column address.city. This guide covers four methods: pd.read_json() (file or string), pd.DataFrame() (list of dicts), pd.json_normalize() (nested objects), and pd.read_json(lines=True) for NDJSON streams — plus how to handle null values, mixed types, and export back to CSV.
Validate your JSON in Jsonic before loading it into pandas.
pd.read_json(): read from string, file, or URL
pd.read_json() is the most direct method — it accepts a JSON string, a local file path, or a remote URL and returns a DataFrame in one call. For parsing JSON in Python with json.loads() manually before conversion, see the dedicated guide; pd.read_json() does both steps at once.
import pandas as pd
from io import StringIO
# From a JSON string (array of objects)
# pandas 2.1+ deprecates passing a literal JSON string: wrap it in StringIO
json_str = '[{"id":1,"name":"Alice","age":30},{"id":2,"name":"Bob","age":25}]'
df = pd.read_json(StringIO(json_str))
print(df)
#    id   name  age
# 0   1  Alice   30
# 1   2    Bob   25
# From a local file
df = pd.read_json('data.json')
# From a URL
df = pd.read_json('https://api.example.com/users')
# Specify orient explicitly for unambiguous parsing
df = pd.read_json(StringIO(json_str), orient='records')
# Auto-parse ISO 8601 date strings into datetime columns
df = pd.read_json(StringIO(json_str), convert_dates=['created_at', 'updated_at'])
# Enforce column dtypes
df = pd.read_json(StringIO(json_str), dtype={'id': int, 'name': str})

The orient parameter tells pandas what shape the JSON is in. pandas tries to infer it automatically, but specifying it explicitly avoids surprises when the JSON structure is ambiguous:
| orient | JSON structure | Use when |
|---|---|---|
| 'records' | [{col: val}, ...] | Array of objects (default for lists) |
| 'split' | {index: [], columns: [], data: []} | pandas-style export / round-trip |
| 'index' | {row_label: {col: val}} | Dict of dicts keyed by row label |
| 'columns' | {col: {row_label: val}} | Dict keyed by column name |
| 'values' | [[val, val, ...]] | Plain 2-D array (no column names) |
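As a quick check of the table above, orient='split' round-trips a DataFrame losslessly. A minimal sketch with throwaway data:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# 'split' stores index, columns, and data separately, so nothing is lost
json_split = df.to_json(orient='split')
df2 = pd.read_json(StringIO(json_split), orient='split')
print(df2.equals(df))  # True
```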
Type inference is automatic: integer fields become int64, decimal numbers become float64, and string fields become object. Use the dtype parameter to override inferred types when needed.
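For example (a small sketch with made-up data), df.dtypes shows what pandas inferred, and the dtype parameter can force a column to stay a string, e.g. to preserve leading zeros in IDs:

```python
import pandas as pd
from io import StringIO

json_str = '[{"id":1,"price":9.99},{"id":2,"price":5.0}]'
df = pd.read_json(StringIO(json_str))
print(df.dtypes)
# id         int64
# price    float64

# Force 'id' to stay a string instead of the inferred int64
df = pd.read_json(StringIO(json_str), dtype={'id': str})
print(df['id'].dtype)  # object
```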
pd.DataFrame() from a list of dicts (fastest approach)
When you already have a Python list of dicts — typically after calling json.loads() on an API response — passing it directly to pd.DataFrame() is the fastest and most explicit approach. Column names are taken from dictionary keys. Missing keys in any dict automatically become NaN.
import json
import pandas as pd
json_str = '[{"id":1,"name":"Alice","email":"alice@example.com"},{"id":2,"name":"Bob"}]'
# Step 1: parse JSON string to a Python list
data = json.loads(json_str)
# Step 2: convert to DataFrame — missing keys become NaN
df = pd.DataFrame(data)
print(df)
#    id   name              email
# 0   1  Alice  alice@example.com
# 1   2    Bob                NaN
# Specify column order explicitly
df = pd.DataFrame(data, columns=['id', 'name', 'email'])
# Nested dict as a column value — stored as object, NOT expanded
nested = [
{"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}},
{"id": 2, "name": "Bob", "address": {"city": "Lyon", "zip": "69001"}},
]
df = pd.DataFrame(nested)
print(df['address'])
# 0    {'city': 'Paris', 'zip': '75001'}
# 1     {'city': 'Lyon', 'zip': '69001'}
# Name: address, dtype: object
# The 'address' column holds dict objects — use json_normalize() to expand them

The key difference from pd.read_json(): pd.DataFrame() requires a Python object, not a raw JSON string. It gives you more control and is easier to debug because you can inspect data before converting.
pd.json_normalize() for nested JSON
pd.json_normalize() is the right tool when JSON objects contain nested dicts you want as separate columns. It recursively flattens nested structures and joins key names with a separator (default: "."). This is the same as flattening nested JSON in Python but done in one pandas call.
import json
import pandas as pd
from pandas import json_normalize # pandas 1.0+ preferred import
data = [
{
"id": 1,
"name": "Alice",
"address": {"city": "Paris", "zip": "75001"},
"scores": [95, 87, 92],
},
{
"id": 2,
"name": "Bob",
"address": {"city": "Lyon", "zip": "69001"},
"scores": [78, 82, 80],
},
]
# Flat 1-level nesting — dot-separated column names
df = json_normalize(data)
print(df.columns.tolist())
# ['id', 'name', 'scores', 'address.city', 'address.zip']
# Custom separator: underscore instead of dot
df = json_normalize(data, sep='_')
print(df.columns.tolist())
# ['id', 'name', 'scores', 'address_city', 'address_zip']
# Limit flattening depth (useful for deeply nested JSON)
df = json_normalize(data, sep='_', max_level=1)
# Expand a nested array using record_path
orders_data = [
{"user_id": 1, "name": "Alice", "orders": [{"oid": 101, "total": 49.99}, {"oid": 102, "total": 19.99}]},
{"user_id": 2, "name": "Bob", "orders": [{"oid": 103, "total": 89.00}]},
]
df = json_normalize(
orders_data,
record_path='orders', # expand this nested array
meta=['user_id', 'name'], # carry these parent fields into every row
)
print(df)
#    oid  total  user_id   name
# 0  101  49.99        1  Alice
# 1  102  19.99        1  Alice
# 2  103  89.00        2    Bob

Import note: from pandas import json_normalize is the current path (pandas 1.0+). The older path from pandas.io.json import json_normalize was deprecated in pandas 1.0 and removed in pandas 2.0, so update any code that still imports it from there.
NDJSON / JSON Lines: lines=True
NDJSON (Newline-Delimited JSON) stores one JSON object per line. It is the standard format for log files, Kafka exports, BigQuery extracts, and streaming APIs. Pass lines=True to pd.read_json() to read one row per line instead of treating the whole file as a single JSON array. See also reading JSON files in Python for the single-object case.
import pandas as pd
# events.ndjson content:
# {"ts":"2026-01-01T00:00:00Z","event":"login","user_id":1}
# {"ts":"2026-01-01T00:01:30Z","event":"click","user_id":1}
# {"ts":"2026-01-01T00:03:00Z","event":"logout","user_id":1}
# Read all lines into a DataFrame at once
df = pd.read_json('events.ndjson', lines=True)
print(df)
#                      ts   event  user_id
# 0  2026-01-01T00:00:00Z   login        1
# 1  2026-01-01T00:01:30Z   click        1
# 2  2026-01-01T00:03:00Z  logout        1
# Note: a column named 'ts' does not match pandas' default date-detection
# patterns, so it stays a string unless listed in convert_dates (below)
# Auto-parse the timestamp column
df = pd.read_json('events.ndjson', lines=True, convert_dates=['ts'])
# Process a large NDJSON file in chunks (memory-efficient)
results = []
for chunk in pd.read_json('big_events.ndjson', lines=True, chunksize=1000):
# Filter or aggregate each chunk before accumulating
filtered = chunk[chunk['event'] == 'login']
results.append(filtered)
df_logins = pd.concat(results, ignore_index=True)
print(f"Total login events: {len(df_logins)}")

The chunksize parameter (together with lines=True) returns a JsonReader iterator instead of a DataFrame. Process each chunk independently to keep memory usage constant even for files that are gigabytes in size.
Handling nulls, missing keys, and type coercion
Inconsistent JSON — missing keys, null values, mixed types — is the most common source of problems when converting JSON to a DataFrame. Here is the complete toolkit for dealing with it, plus how to convert JSON to CSV at the end.
import json
import pandas as pd
data = [
{"id": 1, "name": "Alice", "age": 30, "email": "alice@example.com"},
{"id": 2, "name": "Bob", "age": None, "email": None},
{"id": 3, "name": "Carol"}, # missing age and email entirely
]
df = pd.DataFrame(data)
print(df)
#    id   name   age              email
# 0   1  Alice  30.0  alice@example.com
# 1   2    Bob   NaN               None
# 2   3  Carol   NaN                NaN
# Check which columns have missing values
print(df.isnull().sum())
# id       0
# name     0
# age      2
# email    2
# dtype: int64
# Fill NaN with a default value
df['email'] = df['email'].fillna('unknown@example.com')
df['age'] = df['age'].fillna(0)
# Now safe to cast age to int (NaN forces float64; after filling it can be int)
df['age'] = df['age'].astype(int)
# Drop rows where id or name is null
df = df.dropna(subset=['id', 'name'])
# Enforce column types via read_json dtype parameter
from io import StringIO  # wrap the literal string (pandas 2.1+ deprecation)
json_str = json.dumps(data)
df = pd.read_json(StringIO(json_str), dtype={'id': int, 'name': str})
# Export back to JSON (array of objects)
json_out = df.to_json(orient='records', indent=2)
print(json_out)
# Export to CSV (no row index in the file)
df.to_csv('output.csv', index=False)

Key rule: a column that contains even one NaN value cannot hold pure int64 — pandas upcasts it to float64 automatically. Always fill or drop NaN before calling .astype(int). pandas 1.0+ also offers the nullable integer dtype pd.Int64Dtype() (string alias "Int64"), which can hold integers alongside pd.NA without converting to float.
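A sketch of the nullable-integer alternative, using illustrative data: astype('Int64') keeps whole numbers and represents the gap as pd.NA instead of upcasting to float.

```python
import pandas as pd

df = pd.DataFrame([{"id": 1, "age": 30}, {"id": 2, "age": None}])
print(df['age'].dtype)  # float64 (the None forced an upcast)

# Nullable integer dtype: whole numbers plus pd.NA, no float upcast
df['age'] = df['age'].astype('Int64')
print(df['age'].dtype)  # Int64
```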
Frequently asked questions
How do I convert a JSON string to a pandas DataFrame?
If the JSON is an array of objects (the most common API format), use pd.read_json(json_str) or first parse it with json.loads() and then call pd.DataFrame(data). Both return the same result. pd.read_json() accepts a file path, a URL, or a JSON string (on pandas 2.1+, wrap literal strings in io.StringIO to avoid a FutureWarning). For a JSON object (dict of dicts), use pd.DataFrame.from_dict(data, orient='index') to create one row per key.
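For the dict-of-dicts case, a minimal sketch with hypothetical user data:

```python
import pandas as pd

# A JSON object keyed by user id, e.g. after json.loads()
data = {
    "u1": {"name": "Alice", "age": 30},
    "u2": {"name": "Bob", "age": 25},
}
# One row per top-level key; the keys become the index
df = pd.DataFrame.from_dict(data, orient='index')
print(df.loc['u1', 'name'])  # Alice
```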
What is the difference between pd.read_json() and pd.json_normalize()?
pd.read_json() reads a flat or moderately nested JSON structure into a DataFrame — it handles files, URLs, and strings but stores nested objects as Python dict objects in a single column. pd.json_normalize() recursively flattens nested objects into separate columns: {"address": {"city": "Paris"}} becomes an address.city column. Use pd.read_json() for flat JSON arrays, and pd.json_normalize() when your JSON has nested objects you want as separate columns.
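A compact illustration of the difference, using toy data:

```python
import pandas as pd
from pandas import json_normalize

data = [{"id": 1, "address": {"city": "Paris"}}]

# pd.DataFrame keeps the nested dict intact in one object column
df = pd.DataFrame(data)
print(type(df['address'][0]))  # <class 'dict'>

# json_normalize expands it into a flat address.city column
flat = json_normalize(data)
print(flat.columns.tolist())  # ['id', 'address.city']
```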
How do I handle nested JSON in pandas?
Use pd.json_normalize(): from pandas import json_normalize; df = json_normalize(data, sep='_'). This flattens one level of nesting by default — {"user": {"id": 1, "name": "Alice"}} becomes user_id and user_name columns. For nested arrays (e.g., a "tags" list inside each object), use the record_path parameter: json_normalize(data, record_path='tags', meta=['user_id']) creates one row per tag with the parent user_id attached.
How do I read a JSON file with pandas?
df = pd.read_json('path/to/data.json'). For NDJSON/JSON Lines files (one JSON object per line): df = pd.read_json('events.ndjson', lines=True). For large files, use chunksize: for chunk in pd.read_json('big.ndjson', lines=True, chunksize=10000): process(chunk). If the file is an object (not an array), specify the orient: pd.read_json('data.json', orient='index').
How do I convert a pandas DataFrame back to JSON?
df.to_json(orient='records') produces an array of objects — the same format as the input for most API round-trips. Add indent=2 for pretty printing: df.to_json(orient='records', indent=2). Use orient='split' to preserve the pandas structure (index, columns, data). To write to a file: df.to_json('output.json', orient='records', indent=2). To also convert to CSV: df.to_csv('output.csv', index=False).
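To see how the two most common export orients differ, a single-row sketch:

```python
import pandas as pd

df = pd.DataFrame([{"id": 1, "name": "Alice"}])

# Array of objects: the usual API interchange format
print(df.to_json(orient='records'))
# [{"id":1,"name":"Alice"}]

# pandas-style structure: index, columns, and data kept separate
print(df.to_json(orient='split'))
```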
How do I handle missing values when converting JSON to DataFrame?
Missing keys in JSON objects automatically become NaN in pandas. Check which columns have missing values with df.isnull().sum(). Fill with a default: df['email'] = df['email'].fillna(''). To drop rows with any NaN: df.dropna(). To drop only rows where specific columns are null: df.dropna(subset=['id', 'name']). Note that NaN is a float value in pandas, so columns with mixed int/NaN become float64; convert after filling: df['age'] = df['age'].fillna(0).astype(int).
Format and validate your JSON
Paste your JSON into the Jsonic formatter to catch syntax errors before loading into pandas.