Convert JSON to SQL
Converting JSON to SQL means transforming JSON objects (or arrays of objects) into SQL INSERT statements — each JSON object becomes one row, with keys mapping to column names and values to column data. A flat JSON array like [{"id":1,"name":"Alice","age":30}] maps directly to INSERT INTO users (id, name, age) VALUES (1, 'Alice', 30). The real challenge is nested objects: a nested "address" object inside a user record can be handled in one of three ways — flattened into columns (address_city), stored as a JSON column (JSONB in PostgreSQL), or normalized into a separate table. This guide covers the full conversion workflow with Python and JavaScript code, JSON-to-SQL type mapping, and importing large JSON datasets with PostgreSQL's bulk COPY command.
Visualize your JSON structure before converting to SQL.
Open JSON Formatter
How JSON maps to SQL: data types and rules
Each JSON object in an array becomes one SQL row. The JSON key becomes the column name and the JSON value is cast to the corresponding SQL type. The mapping is mostly straightforward for primitive types, but objects and arrays require a deliberate strategy (covered in section 4). Note that SQL column names must be quoted (with double-quotes in standard SQL, or backticks in MySQL) if they contain spaces or match reserved words like order, group, or select.
| JSON Type | Example | SQL Type (PostgreSQL) | SQL Type (MySQL) | SQL Type (SQLite) |
|---|---|---|---|---|
| string | "Alice" | TEXT or VARCHAR | VARCHAR(255) | TEXT |
| number (integer) | 42 | INTEGER | INT | INTEGER |
| number (float) | 3.14 | NUMERIC or FLOAT8 | DECIMAL | REAL |
| boolean | true | BOOLEAN | TINYINT(1) | INTEGER (0/1) |
| null | null | NULL | NULL | NULL |
| object (nested) | {"city":"Paris"} | JSONB (recommended) | JSON | TEXT |
| array | [1,2,3] | JSONB or TEXT | JSON | TEXT |
A few important edge cases: JSON numbers have no integer/float distinction in the spec — inspect whether the value has a fractional part to decide between INTEGER and NUMERIC. MySQL has no native boolean type: BOOLEAN is an alias for TINYINT(1), so JSON true/false is stored as 1/0 (MySQL accepts the literals TRUE and FALSE as synonyms for 1 and 0). And in SQLite, structured data is stored as TEXT and queried with json_extract().
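To make the mapping table concrete, here is a rough sketch — an illustration, not real schema inference, and the sample data and column names are made up — that guesses a PostgreSQL column type from the values seen in a JSON sample:
import json
# ── Sketch: guess a PostgreSQL column type from sample JSON values ────────────
def guess_pg_type(values):
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 'TEXT'  # all values NULL: no type information, fall back to TEXT
    if all(isinstance(v, bool) for v in non_null):
        return 'BOOLEAN'
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return 'INTEGER'
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return 'NUMERIC'
    if all(isinstance(v, (dict, list)) for v in non_null):
        return 'JSONB'
    return 'TEXT'
rows = json.loads('[{"id": 1, "name": "Alice", "score": 9.5, "active": true, "tags": ["a"]}]')
print({k: guess_pg_type([r.get(k) for r in rows]) for k in rows[0]})
# {'id': 'INTEGER', 'name': 'TEXT', 'score': 'NUMERIC', 'active': 'BOOLEAN', 'tags': 'JSONB'}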
Generate INSERT statements in Python
The simplest Python approach reads a JSON array with json.loads() and builds INSERT strings from the dict keys and values. For parsing JSON in Python generally, the standard library json module covers all common cases. The example below shows a plain string-building function (good for understanding the mechanics) followed by production-safe versions using psycopg2 and sqlite3.
import json
# ── Sample JSON data ──────────────────────────────────────────────────────────
json_str = '''[
{"id": 1, "name": "Alice", "age": 30, "active": true, "score": 9.5, "notes": null},
{"id": 2, "name": "Bob", "age": 25, "active": false, "score": 7.25, "notes": "new user"},
{"id": 3, "name": "Carol", "age": 35, "active": true, "score": 8.0, "notes": null}
]'''
data = json.loads(json_str)
# ── Helper: format a single value for SQL ─────────────────────────────────────
def sql_value(v):
    if v is None:
        return 'NULL'
    if isinstance(v, bool):
        return 'TRUE' if v else 'FALSE'  # bool must come before int (bool is int subclass)
    if isinstance(v, (int, float)):
        return str(v)
    if isinstance(v, (dict, list)):
        return "'" + json.dumps(v).replace("'", "''") + "'"
    # string: escape single quotes by doubling them
    return "'" + str(v).replace("'", "''") + "'"
# ── Build INSERT statements (illustrative — use parameterized in production) ──
def json_to_inserts(table: str, records: list[dict]) -> list[str]:
    if not records:
        return []
    cols = ', '.join(records[0].keys())
    stmts = []
    for row in records:
        vals = ', '.join(sql_value(v) for v in row.values())
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return stmts
for stmt in json_to_inserts('users', data):
    print(stmt)
# INSERT INTO users (id, name, age, active, score, notes) VALUES (1, 'Alice', 30, TRUE, 9.5, NULL);
# INSERT INTO users (id, name, age, active, score, notes) VALUES (2, 'Bob', 25, FALSE, 7.25, 'new user');
# INSERT INTO users (id, name, age, active, score, notes) VALUES (3, 'Carol', 35, TRUE, 8.0, NULL);
# ── PRODUCTION: psycopg2 parameterized bulk INSERT (PostgreSQL) ───────────────
# NEVER use string interpolation with user-supplied data — use execute_values().
import psycopg2
import psycopg2.extras
conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()
cols = list(data[0].keys())
col_clause = ', '.join(cols)
rows = [[row[c] for c in cols] for row in data]
psycopg2.extras.execute_values(
    cur,
    f"INSERT INTO users ({col_clause}) VALUES %s",
    rows,
    page_size=500  # rows per statement; tune between 100–1000
)
conn.commit()
cur.close()
conn.close()
# ── PRODUCTION: sqlite3 parameterized INSERT ──────────────────────────────────
import sqlite3
conn = sqlite3.connect('users.db')
cur = conn.cursor()
cols = list(data[0].keys())
placeholders = ', '.join('?' * len(cols))
col_clause = ', '.join(cols)
rows = [tuple(row[c] for c in cols) for row in data]
cur.executemany(f"INSERT INTO users ({col_clause}) VALUES ({placeholders})", rows)
conn.commit()
conn.close()
The sql_value() helper handles the three trickiest cases: Python None → NULL, Python bool → TRUE/FALSE (note that bool must be checked before int in Python because bool is a subclass of int), and nested dicts/lists → JSON strings. For any real application, prefer the execute_values pattern — it passes values as parameters and lets the database driver handle all escaping.
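If the target column is JSONB, you do not have to pre-serialize nested dicts yourself: psycopg2 can adapt them with psycopg2.extras.Json. A minimal sketch, assuming a users table that has an address JSONB column (table name and connection string are placeholders):
import psycopg2
import psycopg2.extras
conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()
row = {"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}}
# psycopg2.extras.Json wraps the dict so the driver sends it as JSON text,
# which PostgreSQL then stores in the JSONB column.
cur.execute(
    "INSERT INTO users (id, name, address) VALUES (%s, %s, %s)",
    (row["id"], row["name"], psycopg2.extras.Json(row["address"]))
)
conn.commit()
cur.close()
conn.close()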
Generate INSERT statements in JavaScript/Node.js
In JavaScript, JSON.parse() turns the JSON string into an array of plain objects. You then use Object.keys() and Object.values() to extract columns and values per row. The dialect difference that catches people: PostgreSQL uses $1, $2, … placeholders while MySQL and SQLite use ?.
// ── Helper: format a single value for SQL (for generating plain SQL strings) ──
function sqlValue(v) {
  if (v === null || v === undefined) return 'NULL';
  if (typeof v === 'boolean') return v ? 'TRUE' : 'FALSE';
  if (typeof v === 'number') return String(v);
  if (typeof v === 'object') return "'" + JSON.stringify(v).replace(/'/g, "''") + "'";
  // string
  return "'" + String(v).replace(/'/g, "''") + "'";
}
// ── Build INSERT statements ───────────────────────────────────────────────────
function jsonToInserts(table, records) {
  if (!records.length) return [];
  const cols = Object.keys(records[0]);
  return records.map(row => {
    const vals = cols.map(c => sqlValue(row[c])).join(', ');
    return `INSERT INTO ${table} (${cols.join(', ')}) VALUES (${vals});`;
  });
}
const data = [
  { id: 1, name: "Alice", age: 30, active: true, score: 9.5, notes: null },
  { id: 2, name: "Bob", age: 25, active: false, score: 7.25, notes: "new user" },
  { id: 3, name: "Carol", age: 35, active: true, score: 8.0, notes: null },
];
jsonToInserts('users', data).forEach(s => console.log(s));
// INSERT INTO users (id, name, age, active, score, notes) VALUES (1, 'Alice', 30, TRUE, 9.5, NULL);
// ── PRODUCTION: node-postgres (pg) with parameterized queries ─────────────────
import pg from 'pg';
const { Pool } = pg;
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function insertJson(table, records) {
  if (!records.length) return;
  const cols = Object.keys(records[0]);
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const row of records) {
      // PostgreSQL uses $1, $2, ... placeholders
      const placeholders = cols.map((_, i) => `$${i + 1}`).join(', ');
      const vals = cols.map(c => row[c] === undefined ? null : row[c]);
      await client.query(
        `INSERT INTO ${table} (${cols.join(', ')}) VALUES (${placeholders})`,
        vals
      );
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
await insertJson('users', data);
// ── PRODUCTION: better-sqlite3 (SQLite, Node.js) ─────────────────────────────
import Database from 'better-sqlite3';
const db = new Database('users.db');
function insertJsonSqlite(table, records) {
  if (!records.length) return;
  const cols = Object.keys(records[0]);
  // SQLite uses ? placeholders
  const placeholders = cols.map(() => '?').join(', ');
  const stmt = db.prepare(`INSERT INTO ${table} (${cols.join(', ')}) VALUES (${placeholders})`);
  const insertMany = db.transaction(rows => {
    for (const row of rows) stmt.run(cols.map(c => row[c] ?? null));
  });
  insertMany(records);
}
insertJsonSqlite('users', data);
The key difference from Python: JavaScript has no None — check for both null and undefined. The ?? operator (row[c] ?? null) collapses both to null, which the database driver maps to SQL NULL. For MySQL, swap the PostgreSQL $1 placeholders for ? and use the mysql2 package's pool.execute() method.
Three strategies for nested JSON objects
A nested JSON object like {"address": {"city": "Paris", "zip": "75001"}} cannot be stored directly as a scalar SQL value. There are exactly three strategies, each with different trade-offs. Which to pick depends on how you query the data.
Strategy 1: Flatten — expand nested keys into top-level columns
-- Input JSON:
-- {"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}}
-- Flatten: nested keys become address_city, address_zip
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    address_city TEXT,
    address_zip TEXT
);
INSERT INTO users (id, name, address_city, address_zip)
VALUES (1, 'Alice', 'Paris', '75001');
-- Query: no JSON operators needed
SELECT name FROM users WHERE address_city = 'Paris';
# Python: flatten a nested dict before building INSERT
def flatten(obj, prefix=''):
    """Recursively flatten a dict, joining keys with underscore."""
    result = {}
    for k, v in obj.items():
        key = f"{prefix}{k}" if prefix else k
        if isinstance(v, dict):
            result.update(flatten(v, prefix=f"{key}_"))
        else:
            result[key] = v
    return result
row = {"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}}
flat = flatten(row)
# {"id": 1, "name": "Alice", "address_city": "Paris", "address_zip": "75001"}
Strategy 2: JSON column — store nested object as JSONB
-- PostgreSQL: store address as a JSONB column
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    address JSONB
);
INSERT INTO users (id, name, address)
VALUES (1, 'Alice', '{"city": "Paris", "zip": "75001"}');
-- Query with ->> (field as text) and @> (containment)
SELECT name FROM users WHERE address->>'city' = 'Paris';
SELECT name FROM users WHERE address @> '{"zip": "75001"}';
-- Add a GIN index to speed up @> containment queries
CREATE INDEX idx_users_address ON users USING GIN (address);
Strategy 3: Normalize — separate table with foreign key
-- Normalized schema: users + addresses tables
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    user_id INTEGER REFERENCES users(id),
    city TEXT,
    zip TEXT
);
INSERT INTO users (id, name) VALUES (1, 'Alice');
INSERT INTO addresses (user_id, city, zip) VALUES (1, 'Paris', '75001');
-- Query: JOIN required every time
SELECT u.name, a.city
FROM users u
JOIN addresses a ON a.user_id = u.id
WHERE a.city = 'Paris';
Use the decision table below to choose a strategy; a Python sketch of the normalized insert follows the table:
| Strategy | Best for | Query complexity | Schema flexibility | Downside |
|---|---|---|---|---|
| Flatten | 1-level nesting, frequently filtered columns | Simple — no operators | Low — schema is fixed | Column explosion for deep nesting |
| JSONB column | Flexible/varying nested schemas | Medium — uses ->>, @> | High — schema can vary per row | Less readable SQL; no FK constraints |
| Normalize | Frequently joined, relational data | Complex — requires JOIN | Low — schema is fixed | More tables; slower writes |
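If you normalize from application code rather than hand-written SQL, each record becomes two inserts: the parent row first, then the child row that references it. A minimal psycopg2 sketch against the users/addresses schema above (here the user id comes from the JSON itself; if users.id were database-generated, you would add RETURNING id to the first INSERT and read it back):
import psycopg2
conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
record = {"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}}
# Parent row first so the foreign key on addresses.user_id is satisfied
cur.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (record["id"], record["name"]))
cur.execute(
    "INSERT INTO addresses (user_id, city, zip) VALUES (%s, %s, %s)",
    (record["id"], record["address"]["city"], record["address"]["zip"])
)
conn.commit()
cur.close()
conn.close()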
Import large JSON files into PostgreSQL
Individual INSERT statements are slow for large datasets — each statement costs a network round trip, and in autocommit mode every INSERT is its own transaction with its own commit-time WAL flush. For 100K+ rows, skip individual inserts entirely and use one of the two bulk-load approaches below. For JSON-to-CSV conversion as an intermediate step, see convert JSON to CSV.
Approach 1: execute_values() — batch INSERT (up to ~50K rows)
import json
import psycopg2
import psycopg2.extras
with open('users.json') as f:
    data = json.load(f)
conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
cols = list(data[0].keys())
col_clause = ', '.join(cols)
rows = [[row.get(c) for c in cols] for row in data]
# execute_values groups rows into multi-row statements (page_size rows each).
# Much faster than looping execute() — one round-trip per page_size rows.
psycopg2.extras.execute_values(
    cur,
    f"INSERT INTO users ({col_clause}) VALUES %s ON CONFLICT (id) DO NOTHING",
    rows,
    page_size=1000
)
conn.commit()
print(f"Inserted {len(data)} rows")
Approach 2: COPY protocol — fastest (100K+ rows)
import json
import csv
import io
import psycopg2
with open('users.json') as f:
    data = json.load(f)
conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
cols = list(data[0].keys())
# Convert JSON to CSV in memory — no temp file needed
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
for row in data:
    writer.writerow([
        '' if row.get(c) is None
        else json.dumps(row[c]) if isinstance(row.get(c), (dict, list))
        else row.get(c)
        for c in cols
    ])
buf.seek(0)
# COPY is typically 10-100x faster than row-by-row INSERT: all rows stream in one
# command, with no per-statement parsing, planning, or network round trips.
# Caveat: with NULL '', genuine empty strings are also loaded as NULL.
cur.copy_expert(
    f"COPY users ({', '.join(cols)}) FROM STDIN WITH (FORMAT CSV, NULL '')",
    buf
)
conn.commit()
cur.close()
conn.close()
print(f"COPY loaded {len(data)} rows")
# ── Alternative: \copy in psql (no Python needed) ────────────────────────────
# 1. Convert JSON to CSV first (e.g. with jq or Python csv module)
# 2. Then in psql:
# \copy users (id, name, age) FROM '/tmp/users.csv' WITH (FORMAT CSV, HEADER)
#
# For MongoDB → PostgreSQL migration:
# mongoexport --collection=users --type=json --jsonArray --out=users.json
#   (--jsonArray emits a single JSON array, which json.load() can read directly)
# Then use the Python COPY pattern above.
COPY streams every row in a single command, avoiding per-statement parsing, planning, and network round trips; that is where the 10–100× throughput gain over individual INSERTs comes from. It still writes WAL and still enforces constraints. If rows arrive in an order that would violate a foreign-key constraint mid-load, declare the constraint DEFERRABLE and run SET CONSTRAINTS ALL DEFERRED inside the transaction so the checks happen once at commit.
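A minimal sketch of that deferred-check pattern, assuming the foreign key was created as DEFERRABLE and that users.csv and addresses.csv are pre-converted CSV files (both file names are placeholders):
import psycopg2
conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur:
    # Deferrable constraints are now checked at COMMIT instead of per row
    cur.execute("SET CONSTRAINTS ALL DEFERRED")
    # Child rows can load before parent rows within this transaction
    with open('addresses.csv') as f:
        cur.copy_expert("COPY addresses (user_id, city, zip) FROM STDIN WITH (FORMAT CSV, HEADER)", f)
    with open('users.csv') as f:
        cur.copy_expert("COPY users (id, name) FROM STDIN WITH (FORMAT CSV, HEADER)", f)
# leaving the `with conn` block commits; the deferred FK checks run here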
Frequently asked questions
How do I convert a JSON array to SQL INSERT statements?
Each object in the JSON array becomes one INSERT row. The pattern is: (1) parse the JSON with json.loads() (Python) or JSON.parse() (JavaScript), (2) get column names from the first object's keys, (3) generate INSERT INTO table (col1, col2) VALUES (val1, val2) for each object. In Python: cols = ', '.join(obj.keys()); vals = ', '.join(sql_value(v) for v in obj.values()), where sql_value() is a small helper like the one shown above that quotes strings and maps None to NULL. For production code, use parameterized queries (%s with psycopg2, ? with sqlite3) instead of string interpolation to prevent SQL injection. Check out our JSON examples for more sample data to test against.
How do I handle nested JSON objects when converting to SQL?
You have 3 options: (1) Flatten — expand nested keys into flat columns ("address.city" → address_city column). (2) JSON column — store the entire nested object in a JSONB (PostgreSQL) or JSON (MySQL 5.7+) column and query it with operators like ->>. (3) Normalize — create a separate table for the nested object with a foreign key back to the parent. Use flattening for simple 1-level nesting, JSON columns when the nested schema varies, and normalization when you need referential integrity and frequent joins. See the decision table in section 4 above for a quick reference.
What SQL data type should I use for JSON columns?
For PostgreSQL, use JSONB (binary JSON) over JSON (text JSON) — JSONB can be indexed with GIN indexes, supports operators like @> (contains), -> (field access), and ->> (field as text), and deduplicates object keys (the last duplicate wins). For MySQL 5.7+, use the JSON column type, which has similar functionality. For SQLite, there is no native JSON type — store as TEXT and use json_extract(column, '$.field') to query. Avoid storing JSON in plain VARCHAR in any database, since you lose JSON-aware indexing and validation.
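For the SQLite case, a self-contained sketch (using an in-memory database; json_extract() is available in the JSON1-enabled builds that modern SQLite ships by default):
import json
import sqlite3
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
# Nested object stored as serialized JSON text
cur.execute(
    "INSERT INTO users (id, name, address) VALUES (?, ?, ?)",
    (1, 'Alice', json.dumps({"city": "Paris", "zip": "75001"}))
)
cur.execute("SELECT name FROM users WHERE json_extract(address, '$.city') = ?", ('Paris',))
print(cur.fetchall())  # [('Alice',)]
conn.close()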
How do I convert JSON to SQL in Python?
Use json.loads() to parse, then build INSERT statements. For safety, use psycopg2.extras.execute_values() for PostgreSQL (parameterized bulk insert) or sqlite3.executemany() for SQLite. Example: data = json.loads(json_str); cols = list(data[0].keys()); vals = [[row[c] for c in cols] for row in data]; execute_values(cursor, f"INSERT INTO users ({', '.join(cols)}) VALUES %s", vals). This approach handles all SQL types correctly (NULL, booleans, strings) without manual escaping. See also parse JSON in Python for the full json module API.
How do I import JSON data into PostgreSQL efficiently?
For up to a few tens of thousands of rows, use psycopg2.extras.execute_values() with page_size=1000 — this groups rows into multi-row INSERT statements and is much faster than individual inserts. For 100,000+ rows, use PostgreSQL's COPY protocol: convert JSON to CSV in memory and use cursor.copy_expert() — COPY is typically 10–100× faster than row-by-row INSERT because all rows stream in a single command with no per-statement parsing or round trips. For the largest datasets (millions of rows), consider a dedicated loader such as pg_bulkload. You can also use our JSON to CSV converter tool to produce a CSV suitable for \copy in psql.
What is the difference between JSON and JSONB in PostgreSQL?
JSON stores the original text of the JSON exactly (preserving whitespace, duplicate keys, and key order). JSONB parses and stores JSON in a binary format — duplicate keys are removed (last wins), key order is not preserved, and whitespace is stripped. JSONB is almost always preferable: it supports GIN indexing (enabling fast @> containment queries), functions like jsonb_path_query(), and is typically faster for read queries. Use JSON only when you need to preserve the exact original input (e.g., for auditing the raw value). The storage overhead of JSONB is slightly larger, but the query performance gains far outweigh it in practice.
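A quick way to see the difference is to cast the same literal both ways — a small sketch against any PostgreSQL connection (the connection string is a placeholder):
import psycopg2
conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
# Same input text cast to json (kept verbatim) and to jsonb (normalized)
cur.execute("""SELECT '{"a": 1, "a": 2}'::json::text, '{"a": 1, "a": 2}'::jsonb::text""")
print(cur.fetchone())
# ('{"a": 1, "a": 2}', '{"a": 2}')  -- json preserves duplicates, jsonb keeps the last key
conn.close()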
Ready to work with your JSON?
Use Jsonic to format and validate your JSON before converting to SQL — catch syntax errors before they become database errors. You can also convert JSON to CSV as an intermediate step for bulk \copy imports.