JSON Streaming: NDJSON, Large File Parsing & ReadableStream
JSON streaming avoids loading an entire document into memory by processing records one at a time. NDJSON (Newline-Delimited JSON, also called JSON Lines) stores one JSON value per line, so a file with millions of records can be parsed with memory proportional to the largest single record rather than to the file size. JSON.parse() requires the full string in memory; streaming parsers (the stream-json package for Node.js, or oboe.js) emit events as they encounter each object, capping memory at roughly the size of one record. A 1 GB NDJSON file of ~1 KB records can be processed with on the order of 1 MB of peak memory. This guide covers the NDJSON format and tools, Node.js readline for NDJSON parsing, Python ijson for streaming JSON, streaming HTTP responses with fetch() and ReadableStream, React Server Component streaming with Suspense, and jq for streaming command-line JSON.
NDJSON Format: One Object Per Line
NDJSON (Newline-Delimited JSON) is the simplest streaming JSON format: each line is a complete, independently valid JSON value separated by a newline character (\n). There is no wrapping array, no comma between records, and no document-level structure — each line stands alone. This makes it trivially streamable: a parser reads one line, calls JSON.parse(), processes the result, and discards it before reading the next line.
# data.ndjson — each line is one complete JSON object
{"id":1,"name":"Alice","email":"alice@example.com","score":92}
{"id":2,"name":"Bob","email":"bob@example.com","score":87}
{"id":3,"name":"Carol","email":"carol@example.com","score":95}
# Rules:
# ✓ One JSON value per line (object, array, string, number — any valid JSON)
# ✓ Lines separated by \n (LF) — \r\n (CRLF) is also accepted by most parsers
# ✗ No trailing comma after each line
# ✗ No wrapping [ ] array brackets
# ✗ No comma between records
# Compare: standard JSON array (requires full buffer before parsing)
[
{"id":1,"name":"Alice","score":92},
{"id":2,"name":"Bob","score":87}
]
# Compare: NDJSON (each line parseable immediately, O(1) memory)
{"id":1,"name":"Alice","score":92}
{"id":2,"name":"Bob","score":87}
# MIME types in use:
# application/x-ndjson (most common in practice; a de facto type, not registered with IANA)
# application/jsonl — JSON Lines variant
# application/x-jsonlines
# File extensions:
# .ndjson — NDJSON
# .jsonl — JSON Lines / JSONL (common in ML datasets)
# Generating NDJSON with jq from a JSON array:
# jq -c '.[]' input.json > output.ndjson
# Converting NDJSON back to a JSON array:
# jq -s '.' input.ndjson > output.json
NDJSON is the de facto format for log files, data pipeline exports, Elasticsearch bulk imports, and streaming API responses. Elasticsearch's Bulk API requires NDJSON: each document is preceded by an action line ({"index":{"_id":"1"}}), so the payload alternates action lines and document lines. Machine learning tooling (Hugging Face datasets, Apache Spark, pandas) uses the .jsonl extension for the identical format. The key constraint: individual JSON values must not contain unescaped newlines, so string values with literal newlines must escape them as \n. JSON.stringify() and json.dumps() do this by default, provided you do not request pretty-printed (indented) output.
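The newline constraint above can be checked directly. The following is a minimal Python sketch, not part of the original tooling: `to_ndjson_line` is a hypothetical helper name, and the sample record is invented for illustration. It shows that the default serializer escapes an embedded newline, while indented output would break the one-value-per-line rule.

```python
import json


def to_ndjson_line(record) -> str:
    """Serialize one record as a single NDJSON line (hypothetical helper)."""
    # Compact separators keep the line short; ensure_ascii=False keeps
    # non-ASCII characters readable instead of \uXXXX escapes.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))


record = {"id": 1, "note": "line one\nline two"}

line = to_ndjson_line(record)
# The embedded newline is escaped as the two characters "\" and "n",
# so the serialized record occupies exactly one physical line.
assert "\n" not in line

# Parsing the line restores the original newline inside the string.
assert json.loads(line) == record

# Pretty-printed output spans several lines and is NOT valid NDJSON.
assert "\n" in json.dumps(record, indent=2)
```

Appending `"\n"` to the returned string and writing it to a file gives one NDJSON record per line.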
Node.js readline: Parsing NDJSON Files
Node.js's built-in readline module is the standard tool for NDJSON parsing — no npm dependencies required. readline.createInterface wraps a readable stream and emits one line event per newline-terminated chunk. Combined with fs.createReadStream, it reads the file incrementally from disk, never buffering more than one line at a time in the JSON parsing step.
// parse-ndjson.mjs — Node.js built-in readline, no dependencies
import readline from 'node:readline'
import fs from 'node:fs'
import { performance } from 'node:perf_hooks'
async function parseNdjson(filePath) {
  const start = performance.now()
  let count = 0
  let lineNumber = 0 // physical line number, including blank and invalid lines
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  })
  for await (const line of rl) {
    lineNumber++
    const trimmed = line.trim()
    if (!trimmed) continue // skip blank lines
    let record
    try {
      record = JSON.parse(trimmed)
    } catch (err) {
      console.error(`Parse error on line ${lineNumber}: ${err.message}`)
      continue
    }
    // Process one record at a time; earlier records can be garbage-collected
    processRecord(record)
    count++
  }
  const elapsed = (performance.now() - start).toFixed(0)
  console.log(`Processed ${count} records in ${elapsed}ms`)
}
function processRecord(record) {
  // Replace with your business logic
  console.log(record.id, record.name)
}
await parseNdjson('./data.ndjson')
// ── Streaming NDJSON with backpressure (Transform stream) ─────────
import { Transform } from 'node:stream'
import { pipeline } from 'node:stream/promises'
import { StringDecoder } from 'node:string_decoder'
class NdjsonParser extends Transform {
  constructor() {
    super({ readableObjectMode: true }) // bytes in, parsed objects out
    // StringDecoder keeps multi-byte UTF-8 characters intact across chunk boundaries
    this._decoder = new StringDecoder('utf8')
    this._buffer = ''
  }
  _transform(chunk, _enc, callback) {
    this._buffer += this._decoder.write(chunk)
    const lines = this._buffer.split('\n')
    this._buffer = lines.pop() // keep incomplete last line
    try {
      for (const line of lines) {
        const trimmed = line.trim()
        if (trimmed) this.push(JSON.parse(trimmed))
      }
    } catch (err) {
      return callback(err) // report the failure through the stream machinery
    }
    callback()
  }
  _flush(callback) {
    const rest = (this._buffer + this._decoder.end()).trim()
    try {
      if (rest) this.push(JSON.parse(rest))
    } catch (err) {
      return callback(err)
    }
    callback()
  }
}
// Use the Transform stream in a pipeline. The async function is the final
// stage, so awaiting inside the loop slows the file read when processing is slow.
await pipeline(
  fs.createReadStream('./data.ndjson'),
  new NdjsonParser(),
  async function (source) {
    for await (const record of source) {
      await processRecord(record)
    }
  }
)
// ── stream-json npm package: parse standard JSON arrays ───────────
// npm install stream-json
// Assumes the CommonJS 1.x releases of stream-json, where default imports
// are the reliable form from an ES module.
import streamJson from 'stream-json'
import StreamArray from 'stream-json/streamers/StreamArray.js'
await pipeline(
  fs.createReadStream('./large-array.json'),
  streamJson.parser(),
  StreamArray.streamArray(),
  async function (source) {
    for await (const { key, value } of source) {
      processRecord(value)
    }
  }
)
The for await...of loop over the readline interface uses async iteration, which applies backpressure naturally: the loop body must complete before the next line is handed over. If processRecord is async (database writes, HTTP calls), await it inside the loop: for await (const line of rl) { await processRecord(JSON.parse(line)) }. This prevents memory accumulation from unprocessed records queuing up. For very high-throughput scenarios, use a worker pool with node:worker_threads to parallelize JSON parsing across CPU cores, passing raw line strings to workers and collecting results via MessageChannel. For standard JSON arrays (not NDJSON), the stream-json npm package's StreamArray class provides the equivalent element-by-element semantics.
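The buffering step used by the Transform stream above (append the chunk, split on \n, keep the trailing partial line for the next chunk) does not depend on Node.js. As a minimal sketch of the same technique in Python, `iter_ndjson` below is a hypothetical helper that accepts any iterable of byte chunks; it uses an incremental UTF-8 decoder so that a multi-byte character split across two chunks is not corrupted.

```python
import codecs
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield parsed records from arbitrarily split byte chunks (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        # The incremental decoder holds back incomplete multi-byte sequences.
        buffer += decoder.decode(chunk)
        # Everything before the last "\n" is complete; the remainder stays buffered.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # Flush: the final record may not end with a newline.
    buffer += decoder.decode(b"", final=True)
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    data = '{"name":"Zoë"}\n{"id":2}\n{"id":3}'.encode("utf-8")
    # Split into 4-byte chunks, which cuts both records and the "ë" character.
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    for record in iter_ndjson(chunks):
        print(record)
```

Memory use is bounded by the longest line plus one chunk, because only the incomplete tail is carried between iterations.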
Python ijson: Streaming Large JSON Files
Python's standard json.load() reads the entire file into memory — unsuitable for files larger than available RAM. The ijson library provides incremental JSON parsing using a SAX-like event model, exposing a high-level items() iterator that yields complete Python objects for a specified path in the document, without loading siblings or ancestors.
# pip install ijson
import ijson
import json

def process_record(record):
    # Replace with your business logic
    print(record)

# ── Parse a top-level JSON array incrementally ────────────────────
# File: large.json = [{"id":1,...}, {"id":2,...}, ...]
# "item" prefix targets each element of the top-level array
def process_large_json_array(filepath: str):
    with open(filepath, "rb") as f:  # open in binary mode for ijson
        for item in ijson.items(f, "item"):
            # item is one fully parsed Python dict; memory stays bounded by one element
            process_record(item)

process_large_json_array("large.json")

# ── Nested array: data.records[*] ────────────────────────────────
# File: {"meta": {...}, "data": {"records": [...]}}
# Prefix "data.records.item" targets each element of records array
def process_nested_array(filepath: str):
    with open(filepath, "rb") as f:
        for record in ijson.items(f, "data.records.item"):
            process_record(record)

# ── Parse NDJSON in Python (no ijson needed) ─────────────────────
def parse_ndjson(filepath: str):
    with open(filepath, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, start=1):
            stripped = line.strip()
            if not stripped:
                continue  # skip blank lines
            try:
                record = json.loads(stripped)
            except json.JSONDecodeError as e:
                print(f"Parse error on line {line_num}: {e}")
                continue
            process_record(record)

parse_ndjson("data.ndjson")

# ── ijson with multiple backends ─────────────────────────────────
# ijson selects the fastest available backend automatically:
# 1. yajl2_c — C extension wrapping yajl2 (fastest)
# 2. yajl2_cffi — CFFI wrapper for yajl2
# 3. python — pure Python fallback
# Force a specific backend (raises ImportError if it is not installed):
import ijson.backends.yajl2_c as ijson_c
# or check what's available:
print(ijson.backend)  # "yajl2_c" if C extension installed

# ── ijson event-level API for fine-grained control ───────────────
def process_with_events(filepath: str):
    with open(filepath, "rb") as f:
        parser = ijson.parse(f)
        for prefix, event, value in parser:
            if (prefix, event) == ("item.name", "string"):
                print(f"Found name: {value}")
            elif (prefix, event) == ("item.score", "number"):
                print(f"Found score: {value}")

# ── Memory comparison ─────────────────────────────────────────────
import tracemalloc
tracemalloc.start()
# BAD: loads entire file
with open("large.json") as f:
    data = json.load(f)  # peak RAM is at least the file size, usually several times more
current, peak = tracemalloc.get_traced_memory()
print(f"json.load peak: {peak / 1024 / 1024:.1f} MB")
del data  # release the parsed document, otherwise it inflates the next measurement
tracemalloc.reset_peak()  # Python 3.9+
# GOOD: streaming with ijson
with open("large.json", "rb") as f:
    for item in ijson.items(f, "item"):
        pass  # process one at a time
current, peak = tracemalloc.get_traced_memory()
print(f"ijson peak: {peak / 1024:.1f} KB")  # ~constant regardless of file size
Install the C extension for maximum performance: pip install ijson on most platforms automatically installs the yajl2 C backend, which parses at roughly 200-400 MB/s compared to roughly 30-50 MB/s for the pure Python fallback. Verify the backend with import ijson; print(ijson.backend). For NDJSON files in Python, the standard library is sufficient: iterate lines with a plain for line in f: loop and call json.loads(line) on each. The jsonlines pip package wraps this pattern with error handling and NDJSON write support. See also the guide on Python JSON parsing for standard json module usage.
HTTP Streaming Responses with ReadableStream
The Fetch API's response.body is a ReadableStream — calling response.json() discards this advantage by buffering the entire body. Reading response.body directly lets you process NDJSON chunks as they arrive over the network, cutting time-to-first-record from total response time to the latency of the first network packet.
// ── Browser / Deno / Node.js 18+: fetch() with ReadableStream ────
async function fetchNdjsonStream(url) {
  const response = await fetch(url, {
    headers: { Accept: 'application/x-ndjson' },
  })
  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) {
      // Flush any remaining buffered content
      if (buffer.trim()) {
        processRecord(JSON.parse(buffer.trim()))
      }
      break
    }
    buffer += value
    const lines = buffer.split('\n')
    buffer = lines.pop() // keep incomplete last line in buffer
    for (const line of lines) {
      if (line.trim()) {
        processRecord(JSON.parse(line.trim()))
      }
    }
  }
}
// ── Express: streaming NDJSON response ───────────────────────────
// Server side: write one JSON object per line
import express from 'express'
const app = express()
app.get('/api/records/stream', async (req, res) => {
  res.setHeader('Content-Type', 'application/x-ndjson')
  // Node.js applies chunked transfer encoding on its own when no Content-Length is set
  res.setHeader('Cache-Control', 'no-cache')
  // Keep compression middleware off this route (or flush after each write);
  // otherwise records can sit in the compressor's buffer
  // `db` is assumed to be a connected MongoDB database handle
  const cursor = db.collection('records').find({}).batchSize(100)
  try {
    for await (const doc of cursor) {
      const line = JSON.stringify(doc) + '\n'
      const ok = res.write(line)
      if (!ok) {
        // Backpressure: wait for drain before writing more
        await new Promise(resolve => res.once('drain', resolve))
      }
    }
    res.end()
  } catch (err) {
    res.destroy(err)
  }
})
// ── Next.js / Edge Runtime: Response with ReadableStream ──────────
// app/api/stream/route.ts
export async function GET() {
  const encoder = new TextEncoder()
  const stream = new ReadableStream({
    async start(controller) {
      const records = await fetchRecordsFromDb()
      for (const record of records) {
        const line = JSON.stringify(record) + '\n'
        controller.enqueue(encoder.encode(line))
        // Yield control to allow chunks to be sent
        await new Promise(resolve => setTimeout(resolve, 0))
      }
      controller.close()
    },
  })
  // No Transfer-Encoding header here: the runtime frames the streamed body itself
  return new Response(stream, {
    headers: {
      'Content-Type': 'application/x-ndjson',
    },
  })
}
Backpressure is critical on the server side: res.write() returns false when the internal buffer is full (the client is consuming more slowly than the server is producing). Check the return value and await the drain event before writing more data; omitting this causes unbounded memory growth as records queue in the Node.js write buffer. On the client side, a reader that calls reader.read() only after the previous chunk has been processed applies backpressure naturally. For JSON in Edge Functions, the Web Streams API (ReadableStream, TransformStream) is the streaming primitive to rely on, because Node.js streams are generally not available in edge runtimes. See the Express.js JSON API guide for complete Express streaming patterns.
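The producer/consumer contract described above can be illustrated outside Node.js. The sketch below is an analogy in Python, not the Express mechanism itself: a bounded `asyncio.Queue` stands in for the write buffer, and `produce`, `consume`, and `run_demo` are hypothetical names. Awaiting `put()` on a full queue plays the role of waiting for the drain event.

```python
import asyncio


async def produce(queue: asyncio.Queue, total: int) -> None:
    for i in range(total):
        # put() suspends while the queue is full: the producer is paused
        # until the consumer has made room, so memory stays bounded.
        await queue.put({"id": i})
    await queue.put(None)  # sentinel: no more records


async def consume(queue: asyncio.Queue, seen: list, depths: list) -> None:
    while True:
        depths.append(queue.qsize())  # record how many items are buffered
        record = await queue.get()
        if record is None:
            break
        await asyncio.sleep(0)  # stand-in for slow work (network write, DB insert)
        seen.append(record["id"])


async def run_demo(total: int = 1000, max_buffered: int = 8):
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    seen: list = []
    depths: list = []
    await asyncio.gather(produce(queue, total), consume(queue, seen, depths))
    return seen, max(depths)


if __name__ == "__main__":
    seen, peak = asyncio.run(run_demo())
    print(len(seen), "records processed; peak buffered:", peak)
```

However many records the producer generates, the number held in memory never exceeds `max_buffered`, which is the property the drain check provides for res.write().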
React Server Components: Suspense and JSON Streaming
React Server Components (RSC) with Suspense enable HTML streaming — the server sends the initial shell immediately and streams deferred UI sections as their data resolves. This is a different kind of streaming than NDJSON: instead of streaming raw JSON, the server streams rendered HTML segments that include their data. The net effect for users is the same: visible content appears progressively rather than after a full page load.
// app/dashboard/page.tsx — Next.js App Router streaming with Suspense
import { Suspense } from 'react'
// This component fetches data — it's async, runs on the server
async function UserStats() {
  // Identical fetch() calls are deduplicated within a render;
  // default caching behavior depends on the Next.js version
  const res = await fetch('https://api.example.com/stats', {
    next: { revalidate: 60 }, // ISR: revalidate every 60 seconds
  })
  const data = await res.json()
  return (
    <div>
      <p>Total users: {data.total}</p>
      <p>Active today: {data.activeToday}</p>
    </div>
  )
}
async function RecentOrders() {
  // Slow query — streamed independently from UserStats
  const res = await fetch('https://api.example.com/orders/recent')
  const orders = await res.json()
  return (
    <ul>
      {orders.map((order: { id: number; product: string }) => (
        <li key={order.id}>{order.product}</li>
      ))}
    </ul>
  )
}
export default function DashboardPage() {
  return (
    <main>
      <h1>Dashboard</h1>
      {/* UserStats streams independently — shell renders immediately */}
      <Suspense fallback={<p>Loading stats...</p>}>
        <UserStats />
      </Suspense>
      {/* RecentOrders streams independently — does not block UserStats */}
      <Suspense fallback={<p>Loading orders...</p>}>
        <RecentOrders />
      </Suspense>
    </main>
  )
}
// ── Streaming JSON from an RSC to a Client Component ─────────────
// For actual JSON data streaming (not HTML), use a Route Handler:
// app/api/stream-data/route.ts
export const dynamic = 'force-dynamic'
export const runtime = 'nodejs' // or 'edge'
export async function GET() {
  const encoder = new TextEncoder()
  let cancelled = false
  const stream = new ReadableStream({
    async start(controller) {
      const cursor = db.collection('events').find({})
      for await (const doc of cursor) {
        if (cancelled) break
        controller.enqueue(encoder.encode(JSON.stringify(doc) + '\n'))
      }
      controller.close()
    },
    cancel() {
      cancelled = true // client disconnected
    },
  })
  return new Response(stream, {
    headers: { 'Content-Type': 'application/x-ndjson' },
  })
}
// ── Client Component: consume the NDJSON stream ───────────────────
'use client'
import { useEffect, useState } from 'react'
export function LiveFeed() {
  const [records, setRecords] = useState<unknown[]>([])
  useEffect(() => {
    const controller = new AbortController()
    async function consumeStream() {
      const res = await fetch('/api/stream-data', { signal: controller.signal })
      const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader()
      let buffer = ''
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        buffer += value
        const lines = buffer.split('\n')
        buffer = lines.pop() ?? ''
        // Parse all complete lines from this chunk and append them in one state update
        const parsed = lines.filter(l => l.trim()).map(l => JSON.parse(l))
        if (parsed.length) setRecords(prev => [...prev, ...parsed])
      }
      // Flush a final record that was not newline-terminated
      if (buffer.trim()) setRecords(prev => [...prev, JSON.parse(buffer)])
    }
    consumeStream().catch(err => {
      if (err.name !== 'AbortError') console.error(err)
    })
    return () => controller.abort() // close the connection on unmount
  }, [])
  return <ul>{records.map((r, i) => <li key={i}>{JSON.stringify(r)}</li>)}</ul>
}
React's Suspense streaming works at the rendering layer: each <Suspense> boundary becomes an independently streamed chunk. The browser renders the page shell immediately and progressively replaces loading fallbacks with actual content as server data resolves. This is distinct from NDJSON streaming: RSC streaming delivers rendered output in a single streamed HTTP response (it does not require HTTP/2), while NDJSON delivers raw JSON records for your own code to parse. For live data feeds (real-time dashboards, log tails), use a Client Component with useEffect consuming an NDJSON Route Handler, because Server Components are rendered once per request and cannot update in response to new data without a re-render. For JSON performance optimization in RSC, prefer fetch() with next.revalidate for cacheable data.
jq for Streaming Command-Line JSON Processing
jq is a command-line JSON processor that reads JSON input, applies a filter expression, and writes JSON output. jq treats its input as a sequence of JSON values, so NDJSON is processed one record at a time without loading the whole file; a single large document (such as one big array) is parsed in full unless you use --stream. For ad hoc extraction, transformation, and filtering, jq is often the quickest route because no program has to be written.
# ── jq basics: filter fields from NDJSON ────────────────────────
# NDJSON input: one object per line; jq processes each independently
jq -c '{id, name}' data.ndjson
# with -c, outputs one compact JSON object per line (same line count as input);
# without -c, jq pretty-prints each object across several lines
# Extract a single field as raw string (not JSON-quoted)
jq -r '.email' data.ndjson
# Filter records matching a condition
jq 'select(.score > 90)' data.ndjson
# ── -c flag: compact output (one JSON object per line) ───────────
# Always use -c for large outputs to avoid per-record pretty-printing
jq -c '.[] | {id, name, email}' large-array.json
# ── --stream: true streaming for large standard JSON files ────────
# --stream emits low-level path/value events; fromstream reconstructs objects
# Process top-level array elements one at a time without loading full file.
# truncate_stream takes the depth as its input, hence 1|truncate_stream(...):
jq --stream -n 'fromstream(1|truncate_stream(inputs))' large-array.json
# --stream with a filter (extract only high-score records):
jq --stream -cn '
  fromstream(1|truncate_stream(inputs))
  | select(.score > 90)
' large-array.json
# ── Convert standard JSON array to NDJSON ───────────────────────
jq -c '.[]' input.json > output.ndjson
# ── Convert NDJSON to standard JSON array ───────────────────────
jq -s '.' input.ndjson > output.json
# ── Aggregate/count ──────────────────────────────────────────────
# On a standard JSON array, jq loads the whole document before reducing:
jq '[.[] | select(.active == true)] | length' data.json
# On NDJSON, -n with reduce/inputs folds one record at a time in constant memory:
jq -n 'reduce inputs as $r (0; . + $r.score)' data.ndjson
# Plain shell tools also work for simple cases:
# awk -F'"score":' '{sum += $2; count++} END {print sum/count}' data.ndjson
# ── Practical streaming patterns ─────────────────────────────────
# Extract emails from 10M record NDJSON (streaming, low memory):
jq -r '.email' users.ndjson > emails.txt
# Slurp two NDJSON files into one array and filter it (-s holds everything in memory, so use only for small files):
jq -s 'map(select(.type == "user"))' a.ndjson b.ndjson
# Stream from an HTTP API and process on the fly:
curl -s 'https://api.example.com/stream' | jq -c 'select(.level == "error")'
# Format only first 100 records from a large NDJSON file:
head -n 100 large.ndjson | jq '.'
# Count records in an NDJSON file (no jq needed — wc is faster):
wc -l data.ndjson
# ── jq streaming performance tips ────────────────────────────────
# 1. Use -c (compact) to avoid pretty-printing overhead
# 2. Use -r for raw string output (no JSON quoting)
# 3. Prefer NDJSON input over --stream for simple per-record operations
# 4. For sums and counts over NDJSON, use jq -n with reduce/inputs; slurping (-s) and group_by buffer everything
# 5. Pipe curl directly to jq — avoids writing temp files
jq's --stream mode emits two kinds of low-level events: [path, leaf] for each scalar value (or empty array/object), and a one-element [path] closing event that marks the end of an array or object. The fromstream and truncate_stream builtins reconstruct complete values from these events. For simple per-record operations on NDJSON (extract a field, filter by condition, transform), use plain jq without --stream, since jq already handles NDJSON one value at a time. Reserve --stream for standard JSON documents that are too large to fit in memory. For bulk data transformations, consider mlr (Miller) as an alternative: it reads NDJSON/JSONL natively and offers SQL-like grouping, joins, and aggregation verbs.
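To make the two event shapes concrete, here is a small Python sketch that produces the same kind of events for an already-parsed value. It is an illustration only: `to_stream` is a hypothetical function modeled on the documented behavior of jq's tostream, and unlike jq --stream it works on a value that is already fully in memory.

```python
import json
from typing import Any, Iterator, List


def to_stream(value: Any, path: tuple = ()) -> Iterator[List[Any]]:
    """Yield jq-style stream events for an already-parsed JSON value."""
    if isinstance(value, dict) and value:
        children = list(value.items())
    elif isinstance(value, list) and value:
        children = list(enumerate(value))
    else:
        # Scalar or empty container: a [path, leaf] event.
        yield [list(path), value]
        return
    for key, child in children:
        yield from to_stream(child, path + (key,))
    # Closing event: a one-element [path] carrying the path of the LAST child.
    yield [list(path + (children[-1][0],))]


if __name__ == "__main__":
    for event in to_stream({"a": [1, 2]}):
        print(json.dumps(event))
    # [["a", 0], 1]
    # [["a", 1], 2]
    # [["a", 1]]
    # [["a"]]
```

Dropping the first path element from every event whose path is longer than one (which is what 1|truncate_stream does) leaves the events for each top-level element on its own, and that is what lets fromstream emit array elements one at a time.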
Generating NDJSON: Writing Streaming JSON Producers
Producing NDJSON is simpler than parsing it: serialize each record to a JSON string, append \n, and write it. The challenge is doing this efficiently at scale — avoiding string concatenation in a loop, flushing output incrementally, and handling errors mid-stream without corrupting the output.
// ── Node.js: write NDJSON to a file ──────────────────────────────
import fs from 'node:fs'
async function writeNdjson(records: Iterable<unknown>, outputPath: string) {
  const writeStream = fs.createWriteStream(outputPath, { encoding: 'utf8' })
  for (const record of records) {
    const line = JSON.stringify(record) + '\n'
    const ok = writeStream.write(line)
    if (!ok) {
      // Wait for drain — respects backpressure
      await new Promise(resolve => writeStream.once('drain', resolve))
    }
  }
  await new Promise<void>((resolve, reject) => {
    // Register the error handler before ending so a late write failure is not missed
    writeStream.once('error', reject)
    writeStream.end(resolve)
  })
}
// ── Node.js: stream from async generator to NDJSON file ──────────
async function* fetchPagedRecords(db: unknown) {
  let page = 0
  while (true) {
    // OFFSET paging needs a stable order; ordering by an `id` column is an assumption here
    // @ts-ignore
    const rows = await db.query('SELECT * FROM events ORDER BY id LIMIT 1000 OFFSET ?', [page * 1000])
    if (rows.length === 0) break
    yield* rows
    page++
  }
}
async function exportNdjson(db: unknown, outputPath: string) {
  const out = fs.createWriteStream(outputPath)
  for await (const record of fetchPagedRecords(db)) {
    // Respect backpressure: pause fetching while the file stream catches up
    if (!out.write(JSON.stringify(record) + '\n')) {
      await new Promise(resolve => out.once('drain', resolve))
    }
  }
  await new Promise<void>(resolve => out.end(resolve))
  console.log('Export complete:', outputPath)
}
# ── Python: write NDJSON line by line ─────────────────────────────
import json

def write_ndjson(records, filepath):
    with open(filepath, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps never adds a trailing newline — append manually
            line = json.dumps(record, ensure_ascii=False) + "\n"
            f.write(line)

# Python with jsonlines library (pip install jsonlines)
import jsonlines

def write_ndjson_jsonlines(records, filepath):
    with jsonlines.open(filepath, "w") as writer:
        writer.write_all(records)  # handles \n and JSON encoding
// ── Express: generate and stream NDJSON from a database ───────────
import express from 'express'
const app = express()
app.get('/export', async (req, res) => {
  res.setHeader('Content-Type', 'application/x-ndjson')
  res.setHeader('Content-Disposition', 'attachment; filename="export.ndjson"')
  const pageSize = 500
  let offset = 0
  while (true) {
    const rows = await db.query(
      'SELECT id, name, email FROM users ORDER BY id LIMIT ? OFFSET ?',
      [pageSize, offset]
    )
    if (rows.length === 0) break
    for (const row of rows) {
      // Pause when the socket buffer is full so a slow client cannot exhaust memory
      if (!res.write(JSON.stringify(row) + '\n')) {
        await new Promise(resolve => res.once('drain', resolve))
      }
    }
    offset += pageSize
  }
  res.end()
})
// ── Validate NDJSON before sending ────────────────────────────────
function validateNdjsonLine(line: string): boolean {
  if (!line.trim()) return true // blank lines are OK
  try {
    JSON.parse(line)
    return true
  } catch {
    return false
  }
}
// ── NDJSON with metadata header line ──────────────────────────────
// Some APIs send a metadata object on line 1, then records on subsequent lines
const metadataLine = JSON.stringify({ version: '1.0', count: totalRecords, exported: new Date().toISOString() })
res.write(metadataLine + '\n')
for (const record of records) {
  res.write(JSON.stringify(record) + '\n')
}
When writing NDJSON for downstream consumers, remember that JSON.stringify() silently drops object properties whose value is undefined (and writes undefined array elements as null). The output is still valid JSON, but fields may be missing, so normalize such values to null first if consumers expect every key. Use the ensure_ascii=False option in Python's json.dumps() to preserve non-ASCII characters rather than escaping them as \uXXXX sequences. For high-volume exports, batch database queries (500-1000 rows per query) rather than fetching one row at a time, because the overhead of individual database round trips dominates for small records. Compress NDJSON exports with gzip (gzip -c data.ndjson > data.ndjson.gz) for storage and transfer; most parsers, including Python's ijson, can read from a gzip.open() file object directly.
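The gzip round trip mentioned above needs only the Python standard library. The following is a minimal sketch under that assumption; `write_ndjson_gz` and `read_ndjson_gz` are hypothetical helper names, and the sample rows are invented for illustration.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Iterable, Iterator


def write_ndjson_gz(records: Iterable[Any], path: str) -> None:
    """Write records as gzip-compressed NDJSON, one record per line."""
    # Text mode ("wt") lets us write str; newline="\n" forces LF on every platform.
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_ndjson_gz(path: str) -> Iterator[Any]:
    """Stream records back, decompressing incrementally line by line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "export.ndjson.gz")
    rows = [{"id": 1, "name": "Zoë"}, {"id": 2, "note": "two\nlines"}]
    write_ndjson_gz(rows, path)
    for record in read_ndjson_gz(path):
        print(record)
```

Because the reader is a generator over a decompressing file object, neither the compressed nor the uncompressed file is ever held in memory as a whole.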
Key Terms
- NDJSON
- Newline-Delimited JSON: a text format where each line contains exactly one complete JSON value (typically an object), separated by newline characters (\n). There is no wrapping array or comma between records. NDJSON enables streaming with memory bounded by a single record: a parser reads one line, parses it, processes the result, and discards it before reading the next. The MIME type is application/x-ndjson and the file extension is .ndjson. Functionally identical to JSON Lines (.jsonl); the two names refer to the same format adopted by different communities.
- JSON Lines
- A text format (also called JSONL) identical to NDJSON: one JSON value per line, newline-separated, no wrapping structure. The JSON Lines name and .jsonl extension are preferred in machine learning, data engineering, and scientific computing communities (Hugging Face, Apache Spark, pandas read_json(lines=True)). NDJSON is preferred in web API and streaming contexts. Both formats are parsed identically: any NDJSON parser handles JSON Lines and vice versa. The jsonlines.org spec defines additional conventions like UTF-8 encoding and LF line endings.
- streaming parser
- A JSON parser that processes input incrementally, emitting parsed objects or events as it encounters them rather than requiring the full input to be buffered first. Streaming parsers use memory proportional to one record (or less) regardless of total input size. Examples: Node.js stream-json (StreamArray class), Python ijson (items() iterator), Java Jackson (JsonParser with the streaming API), and oboe.js for browser use. Contrast with document parsers (JSON.parse(), Python json.load()) that require the full input string or file in memory.
- ReadableStream
- A Web Streams API interface representing a source of streaming data. In the Fetch API, response.body is a ReadableStream of raw bytes. Use response.body.pipeThrough(new TextDecoderStream()) to decode bytes to text, then .getReader() to pull chunks with reader.read(). ReadableStream is available in all modern browsers, Node.js 18+, Deno, and edge runtimes (Cloudflare Workers, Vercel Edge). Contrast with Node.js stream.Readable, which is the older Node-specific streaming API not available in edge runtimes.
- ijson
- A Python library for incremental JSON parsing. It reads JSON input as a stream of events and exposes a high-level items(file, prefix) iterator that yields complete Python objects at a given path in the document. Install with pip install ijson; it automatically uses the fastest available backend: yajl2_c (C extension, roughly 200-400 MB/s), yajl2_cffi (CFFI wrapper), or a pure Python fallback. The prefix parameter is a dot-separated path: "item" for top-level array elements, "data.users.item" for nested arrays. Files should be opened in binary mode ("rb") so that ijson handles decoding itself.
- backpressure
- A flow control mechanism in streaming systems that slows the producer when the consumer cannot keep up, preventing unbounded memory growth. In Node.js streams, writable.write() returns false when the internal buffer is full; the producer must stop writing and wait for the drain event before resuming. In the Web Streams API, backpressure comes from the ReadableStream pull model: the runtime calls the underlying source's pull() only while the internal queue is below its high-water mark, so a source that produces data in pull() never runs ahead of the consumer. A source that enqueues everything from start() bypasses this unless it checks controller.desiredSize. Without backpressure, a fast database cursor feeding a slow HTTP client causes the Node.js process to accumulate all records in memory.
FAQ
What is NDJSON (Newline-Delimited JSON)?
NDJSON is a text format where each line contains one complete JSON value — typically an object — separated by \n newline characters. There is no wrapping array, no comma between records, and no document-level structure. Each line is parseable independently, which enables O(1) memory streaming: a parser processes one line, discards it, and reads the next without accumulating the full file. A 1 GB NDJSON file with one million records can be parsed with ~1 MB peak memory. NDJSON is used for log files, data pipeline exports, Elasticsearch Bulk API payloads, and streaming HTTP responses. The file extension is .ndjson and the MIME type is application/x-ndjson. It is functionally identical to JSON Lines (.jsonl) — same format, different name used by different communities.
How do I parse a large JSON file without loading it into memory?
For NDJSON files (one object per line), use Node.js readline.createInterface({ input: fs.createReadStream("data.ndjson") }) and parse each line with JSON.parse(line). In Python, use ijson: for item in ijson.items(open("data.json", "rb"), "item"): iterates array elements one at a time. For standard JSON arrays too large to parse at once, use the stream-json npm package in Node.js (StreamArray class) or ijson in Python. Avoid JSON.parse(fs.readFileSync(...)) or json.load() on files larger than a few hundred MB, because both load the entire file into memory. The streaming approach caps peak memory at roughly the size of a single record, regardless of total file size.
How do I stream JSON in Node.js?
For NDJSON files, use the built-in readline module — no npm dependencies: const rl = readline.createInterface({ input: fs.createReadStream("data.ndjson") }); for await (const line of rl) { const obj = JSON.parse(line); }. For standard JSON arrays, install stream-json and use StreamArray, which emits one { key, value } event per array element. For HTTP streaming responses, read response.body as a ReadableStream, pipe through TextDecoderStream, split chunks on \n, and parse complete lines. If processRecord is async (database write, HTTP call), await it inside the loop — this applies natural backpressure and prevents unprocessed records from queuing in memory. For parallel processing, distribute lines to a node:worker_threads worker pool.
How do I parse a large JSON array in Python?
Use the ijson library (pip install ijson). Open the file in binary mode and pass the prefix for your array elements: for item in ijson.items(open("large.json", "rb"), "item"): processes each element of a top-level array. For a nested array at data.users, use prefix "data.users.item". ijson uses a C extension (yajl2) by default for ~200-400 MB/s throughput. For NDJSON files, the standard library is sufficient: iterate lines with for line in f: and call json.loads(line.strip()) on each. Never use json.load(f) for files larger than available RAM — it reads the entire file into memory before returning any data.
How do I stream JSON responses with the Fetch API?
Read response.body directly instead of calling response.json(). Pipe it through TextDecoderStream and use getReader(): const reader = response.body.pipeThrough(new TextDecoderStream()).getReader(). In a loop, call reader.read() to get chunks, buffer them, split on \n, and parse complete lines with JSON.parse(). Keep any incomplete trailing content in a buffer for the next iteration. response.json() buffers the entire body before parsing — for NDJSON streams, it defeats the streaming advantage entirely. The ReadableStream approach starts processing the first records at the latency of the first network packet, not after the full response completes.
What is the difference between JSON Lines and NDJSON?
JSON Lines and NDJSON are the same format with different names — both store one JSON value per line separated by newline characters, with no wrapping structure or comma between records. JSON Lines (extension .jsonl, site jsonlines.org) is the name used in machine learning and data engineering communities: pandas, Apache Spark, and Hugging Face datasets all use .jsonl. NDJSON (extension .ndjson, MIME type application/x-ndjson) is the name used in HTTP streaming APIs and real-time data contexts. Any parser, tool, or library that supports one format automatically supports the other — there is no technical distinction. Choose the name and extension based on your ecosystem and downstream tool expectations.
How do I use jq to process large JSON files?
For NDJSON input, jq processes each line independently by default, with no special flags needed: jq '.name' data.ndjson extracts the name field from every record. Use -c (compact) for large outputs to avoid per-record pretty-printing overhead. For filtering: jq 'select(.score > 90)' data.ndjson. For standard JSON arrays that are too large to load, use --stream: jq --stream -n 'fromstream(1|truncate_stream(inputs))' large.json processes each array element without loading the full file. Convert a JSON array to NDJSON with jq -c '.[]' input.json. Aggregations over a standard JSON array require jq to load the full document; over NDJSON, simple folds such as sums and counts can stream with jq -n 'reduce inputs as $r (...)', while group-by style operations still buffer, so consider awk or Python for those on very large files.
How do I write streaming JSON responses in Express?
Use res.write() to send chunks incrementally instead of res.json(), which serializes and sends the full response at once. Set the content type with res.setHeader('Content-Type', 'application/x-ndjson'); Node.js applies chunked transfer encoding automatically when no Content-Length is set, so the Transfer-Encoding header does not need to be set by hand. Stream from a database cursor: for await (const doc of cursor) { const ok = res.write(JSON.stringify(doc) + "\n"); if (!ok) await new Promise(r => res.once("drain", r)); }. Check the return value of res.write() and await the drain event when it returns false, which implements backpressure and prevents memory exhaustion when the client is slower than the database. Disable gzip compression middleware for streaming routes, or flush after each write: the compressor holds data in its own buffer, which delays records and defeats streaming.