JSONL Format (JSON Lines): Read, Write, and Stream Line-Delimited JSON

JSONL (JSON Lines) is a text format where each line contains exactly one complete, valid JSON value — most commonly a JSON object — separated by newline characters (\n). Files use the .jsonl extension. Unlike a JSON array, JSONL has no wrapper brackets or commas between records, so each line can be parsed independently with JSON.parse() (JavaScript) or json.loads() (Python). Processing a 10 GB JSONL file requires constant memory — one line at a time — versus loading the entire document for a JSON array. JSONL is the standard format for OpenAI fine-tuning datasets, Hugging Face datasets, Elasticsearch Bulk API imports, and BigQuery streaming inserts. It is also called NDJSON (Newline Delimited JSON) and JSON Lines; the .jsonl and .ndjson extensions are interchangeable. Before processing JSONL, use Jsonic's JSON Formatter to inspect individual records.

Need to inspect or validate individual JSONL records? Jsonic's JSON Formatter formats any JSON value instantly.

Open JSON Formatter

JSONL file structure

A JSONL file is plain text. Each line is a self-contained, fully valid JSON value — most commonly a JSON object. There is no opening [, no , between lines, and no closing ]. This absence of a container structure is what makes JSONL streamable: a parser can start producing output after reading the first newline without waiting for the rest of the file.

Empty lines are allowed and should be skipped by parsers. Each non-empty line must be valid JSON when parsed in isolation — a line that contains half of an object is not valid JSONL. The last line may or may not end with a trailing newline; both variants are valid per the JSON Lines specification. Here is a minimal example with three product records:

{"id": 1, "name": "Widget A", "price": 9.99, "category": "tools"}
{"id": 2, "name": "Widget B", "price": 24.99, "category": "electronics"}
{"id": 3, "name": "Widget C", "price": 4.49, "category": "tools"}

Each line is a valid JSON object that you can paste directly into Jsonic's JSON Formatter to inspect. Because lines are independent, a single corrupt line does not invalidate the rest of the file — a major advantage over a JSON array, where one syntax error breaks the entire parse. This partial-recovery property makes JSONL well-suited for log pipelines where occasional bad records are expected.
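To see that independence in practice, here is a minimal sketch that parses each of the sample lines above in isolation, with no surrounding array:

```python
import json

lines = [
    '{"id": 1, "name": "Widget A", "price": 9.99, "category": "tools"}',
    '{"id": 2, "name": "Widget B", "price": 24.99, "category": "electronics"}',
    '{"id": 3, "name": "Widget C", "price": 4.49, "category": "tools"}',
]

# Each line is a complete JSON document and parses on its own
records = [json.loads(line) for line in lines]
print(records[1]["name"])  # Widget B
```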

JSONL vs JSON array

The right choice depends on file size, write patterns, and tooling requirements. The table below summarizes the material differences. As a rule of thumb: use JSONL for data pipelines, machine learning datasets, and log streams; use a JSON array for API responses, configuration, and any dataset small enough to fit in memory.

                      JSONL                       JSON array
File size overhead    Minimal (newlines only)     Brackets + commas
Stream-friendly       Yes (one line at a time)    No (needs full file)
Memory usage          O(1) (constant)             O(n) (full file)
Append new record     Add a line                  Edit closing bracket
Partial read          Yes                         No
Crash-safe append     Yes (atomic line write)     No
Tool support          grep, awk, wc -l            jq (full file)
Human readability     Easy per line               Needs full context

Use JSONL when: records are appended incrementally (log files, event streams), the dataset exceeds 100 MB, or an API requires it (OpenAI fine-tuning, Elasticsearch Bulk). Use a JSON array when: the dataset is small, you need to share it with a tool that reads standard JSON, or the ordered-list semantics matter. For a deeper comparison of the two formats, see the JSON vs NDJSON comparison.
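The append-behavior difference in the table can be sketched in Python (file names are illustrative; the JSON-array version is the naive read-modify-write approach):

```python
import json
import os

event = {"event": "login", "user": "alice"}

# JSONL: appending is a single write; existing lines are never touched
with open('events.jsonl', 'a') as f:
    f.write(json.dumps(event) + '\n')

# JSON array: appending means loading the whole file, modifying it,
# and rewriting it from scratch
path = 'events.json'
data = json.load(open(path)) if os.path.exists(path) else []
data.append(event)
with open(path, 'w') as f:
    json.dump(data, f)
```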

Read a JSONL file in Python

The standard library approach requires no dependencies and works for files of any size: open the file, iterate over lines, strip whitespace, skip empty lines, and pass each line to json.loads(). The jsonlines library provides a cleaner API and handles edge cases (BOM, trailing whitespace) automatically. Choose the approach that fits your environment; the performance difference is negligible for most workloads.

import json

# Plain Python — no library needed
with open('data.jsonl', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        print(record)

# With jsonlines library (pip install jsonlines)
import jsonlines
with jsonlines.open('data.jsonl') as reader:
    for record in reader:
        print(record)

json.loads() raises json.JSONDecodeError if a line is not valid JSON. In production pipelines, wrap the call in a try/except json.JSONDecodeError block and log bad lines rather than crashing the entire job. Unlike a corrupt JSON array, a bad line in JSONL only affects that one record — all other lines remain parseable. For more on parse JSON in Python, including nested objects and type conversion, see the dedicated guide.
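The error-handling pattern described above might look like this (the function name and logging setup are illustrative):

```python
import json
import logging

logging.basicConfig(level=logging.WARNING)

def read_jsonl_tolerant(path):
    """Yield parsed records, logging and skipping malformed lines."""
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                logging.warning("Skipping line %d: %s", lineno, e)
```

A bad line is logged with its line number and the pipeline keeps going; only that one record is lost.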

Write a JSONL file in Python

Writing JSONL is straightforward: serialize each record with json.dumps() and write it followed by a newline. Open the file in 'w' mode to create or overwrite, or 'a' mode to append new records without modifying existing lines — a safe, atomic operation that does not require reading the file first.

import json

records = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]

with open('output.jsonl', 'w') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

For compact output, pass separators=(',', ':') to json.dumps() — this removes spaces after colons and commas, reducing file size by 10–20% for typical records. For non-ASCII characters (Chinese, Arabic, emoji), add ensure_ascii=False to write the characters directly rather than escape sequences like \u00e9. To append a single record later, open with 'a' mode: open('output.jsonl', 'a'). For more on reading JSON files in Python, including error handling and large-file strategies, see the dedicated guide.
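The effect of both options, sketched with a sample record:

```python
import json

record = {"id": 1, "name": "café", "tags": ["a", "b"]}

# Defaults: spaces after separators, non-ASCII escaped to \uXXXX
default = json.dumps(record)
# Compact separators plus ensure_ascii=False: smallest valid JSONL line
compact = json.dumps(record, separators=(',', ':'), ensure_ascii=False)

print(default)  # {"id": 1, "name": "caf\u00e9", "tags": ["a", "b"]}
print(compact)  # {"id":1,"name":"café","tags":["a","b"]}
```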

Read a JSONL file in JavaScript / Node.js

For large JSONL files, use Node.js's built-in readline module to stream one line at a time. This avoids loading the full file into memory and processes each record as soon as it arrives from disk. The for await...of syntax makes the streaming loop as readable as a synchronous iterator.

import { createReadStream } from 'fs'
import { createInterface } from 'readline'

const rl = createInterface({
  input: createReadStream('data.jsonl'),
  crlfDelay: Infinity,
})

for await (const line of rl) {
  if (!line.trim()) continue
  const record = JSON.parse(line)
  console.log(record)
}

For small files (<10 MB), the one-liner is sufficient: fs.readFileSync('data.jsonl', 'utf8').split('\n').filter(Boolean).map(JSON.parse). Avoid this approach for large files — it reads the entire file into a string, splits it into an array, then calls JSON.parse on each element, temporarily holding all parsed objects in memory at once. The readline approach processes one line at a time, keeping memory usage constant regardless of file size. Set crlfDelay: Infinity to handle Windows line endings (\r\n) correctly.

Write a JSONL file in JavaScript

Use a writable stream from fs.createWriteStream to write JSONL efficiently. Each stream.write() call is buffered internally, so you don't need to build the full string in memory before writing. Call stream.end() when done to flush the buffer and close the file.

import { createWriteStream } from 'fs'

const records = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
]

const stream = createWriteStream('output.jsonl')
for (const record of records) {
  stream.write(JSON.stringify(record) + '\n')
}
stream.end()

To append records to an existing file without overwriting it, pass the flags option: createWriteStream('output.jsonl', { flags: 'a' }). Each line write is atomic on POSIX filesystems — if the process crashes mid-write, at most one line is corrupt, and all previously written lines remain intact. This crash-safety property makes JSONL append-mode ideal for logging and event collection where process restarts are common.

JSONL in machine learning and data pipelines

JSONL is the default serialization format for the most widely used ML and data platforms. Understanding why each platform chose it helps you integrate correctly.

OpenAI fine-tuning API requires a .jsonl file where each line is a conversation in the chat format:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is JSON?"}, {"role": "assistant", "content": "JSON is a text-based data format..."}]}

Hugging Face datasets store training splits as .jsonl files and load them with load_dataset('json', data_files='train.jsonl'). The streaming mode (streaming=True) reads one line at a time directly from disk or S3, making 100 GB+ datasets feasible on a laptop.

Elasticsearch Bulk API uses NDJSON with alternating action and document lines: the first line specifies the index operation ({"index": {"_index": "products"}}) and the second line is the document to index. This paired-line pattern lets Elasticsearch process millions of documents per second in a single HTTP request.
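A sketch of building that paired-line bulk body in Python (the index name and documents are illustrative; sending the HTTP request is omitted). Note that the Bulk API requires the body to end with a trailing newline:

```python
import json

docs = [
    {"id": 1, "name": "Widget A", "price": 9.99},
    {"id": 2, "name": "Widget B", "price": 24.99},
]

# One action line, then one document line, per record
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "products"}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"
print(bulk_body)
```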

BigQuery accepts newline-delimited JSON for streaming inserts via the Storage Write API. Apache Spark reads JSONL natively with spark.read.json('path/') — each line becomes one DataFrame row. For jq filter examples useful when inspecting JSONL output from these pipelines, see the dedicated guide.

Process JSONL with command-line tools

Because JSONL is line-oriented text, standard Unix tools work on it without special modes. This is one of the main practical advantages over a JSON array.

# Extract a field from every record
jq '.name' data.jsonl

# Count records matching a condition
grep '"category": "electronics"' data.jsonl | wc -l

# Count total records (no parsing needed)
wc -l data.jsonl

# Filter records where score > 90
grep -v '^$' data.jsonl | jq 'select(.score > 90)'

# Convert JSONL to a JSON array (slurp mode)
jq -s '.' data.jsonl > data.json

# Convert JSON array to JSONL (compact, one object per line)
jq -c '.[]' data.json > data.jsonl

jq processes JSONL natively — when its input contains multiple JSON values separated by newlines, it applies the filter to each one in sequence. The -c flag (compact output) ensures each result is on a single line, which is required for valid JSONL output. The -s flag (slurp) collects all input records into a single array before applying the filter, which is how you convert JSONL back to a JSON array. wc -l gives you the record count instantly without parsing — on a 1 GB file this completes in milliseconds.
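If jq is not available, the two conversions can be sketched in Python (function and file names are illustrative):

```python
import json

def jsonl_to_array(jsonl_path, json_path):
    """Equivalent to: jq -s '.' data.jsonl > data.json"""
    with open(jsonl_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    with open(json_path, 'w') as f:
        json.dump(records, f)

def array_to_jsonl(json_path, jsonl_path):
    """Equivalent to: jq -c '.[]' data.json > data.jsonl"""
    with open(json_path) as f:
        records = json.load(f)
    with open(jsonl_path, 'w') as f:
        for record in records:
            f.write(json.dumps(record, separators=(',', ':')) + '\n')
```

Like jq's slurp mode, jsonl_to_array loads every record into memory at once, so it is only suitable for files that fit in RAM.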

Validate JSONL

Validating a JSONL file means verifying that every non-empty line is valid JSON. Unlike a JSON array, a single invalid line does not prevent parsing the rest — you can report which lines are malformed and still process the valid ones.

# Python one-liner: print line numbers of invalid records
python3 -c "
import json, sys
errors = 0
for i, line in enumerate(open(sys.argv[1]), 1):
    line = line.strip()
    if not line:
        continue
    try:
        json.loads(line)
    except json.JSONDecodeError as e:
        print(f'Line {i}: {e}')
        errors += 1
print(f'{errors} error(s) found')
" data.jsonl

# Shell one-liner: validate silently (exit 1 if any error)
python3 -c "import json,sys; [json.loads(l) for l in open(sys.argv[1]) if l.strip()]" data.jsonl

The partial-recovery property is a key operational advantage: in a 10 million-line JSONL file with 3 corrupt lines, you can recover 9,999,997 records. In an equivalent JSON array, a single syntax error — a missing comma, an extra bracket — can make the entire file unparseable. For validating the schema of each record (not just JSON syntax), apply a JSON Schema validator to each parsed object in the loop above.
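Extending the validation loop above with a lightweight per-record shape check (the required fields and their types are illustrative; a full JSON Schema validator would replace this):

```python
import json

REQUIRED = {"id": int, "name": str, "price": (int, float)}

def check_record(obj):
    """Return a list of problems; an empty list means the record passes."""
    if not isinstance(obj, dict):
        return ["record is not a JSON object"]
    problems = []
    for field, types in REQUIRED.items():
        if field not in obj:
            problems.append(f"missing field {field!r}")
        elif not isinstance(obj[field], types):
            problems.append(f"field {field!r} has wrong type")
    return problems

line = '{"id": 1, "name": "Widget A", "price": 9.99}'
print(check_record(json.loads(line)))  # []
```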

Frequently asked questions

What is JSONL format?

JSONL (JSON Lines) is a text file format where each line contains one complete, valid JSON value — typically a JSON object — separated by newline characters. The file extension is .jsonl. There are no wrapper brackets, commas between records, or any other container structure around the individual JSON values: each line is fully self-contained and can be parsed with JSON.parse() (JavaScript) or json.loads() (Python) independently. This property makes JSONL ideal for streaming and large-scale processing: you can read and process records one line at a time without loading the entire file. JSONL is also known as NDJSON (Newline Delimited JSON), JSON Lines, and LDJSON — the terms are interchangeable and the formats are compatible. It is the standard format for machine learning datasets (OpenAI fine-tuning, Hugging Face), log aggregation (Fluentd, Vector, Logstash), analytics event streams, and large data exports.

What is the difference between JSONL and NDJSON?

JSONL and NDJSON are different names for the same format. Both store one valid JSON value per line, separated by newlines. The jsonlines.org specification calls it "JSON Lines" with the .jsonl extension; the ndjson.org specification calls it "Newline Delimited JSON" with the .ndjson extension. LDJSON (Line Delimited JSON) is a third name for the same concept. There are no meaningful technical differences: files produced under one name are fully compatible with parsers expecting the other. In practice, .jsonl is more common in data science and ML communities (where it appears in OpenAI, Hugging Face, and BigQuery documentation), while .ndjson is more common in logging and streaming contexts (Elasticsearch, Logstash). Choose whichever extension your tooling expects; the file contents are identical. See the full JSON vs NDJSON comparison for a side-by-side breakdown.

How do I read a JSONL file in Python?

The simplest approach uses only the Python standard library: open the file, iterate over lines, strip whitespace, skip empty lines, and call json.loads() on each line. Example: import json; records = [json.loads(line) for line in open('data.jsonl') if line.strip()]. The file is read one line at a time, but collecting results into a list still holds every parsed record in memory; for very large files, process each record inside the loop instead of building a list. For production code, add error handling: wrap json.loads() in a try/except to catch json.JSONDecodeError on malformed lines and log them without stopping processing. The jsonlines library (pip install jsonlines) provides a cleaner API with jsonlines.open('data.jsonl') and handles edge cases automatically. For very large files (> 1 GB), the standard library approach avoids an extra dependency. More detail in the guide on parse JSON in Python.

How do I convert a JSON array to JSONL?

The fastest one-liner uses jq: jq -c '.[]' input.json > output.jsonl. The -c flag outputs compact (single-line) JSON; .[] iterates over array elements. In Python: import json; open('out.jsonl', 'w').write(''.join(json.dumps(r) + '\n' for r in json.load(open('in.json')))). In JavaScript: const arr = JSON.parse(fs.readFileSync('in.json','utf8')); fs.writeFileSync('out.jsonl', arr.map(r => JSON.stringify(r)).join('\n')+'\n'). To convert JSONL back to a JSON array, use jq with slurp mode: jq -s '.' input.jsonl > output.json. The -s flag reads all lines and collects them into an array. Note that converting large files to a JSON array requires loading all records into memory simultaneously, which can fail for files larger than your available RAM.

What tools support JSONL?

JSONL has broad ecosystem support. Command-line: jq processes JSONL natively; grep, awk, and wc -l work without special modes. Databases: BigQuery accepts JSONL for streaming inserts; Elasticsearch uses NDJSON for its Bulk API. ML platforms: OpenAI fine-tuning API requires JSONL; Hugging Face datasets use JSONL for load_dataset('json', data_files='...'). Logging: Fluentd, Logstash, Vector, and Fluent Bit all output and consume NDJSON by default. Data processing: Apache Spark reads JSONL with spark.read.json('path/') automatically. Python libraries: jsonlines, pandas (pd.read_json('file.jsonl', lines=True)), polars (pl.read_ndjson('file.jsonl')). Node.js: readline built-in module, or the ndjson npm package. For command-line examples, see jq filter examples.

When should I use JSONL instead of a JSON array?

Use JSONL when: (1) the dataset is large enough that loading the full file into memory is impractical (rule of thumb: > 100 MB); (2) records are appended incrementally and you want crash-safe writes — each line write is atomic and no bracket editing is needed; (3) you need to process records in parallel or in a streaming pipeline, where each worker takes one line; (4) you want to use command-line tools like grep, awk, or wc -l directly on the file; (5) an external API requires it (OpenAI, Elasticsearch, BigQuery). Use a JSON array when: the dataset is small (< 10 MB), you need to share data in a format that any JSON parser can read without special handling, or the array structure itself is semantically meaningful (e.g., an ordered list of results). For configuration files, always use a JSON array or object — JSONL is not intended for human-edited config. For more on reading JSON files in Python and handling both formats, see the dedicated guide.

Ready to work with JSONL?

Use Jsonic's JSON Formatter to inspect individual records from a JSONL file, or explore our guides on jq filter examples for processing JSONL on the command line.

Open JSON Formatter