JSON vs CSV vs XML: Which Format Should You Use?


JSON, CSV, and XML are the three most common data interchange formats — but they solve different problems. JSON excels at hierarchical API data, CSV at flat tabular exports, and XML at document-centric and enterprise integration workloads. This guide compares them across 10 dimensions with a practical decision guide.

1. The Same Data in Three Formats

The same two records in three representations:

// JSON
[
  { "id": 1, "name": "Alice", "role": "admin", "active": true },
  { "id": 2, "name": "Bob",   "role": "user",  "active": false }
]

// CSV
id,name,role,active
1,Alice,admin,true
2,Bob,user,false

// XML
<users>
  <user id="1">
    <name>Alice</name>
    <role>admin</role>
    <active>true</active>
  </user>
  <user id="2">
    <name>Bob</name>
    <role>user</role>
    <active>false</active>
  </user>
</users>

Approximate byte counts with indentation removed: JSON ~105 bytes, CSV ~56 bytes, XML ~167 bytes. The CSV advantage disappears the moment the data has nesting or variable-length arrays.
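Exact counts depend on whitespace and line endings, so it can help to measure them directly. The following is a minimal sketch that assumes the sample above is written without indentation and with "\n" line endings:

```python
# The two sample records, written without indentation (an assumption of this sketch).
json_text = (
    '[{"id":1,"name":"Alice","role":"admin","active":true},'
    '{"id":2,"name":"Bob","role":"user","active":false}]'
)
csv_text = "id,name,role,active\n1,Alice,admin,true\n2,Bob,user,false\n"
xml_text = (
    "<users>"
    '<user id="1"><name>Alice</name><role>admin</role><active>true</active></user>'
    '<user id="2"><name>Bob</name><role>user</role><active>false</active></user>'
    "</users>"
)

# Count UTF-8 bytes, which is what ends up on disk or on the wire.
sizes = {
    "JSON": len(json_text.encode("utf-8")),
    "CSV": len(csv_text.encode("utf-8")),
    "XML": len(xml_text.encode("utf-8")),
}
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

Pretty-printing the JSON or XML adds one byte per space and newline, which is why published comparisons of the same data rarely agree exactly.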

2. Structure and Data Types

Feature         JSON           CSV                          XML
Nested objects  ✅ Native      ❌ Must flatten or encode    ✅ Native
Arrays          ✅ Native      ❌ Multi-row only (flat)     ⚠️ Repeated elements
Boolean type    ✅ true/false  ❌ String "true"/"false"     ❌ String in text
Null type       ✅ null        ❌ Empty string or absent    ❌ xsi:nil attribute
Number type     ✅ Native      ❌ String (no numeric type)  ❌ String in text
Mixed content   ❌ No          ❌ No                        ✅ Text + child elements
Attributes      ❌ No          ❌ No                        ✅ On elements
Comments        ❌ No          ❌ No                        ✅ <!-- -->
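The type rows in the table are easy to see in practice. This sketch round-trips one record through JSON and through CSV using only the Python standard library; the record and its field names are made up for illustration:

```python
import csv
import io
import json

# Hypothetical record with an int, a float, a boolean, and a null.
record = {"id": 1, "score": 9.5, "active": True, "nickname": None}

# JSON round trip: numbers, booleans, and null keep their types.
from_json = json.loads(json.dumps(record))

# CSV round trip: every value comes back as a string, and None becomes "".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
buf.seek(0)
from_csv = next(csv.DictReader(buf))

print(from_json)
print(from_csv)
```

After the CSV round trip there is no way to tell a missing nickname from an empty one, which is the null ambiguity the table refers to.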

3. File Size Comparison

For flat tabular data (1000 user rows, 5 columns):

Format      Raw     gzipped   In-memory (parsed)
CSV         ~45 KB  ~8 KB     ~1–2 MB (strings only)
JSON        ~75 KB  ~10 KB    ~4–6 MB (objects + keys)
XML         ~130 KB ~12 KB    ~15–25 MB (DOM tree)

For nested data (1000 orders, each with 3 line items):

Format      Raw      gzipped   In-memory (parsed)
CSV         ~90 KB   ~15 KB    n/a (requires multi-file or flattening)
JSON        ~120 KB  ~18 KB    ~8–12 MB
XML         ~250 KB  ~22 KB    ~40+ MB

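gzip compression significantly narrows the wire-size gap: in the flat example, XML goes from roughly 3x the size of CSV raw to about 1.5x gzipped. The in-memory figures are rough estimates that depend heavily on the runtime and parser, but they show where XML's DOM overhead becomes most painful for large documents.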
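The raw and gzipped columns can be reproduced for your own data with a few lines of standard-library Python. This is a sketch only: the generated dataset, its field names, and the XML element names are assumptions, and the numbers it prints will differ from the estimates above.

```python
import csv
import gzip
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical flat dataset: 1000 rows, 5 columns.
rows = [
    {
        "id": i,
        "name": f"user{i}",
        "email": f"user{i}@example.com",
        "role": "admin" if i % 10 == 0 else "user",
        "active": i % 2 == 0,
    }
    for i in range(1000)
]

def as_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")

def as_json(records):
    # Compact separators avoid counting optional whitespace.
    return json.dumps(records, separators=(",", ":")).encode("utf-8")

def as_xml(records):
    root = ET.Element("users")
    for record in records:
        user = ET.SubElement(root, "user")
        for key, value in record.items():
            ET.SubElement(user, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")

sizes = {}
for label, payload in (("CSV", as_csv(rows)), ("JSON", as_json(rows)), ("XML", as_xml(rows))):
    sizes[label] = (len(payload), len(gzip.compress(payload)))
    print(f"{label:5} raw={sizes[label][0]:>7} B  gzipped={sizes[label][1]:>7} B")
```

Swapping in a sample of real data usually matters more than any published table, because the ratio depends on key lengths and on how repetitive the values are.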

4. Schema and Validation

Schema system             JSON                         CSV                        XML
Official schema language  JSON Schema (draft 2020-12)  CSVW (W3C CSV on the Web)  XSD (W3C standard)
Ecosystem adoption        Very high (OpenAPI, etc.)    Low                        High (enterprise, SOAP)
Inline typing             ✅ via JSON Schema           ❌ None in file            ✅ via XSD annotations
Cross-field constraints   ⚠️ Limited (if/then)         ❌ No                      ✅ XSD key/keyref
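Because a CSV file carries no types of its own, validation usually lives in the code that reads it. The sketch below is one hand-rolled way to do that with the standard library; COLUMN_TYPES, parse_bool, and validate are hypothetical names for this example and are not part of CSVW or any library.

```python
import csv
import io

def parse_bool(text):
    """Accept only the literal strings "true" and "false"."""
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"not a boolean: {text!r}")

# Hypothetical column spec: column name -> converter that raises on bad input.
COLUMN_TYPES = {"id": int, "name": str, "role": str, "active": parse_bool}

def validate(csv_text):
    """Return (typed_rows, errors) for CSV text with a header row."""
    typed_rows, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        try:
            typed_rows.append(
                {col: convert(row[col]) for col, convert in COLUMN_TYPES.items()}
            )
        except (KeyError, TypeError, ValueError) as exc:
            # reader.line_num is the last physical line read for this row.
            errors.append(f"line {reader.line_num}: {exc}")
    return typed_rows, errors

good_rows, good_errors = validate("id,name,role,active\n1,Alice,admin,true\n")
bad_rows, bad_errors = validate("id,name,role,active\nx,Eve,user,maybe\n")
print(good_rows, good_errors)
print(bad_rows, bad_errors)
```

JSON Schema and XSD express the same idea declaratively, which lets validators in different languages share one schema instead of each reimplementing checks like these.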

5. Tooling and Ecosystem

Each format has its own mature toolchain, and JSON tooling in particular has grown rapidly since about 2015:

  • JSON: JSON Schema, OpenAPI/Swagger, Postman, jq, JSONPath, VS Code IntelliSense, Zod, Ajv, TypeScript auto-generation
  • CSV: Excel, Google Sheets, pandas, DuckDB, PostgreSQL COPY, csvkit, PartiQL
  • XML: XPath, XQuery, XSLT, Saxon, Oxygen XML Editor, WSDL/SOAP, Maven, Android Studio

6. When Each Format Wins

Use case                                    Best format         Why
REST / GraphQL API responses                JSON                Native to JS, HTTP-native, OpenAPI support
Database export / spreadsheet import        CSV                 Universal, compact, Excel-native
SOAP web services                           XML                 Required by WSDL/SOAP spec
Config files                                JSON / YAML / TOML  JSON for simplicity, YAML for comments
ML / analytics datasets (millions of rows)  CSV or Parquet      CSV is compact; Parquet is column-oriented for OLAP
Document format (Office, SVG, RSS)          XML                 Mixed content, namespaces, spec requirement
Event streaming / logs                      JSONL / NDJSON      One JSON object per line, streamable
Human-edited data files                     JSON or YAML        Comments (YAML), familiar syntax (JSON)
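The JSONL / NDJSON row is worth a concrete look, since the format is simply one JSON document per line. A minimal sketch follows; the event records and the helper names write_jsonl and read_jsonl are illustrative, not a standard API.

```python
import io
import json

def write_jsonl(records, stream):
    """Write each record as one compact JSON document per line."""
    for record in records:
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")

def read_jsonl(stream):
    """Yield one parsed record at a time, so memory use stays flat."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)

# Hypothetical log events.
events = [
    {"ts": 1, "event": "login", "user": "alice"},
    {"ts": 2, "event": "logout", "user": "alice"},
]

buf = io.StringIO()
write_jsonl(events, buf)
buf.seek(0)
logins = [e for e in read_jsonl(buf) if e["event"] == "login"]
print(logins)
```

Because every line is independent, a writer can append without rewriting the file and a reader can start processing before the file is complete, which a single top-level JSON array does not allow.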

7. Converting Between Formats

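# JSON array → CSV (Python)
import csv
import io
import json

with open("data.json", encoding="utf-8") as f:
    data = json.load(f)  # expects a non-empty list of flat objects

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(data[0].keys()))
writer.writeheader()
writer.writerows(data)
print(out.getvalue())  # CSV text; the header comes from the first object's keys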

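# CSV → JSON (Python)
import csv
import json

# newline="" is what the csv module expects when reading files
with open("data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # every value is read as a string
print(json.dumps(rows, indent=2))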

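# JSON → XML (Python with xmltodict; install with: pip install xmltodict)
import json
import xmltodict

with open("data.json", encoding="utf-8") as f:
    items = json.load(f)

# XML allows exactly one root element, so a JSON array is wrapped as
# <root><item>...</item>...</root>; "root" and "item" are placeholder names
xml = xmltodict.unparse({"root": {"item": items}}, pretty=True)
print(xml)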

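# XML → JSON (Python with xmltodict)
import json
import xmltodict

with open("data.xml", encoding="utf-8") as f:
    # attributes become "@name" keys and mixed text becomes "#text"
    data = xmltodict.parse(f.read())
print(json.dumps(data, indent=2))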

For browser-based conversion, Jsonic provides JSON to CSV, CSV to JSON, JSON to XML, and XML to JSON tools — no installation required.

Frequently Asked Questions

When should I use JSON instead of CSV or XML?

Use JSON when your data is hierarchical (nested objects, arrays of objects with mixed types), when you are building a web API or consuming one, when you need native support in JavaScript/TypeScript without a parsing library, or when your data has optional and nullable fields that vary per record. JSON is also the right choice when you need a schema system (JSON Schema), when human readability matters for debugging, or when your toolchain is built around it (Node.js, Python requests, fetch API). JSON is a poor choice for flat tabular data with millions of rows: CSV is roughly 40% smaller in the flat example in section 3, because JSON repeats the key names on every object.

When should I use CSV instead of JSON or XML?

Use CSV when your data is genuinely tabular: same columns on every row, no nesting, no mixed types within a column. CSV is the right choice for data imports/exports from spreadsheets and databases (Excel, Google Sheets, PostgreSQL COPY, MySQL LOAD DATA), for large flat datasets where file size matters (logs, analytics exports, machine learning datasets), and for interoperability with non-developer audiences who will open the file in a spreadsheet application. CSV is the wrong choice when rows have different shapes (sparse data), when values contain commas or newlines (escaping rules vary between dialects), when you need to express null vs empty string unambiguously, or when you need nested or repeated structure. CSV has no widely adopted schema format (CSVW exists but sees little use), no type system beyond strings, and no binding spec for quoting (RFC 4180 is a de facto standard but not universally implemented).
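The quoting problem is easiest to see by comparing a real CSV parser with a naive split on commas. This sketch uses Python's csv module with its default dialect; the sample rows are invented.

```python
import csv
import io

# Hypothetical rows whose values contain a comma, quotes, and a newline.
rows = [
    ["id", "note"],
    [1, 'He said "hi", then left'],
    [2, "line one\nline two"],
]

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
text = buf.getvalue()
print(text)

# A real parser restores the original values (as strings).
parsed = list(csv.reader(io.StringIO(text)))

# Splitting on commas breaks the first data row into three pieces.
naive = text.split("\n")[1].split(",")
print(parsed[1], naive)
```

The writer wraps such fields in double quotes and doubles any embedded quote, following RFC 4180; tools that use backslash escapes or a different delimiter will not read that output the same way.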

When should I use XML instead of JSON or CSV?

Use XML when the ecosystem you are integrating with requires it: SOAP web services, RSS/Atom feeds, SVG images, Android layouts, Maven build files, Microsoft Office documents (DOCX, XLSX are ZIP files containing XML), SAML authentication, and much legacy enterprise integration (for example ISO 20022 banking messages and healthcare HL7 v3/CDA). XML has unique capabilities that JSON lacks: mixed content (elements containing both text and child elements simultaneously, like HTML), processing instructions, namespaces for multi-vocabulary documents, and XSD schemas with rich validation including cross-field constraints. XML also has mature query and transformation tools (XPath, XQuery, XSLT) that predate jq by more than a decade. For new projects with no legacy requirements, JSON is almost always simpler. XML's verbosity (every element name is repeated in a closing tag) is a real cost in bandwidth and storage.
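Mixed content is the capability with no direct JSON counterpart, so a short illustration may help. This sketch parses a made-up paragraph with Python's standard xml.etree.ElementTree, which stores text that follows a child element in that child's tail attribute:

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph mixing text and inline child elements.
para = ET.fromstring("<p>Prices rose <b>sharply</b> in <i>March</i>.</p>")

bold, italic = list(para)

print(repr(para.text))    # text before the first child
print(repr(bold.text))    # text inside <b>
print(repr(bold.tail))    # text between </b> and <i>
print(repr(italic.tail))  # text after </i>

# itertext() walks text and tails in document order.
plain = "".join(para.itertext())
print(plain)
```

A JSON encoding of the same paragraph has to invent a convention, such as an array that alternates strings and objects, which is why document formats tend to stay with XML.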

How much larger is XML than JSON for the same data?

By the estimates in section 3, XML is roughly 1.7-2x the size of JSON for the same data, primarily because every element name is repeated in a closing tag. A JSON object {"name":"Alice"} is 16 bytes; the XML equivalent <name>Alice</name> is 18 bytes for a short key, and the gap grows with key length because XML writes each name twice. Both JSON and XML shrink dramatically with gzip compression (roughly 85-90% in the section 3 estimates) because both are highly repetitive, so over HTTP with compression the wire-size difference narrows to around 20%. Where the difference matters most is in-memory parsing (a DOM tree occupies many times the raw file size), CPU time (XML parsers are generally slower), and storage costs for cold data that is not compressed.

Can CSV represent nested data like JSON can?

Not natively. CSV is strictly 2-dimensional: rows and columns. To represent a nested structure in CSV you must flatten it — either by encoding the nested value as a serialized string in a column (e.g., storing the entire JSON of an address sub-object in a single cell), by adding dotted columns (user.name, user.email, user.address.city), or by splitting into multiple related CSV files with ID joins. Each approach has downsides: string-encoded JSON in CSV cells is unreadable and breaks CSV tools; dotted columns fail when the nested structure has variable depth or variable-length arrays; multi-file CSV is essentially a hand-rolled relational database. If your data is genuinely nested, use JSON or a database. If you need CSV for a particular tool and your data is nested, use pandas json_normalize() in Python or the flat npm package in JavaScript to flatten before export.
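The dotted-column approach can be sketched in a few lines of standard-library Python. The flatten helper below is hypothetical (it is not pandas json_normalize), and it stores lists as JSON strings, which is exactly the unreadable-cell compromise described above:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted column names."""
    flat = {}
    for key, value in obj.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, column))
        elif isinstance(value, list):
            # Variable-length arrays have no natural column layout,
            # so this sketch stores them as a JSON string in one cell.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat

# Hypothetical nested record.
order = {
    "id": 1,
    "user": {"name": "Alice", "address": {"city": "Oslo"}},
    "tags": ["a", "b"],
}
row = flatten(order)
print(row)
```

The resulting dict can be passed to csv.DictWriter, but records with different nesting produce different column sets, so the header has to be the union of all keys seen.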

Which format is best for REST API responses?

JSON is the de facto standard for REST API responses and has been since roughly 2013, when it overtook XML (which was dominant in SOAP/REST APIs before that). JSON advantages for APIs: native to JavaScript (the primary web client language), compact compared to XML, human-readable compared to binary formats, supported natively by fetch(), axios, and every HTTP library, and toolable with Swagger/OpenAPI for documentation and schema validation. The only scenarios where you might serve something other than JSON from a REST API: binary data (use multipart/form-data or application/octet-stream), CSV downloads (use text/csv with Content-Disposition: attachment for spreadsheet exports), or legacy SOAP clients that require XML. Even then, the API itself is JSON — you add a separate endpoint or query parameter for CSV/XML exports. GraphQL APIs use JSON. gRPC uses protobuf (binary), not JSON, but provides JSON transcoding via the gRPC-JSON Transcoder for browser clients.

Which format parses fastest: JSON, CSV, or XML?

For typical data sizes, CSV is fastest to parse because its grammar is simpler (scan for delimiter and newline characters) and values are untyped (everything is a string). JSON parsing is close behind because modern runtimes and libraries (V8, PyPy, Rust serde) have highly optimized JSON parsers. XML parsing is significantly slower because it must handle namespaces, DTDs, entities (&amp; expansions), and mixed content; a full DOM parse also builds a tree structure that occupies many times the file size in memory. Rough, machine-dependent figures for a 10 MB file: CSV parse in Python with pandas ~50ms, JSON with orjson ~80ms, XML with lxml ~300ms. Treat these as orders of magnitude and benchmark on your own data. For truly high throughput (streaming 1 GB+ files), all three formats can be parsed as streams to avoid memory overhead; jsonl/ndjson (one JSON object per line) is the most popular streaming format for JSON, making it competitive with CSV in streaming throughput.
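For XML, stream parsing means handling elements as they close instead of building the whole tree. Here is a minimal sketch using the standard library's ElementTree.iterparse; the document shape (a users root with user children) and the iter_users helper are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET

def iter_users(stream):
    """Yield one dict per <user> element without keeping the full tree."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "user":
            yield {"id": elem.get("id"), "name": elem.findtext("name")}
            # Drop this element's children and text once it has been used.
            elem.clear()

# Hypothetical document; a real one would be opened as a binary file.
xml_bytes = (
    b"<users>"
    b'<user id="1"><name>Alice</name></user>'
    b'<user id="2"><name>Bob</name></user>'
    b"</users>"
)
users = list(iter_users(io.BytesIO(xml_bytes)))
print(users)
```

The same pattern applies to CSV (iterate over csv.reader) and to JSONL (iterate over lines); a single large JSON array is the awkward case, because the standard json module parses it in one piece.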

How do I convert between JSON, CSV, and XML?

Conversion is possible but lossy, because each format has constructs the others cannot represent. JSON to CSV: use pandas.json_normalize() in Python or the flat npm package in JavaScript to flatten nested JSON, then write to CSV. Array-of-objects JSON converts cleanly; hierarchical JSON requires design decisions about flattening. JSON to XML: use xmltodict in Python (json.loads → xmltodict.unparse) or a JSON-to-XML library. Keys become element names, and XML element names cannot start with a digit, so such keys must be prefixed. CSV to JSON: use Python csv.DictReader or PapaParse in JavaScript; each row becomes a JSON object with string values (no type inference unless you add a step). XML to JSON: xmltodict (Python) or fast-xml-parser (JavaScript); attributes and text content need special handling. For online conversion, Jsonic provides JSON-to-CSV, CSV-to-JSON, JSON-to-XML, and XML-to-JSON tools.
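The type-inference step for CSV to JSON can be sketched as follows. The infer helper is a hypothetical example, and its rules (empty string becomes null, lowercase "true"/"false" become booleans, then int, then float) are assumptions that will not suit every dataset.

```python
import csv
import io
import json

def infer(text):
    """Guess a JSON type for one CSV cell; fall back to the string itself."""
    if text == "":
        return None
    if text in ("true", "false"):
        return text == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

# Hypothetical CSV input.
csv_text = "id,name,score,active,nickname\n1,Alice,9.5,true,\n"
typed = [
    {key: infer(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(json.dumps(typed))
```

Inference like this is a guess: identifiers such as ZIP codes with leading zeros would be turned into integers, so columns that must stay strings are better listed explicitly.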

Further reading and primary sources