Parse JSON in R: jsonlite, RJSONIO, rjson, and Working with Nested JSON

R has four serious JSON packages on CRAN — jsonlite, RJSONIO, rjson, and the newer jsonify — and the choice shapes how much glue code you write. jsonlite is the default for analytics work because it auto-coerces JSON arrays into vectors and arrays-of-objects into data.frames, which is almost always what an R user wants. The other packages stay closer to the wire format, which is occasionally what you need. This guide covers the package landscape, the fromJSON/toJSON core, the simplification flags that change behavior the most, flattening nested structures into tidy tables, NDJSON streaming, API calls with httr2, and the NA-vs-null gotcha.

Got a malformed JSON file that fromJSON refuses to read? Paste it into Jsonic's JSON Validator: it pinpoints the line and column of the parse error, which is much faster than reading R's generic "unexpected character" message.

R JSON library landscape: jsonlite, RJSONIO, rjson

Four packages on CRAN cover the JSON problem in R, each with a different design philosophy. Knowing which one a codebase uses is the first step when inheriting existing R code that touches JSON.

jsonlite is the current default. Maintained by Jeroen Ooms (also behind curl, magick, and a long list of CRAN infrastructure), it is what nearly every other JSON-touching package imports today. The defining feature is automatic simplification: JSON arrays of scalars become R atomic vectors, JSON arrays of objects become data.frames, and nested numeric arrays can become matrices. The coercion is opt-out (each flag defaults to TRUE) and matches what an R analyst wants 90% of the time.

RJSONIO predates jsonlite and was the first widely-used R JSON package. Still on CRAN and still maintained, but the API is lower level — you get back lists of lists and convert to vectors or data frames yourself. New code rarely picks it; you encounter it in older codebases.

rjson is the simplest — a thin pure-C binding. It parses fast on small payloads but has no streaming, no auto-simplification, and a minimal API. Useful when you want a tiny dependency and full control of the shape, but most analysts hit its limits quickly.

jsonify is newer, written in C++ via Rcpp, and benchmarks faster than jsonlite for many round-trip workloads. The API mirrors jsonlite closely (from_json/to_json), which makes it a candidate when JSON parsing dominates your runtime profile. Smaller community than jsonlite, so reach for it after profiling confirms it would help.

fromJSON and toJSON basics

The two functions you will use most are jsonlite::fromJSON() for reading and jsonlite::toJSON() for writing. fromJSON auto-detects the nature of its input: it figures out whether its argument is a file path, a URL, a connection, or a raw string and reads accordingly.

# Install once
install.packages("jsonlite")

# Load every session
library(jsonlite)

# Three input styles — fromJSON handles all of them
from_file   <- fromJSON("data/users.json")
from_string <- fromJSON('{"id": 1, "name": "Ada"}')
from_url    <- fromJSON("https://api.example.com/users")

# Inspect
str(from_file)
# 'data.frame': 1500 obs. of 4 variables:
#  $ id   : int  1 2 3 ...
#  $ name : chr  "Ada" "Bea" "Cyn" ...
#  $ tags : List of 1500
#  $ meta :'data.frame': 1500 obs. of 2 variables

Writing JSON mirrors reading. toJSON returns a character string with the serialized payload; write_json sends it to a file. Two flags change the output more than any others.

# toJSON returns a string
df <- data.frame(id = 1:3, name = c("Ada", "Bea", "Cyn"))

# Data frames serialize row by row; cell values are already JSON scalars
toJSON(df)
# [{"id":1,"name":"Ada"},{"id":2,"name":"Bea"},{"id":3,"name":"Cyn"}]

# Lists are different: by default every length-1 vector becomes a one-element array
toJSON(list(id = 1, name = "Ada"))
# {"id":[1],"name":["Ada"]}

# With auto_unbox: length-1 vectors emit as JSON scalars
toJSON(list(id = 1, name = "Ada"), auto_unbox = TRUE)
# {"id":1,"name":"Ada"}

# With pretty: indented, newline-separated
toJSON(df, pretty = TRUE)
# [
#   {
#     "id": 1,
#     "name": "Ada"
#   },
#   ...
# ]

# write_json sends the same output to disk
write_json(df, "users.json", auto_unbox = TRUE, pretty = TRUE)

The auto_unbox = TRUE flag is what API consumers expect — without it, every length-1 R vector serializes as a one-element JSON array, which JavaScript, Python, and Go receivers all find surprising. Set it once and forget.
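One caveat is worth sketching: with auto_unbox = TRUE, a vector that merely happens to have length 1 also collapses to a scalar, even when the consumer expects an array. The example below uses two jsonlite helpers, I() to protect a field from unboxing and unbox() to unbox a single field explicitly. The payload is made up for illustration.

```r
library(jsonlite)

# Illustrative payload: "tags" should stay an array even with one element
payload <- list(id = 1, tags = "admin")

# auto_unbox collapses both fields to scalars
collapsed <- toJSON(payload, auto_unbox = TRUE)

# I() marks a field "as is", so auto_unbox leaves it as an array
protected <- toJSON(list(id = 1, tags = I("admin")), auto_unbox = TRUE)

# Alternative: leave auto_unbox off and unbox individual fields
explicit <- toJSON(list(id = unbox(1), tags = "admin"))

print(collapsed)  # {"id":1,"tags":"admin"}
print(protected)  # {"id":1,"tags":["admin"]}
print(explicit)   # {"id":1,"tags":["admin"]}
```

Which style fits depends on the payload: I() is convenient when most fields are scalars, while explicit unbox() calls are safer when vector lengths vary at runtime.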

Auto-flattening: simplifyVector, simplifyDataFrame, simplifyMatrix

The three simplify* flags are the most consequential settings in jsonlite. They control whether the package collapses JSON structures into native R types or leaves them as raw lists. All three default to TRUE; turning any off gives you the wire-shape verbatim.

Flag               JSON input            With flag on (default)             With flag off
simplifyVector     [1, 2, 3]             c(1L, 2L, 3L) (integer vector)     list(1L, 2L, 3L)
simplifyDataFrame  [{"a":1}, {"a":2}]    data.frame(a = 1:2)                list(list(a = 1L), list(a = 2L))
simplifyMatrix     [[1,2],[3,4]]         matrix(1:4, 2, 2, byrow = TRUE)    list(c(1L, 2L), c(3L, 4L))

# Default — auto-simplified
fromJSON('[1, 2, 3]')
#> [1] 1 2 3
class(fromJSON('[1, 2, 3]'))
#> [1] "integer"

# With simplifyVector off — list back
fromJSON('[1, 2, 3]', simplifyVector = FALSE)
#> [[1]]
#> [1] 1
#> [[2]]
#> [1] 2
#> [[3]]
#> [1] 3

# Array of objects → data.frame
fromJSON('[{"id":1,"name":"Ada"},{"id":2,"name":"Bea"}]')
#>   id name
#> 1  1  Ada
#> 2  2  Bea
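
The third flag, simplifyMatrix, can be illustrated the same way. This is a minimal sketch with made-up input; note that each inner JSON array becomes one row of the matrix.

```r
library(jsonlite)

# Nested arrays of equal length become a matrix, one inner array per row
m <- fromJSON('[[1, 2], [3, 4]]')
print(dim(m))
print(m[1, ])  # the first inner array

# With simplifyMatrix off, the outer array stays a list of vectors
rows <- fromJSON('[[1, 2], [3, 4]]', simplifyMatrix = FALSE)
print(class(rows))
print(rows[[2]])
```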

When to disable simplification: when array elements mix types that cannot fit a single R vector, when objects in the same array have different keys and you need the original shape preserved, and when round-tripping JSON without altering structure. For analytics work where you want a clean table, leave the defaults alone.
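
As a rough illustration of those cases (the inputs are invented for the example), the sketch below shows what simplification does to a mixed-type array and to objects with different keys, and what the unsimplified result looks like.

```r
library(jsonlite)

# Mixed scalar types: simplification coerces the array to a single vector type
mixed <- fromJSON('[1, "a", true]')
print(mixed)

# With simplifyVector off, each element keeps its own type
raw <- fromJSON('[1, "a", true]', simplifyVector = FALSE)
str(raw)

# Objects with different keys: missing fields are filled with NA
ragged <- fromJSON('[{"a": 1}, {"b": 2}]')
print(ragged)
```

If the NA padding in the last case would hide the difference between "field absent" and "field null", that is a sign to parse with simplification off.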

Nested JSON to data.frame: jsonlite::flatten()

JSON in the wild is rarely flat. API responses nest objects three or four levels deep, and fromJSON brings that nesting through as nested data.frame columns (for objects) and list-columns (for arrays). The standard cleanup is jsonlite::flatten(), which lifts nested object fields into dotted column names on the parent data.frame.

json <- '[
  {
    "id": 1,
    "name": "Ada",
    "address": { "city": "London", "zip": "WC1" }
  },
  {
    "id": 2,
    "name": "Bea",
    "address": { "city": "Paris", "zip": "75001" }
  }
]'

df <- fromJSON(json)
str(df)
# 'data.frame': 2 obs. of 3 variables:
#  $ id     : int  1 2
#  $ name   : chr  "Ada" "Bea"
#  $ address:'data.frame': 2 obs. of 2 variables
#   ..$ city: chr  "London" "Paris"
#   ..$ zip : chr  "WC1" "75001"

flat <- jsonlite::flatten(df)
str(flat)
# 'data.frame': 2 obs. of 4 variables:
#  $ id          : int  1 2
#  $ name        : chr  "Ada" "Bea"
#  $ address.city: chr  "London" "Paris"
#  $ address.zip : chr  "WC1" "75001"

flatten() only unpacks nested data.frame columns (objects inside the array). It leaves list-columns of arrays alone; those need a second pass with tidyr::unnest_longer() (for ragged arrays, one row per element) or tidyr::unnest_wider() (for fixed-shape arrays, columns by index).
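
fromJSON also accepts a flatten argument that applies the same flattening while parsing. The sketch below assumes a two-level nesting invented for the example and shows the dotted names that result.

```r
library(jsonlite)

json <- '[
  {"id": 1, "address": {"city": "London", "geo": {"lat": 51.5}}},
  {"id": 2, "address": {"city": "Paris",  "geo": {"lat": 48.9}}}
]'

# flatten = TRUE applies jsonlite::flatten() as part of parsing
flat <- fromJSON(json, flatten = TRUE)
print(names(flat))
print(flat$address.geo.lat)
```

Flattening is recursive by default, so deeper objects keep extending the dotted path.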

# Unnest the tags array: one row per tag
library(tidyr)

df <- fromJSON('[
  {"id":1,"name":"Ada","tags":["math","physics"]},
  {"id":2,"name":"Bea","tags":["chemistry"]}
]')

df %>%
  tidyr::unnest_longer(tags)
#   id name tags
# 1  1  Ada math
# 2  1  Ada physics
# 3  2  Bea chemistry

For deeply irregular JSON (every document has a different shape, common in event-tracking dumps), the tidyjson package walks the document tree instead: spread_all() spreads every scalar it finds into columns named by JSON path, and gather_object() / gather_array() produce a long frame with one row per key or element. That is usually a better starting point than fighting flatten for irregular data.

Streaming JSON: jsonlite::stream_in / stream_out for NDJSON

NDJSON (newline-delimited JSON) is the only practical streaming JSON format in R. A regular JSON document is a single tree that must be parsed as a unit; NDJSON puts one JSON object per line, so a parser can consume records one at a time without ever holding the whole file in memory. See our JSON Lines guide for the format itself; this section covers the R-side APIs.

# Reading: stream_in consumes a connection in chunks
events <- jsonlite::stream_in(file("events.ndjson"))
nrow(events)
# [1] 1500000

# Gzipped NDJSON — wrap with gzfile
events <- jsonlite::stream_in(gzfile("events.ndjson.gz"))

# Larger than RAM? Pass a handler — it gets called once per chunk
jsonlite::stream_in(
  file("huge.ndjson"),
  handler = function(chunk) {
    # process chunk (a data.frame) without accumulating
    upload_to_db(chunk)
  },
  pagesize = 5000
)

stream_out is the writer counterpart. It serializes a data.frame one row per line, no enclosing array — exactly the format that log aggregators, ML training pipelines, and event-streaming systems expect.

# Writing a frame to NDJSON
jsonlite::stream_out(my_df, file("output.ndjson"))

# stream_out also flushes incrementally — safe for huge frames
con <- file("rolling.ndjson", open = "wb")
jsonlite::stream_out(my_df, con)
close(con)
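
To make the chunked flow concrete, here is a self-contained round trip through a temporary file. The data frame and the row-counting handler are illustrative; a real handler would write each chunk somewhere instead of counting rows.

```r
library(jsonlite)

# Illustrative data: ten small event records
events <- data.frame(id = 1:10, kind = rep(c("click", "view"), 5))
path <- tempfile(fileext = ".ndjson")

# One JSON object per line, no enclosing array
stream_out(events, file(path), verbose = FALSE)
lines <- readLines(path)
print(lines[1])

# Read it back in pages of 4 rows; the handler sees one data.frame per page
total <- 0
stream_in(
  file(path),
  handler = function(chunk) {
    total <<- total + nrow(chunk)  # process the chunk without keeping it
  },
  pagesize = 4,
  verbose = FALSE
)
print(total)
```

Because the connections are passed unopened, stream_out and stream_in open and close them on their own.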

For files where even per-chunk processing in R is too heavy, arrow::read_json_arrow() with as_data_frame = FALSE reads NDJSON into an Arrow Table whose buffers live in Arrow-managed memory rather than the R heap, so you can filter and aggregate before converting anything to R objects. This is a common path for files in the tens of gigabytes.

tidyverse integration: tibble::as_tibble, tidyjson, jsonify

jsonlite returns base R data.frames, but most modern R code lives in the tidyverse, which prefers tibbles. The conversion is one call:

library(tibble)
library(dplyr)

df <- fromJSON("users.json") |>
  jsonlite::flatten() |>
  tibble::as_tibble()

df |>
  filter(address.city == "London") |>
  select(id, name, address.zip)

For irregular JSON where the rectangular data.frame assumption breaks, the tidyjson package wraps jsonlite with a pipe-friendly API that walks JSON trees. spread_all() widens each document into one row with a column per scalar value, named by its path; gather_object() and gather_array() go the other way and emit long frames keyed by document ID.

library(tidyjson)

# spread_all() expects each document to be a JSON object (use gather_array() first for arrays)
'{"id":1,"meta":{"verified":true,"score":42}}' |>
  as.tbl_json() |>
  spread_all()
# # A tbl_json: 1 x 4 tibble with a "JSON" attribute
#   document.id meta.verified meta.score id
#         <int>         <lgl>      <dbl> <dbl>
# 1           1          TRUE         42     1

For pure performance, jsonify from David Cooley is an Rcpp-backed alternative with a near-identical API to jsonlite. Its from_json and to_json functions are often 2–4x faster on large payloads in benchmarks. The package is mature but has a smaller community — use it after profiling shows JSON serialization is your bottleneck, not before.

If your work crosses language boundaries, see our reference on JSON across languages for how R's type mapping compares to Python, Ruby, and Go.

Writing JSON for APIs: handling NA, NULL, dates

The trickiest part of generating JSON from R is the impedance mismatch between R's type system (vectorized, distinguishing NA from NULL) and JSON's (scalar-first, only null). Three knobs in toJSON cover most of the cases that bite production code.

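# NA in a numeric vector → the string "NA" by default
# (keeps NA distinguishable from NaN and Inf)
toJSON(c(1, NA, 3))
# [1,"NA",3]

# Ask for JSON null explicitly
toJSON(c(1, NA, 3), na = "null")
# [1,null,3]

# Logical and character NA → null by default
toJSON(c(TRUE, NA))
# [true,null]

# NULL inside a list → empty object {} by default (null = "list")
toJSON(list(a = 1, b = NULL, c = 3), auto_unbox = TRUE)
# {"a":1,"b":{},"c":3}

# To emit b as JSON null, pass null = "null" or use NA instead
toJSON(list(a = 1, b = NULL, c = 3), auto_unbox = TRUE, null = "null")
# {"a":1,"b":null,"c":3}
toJSON(list(a = 1, b = NA, c = 3), auto_unbox = TRUE)
# {"a":1,"b":null,"c":3}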
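
The reading direction has its own mapping. The sketch below (inputs invented for the example) shows where a JSON null turns into NA and where it stays NULL.

```r
library(jsonlite)

# null inside an array that simplifies to a vector becomes NA
v <- fromJSON('[1, null, 3]')
print(v)

# null as an object value stays NULL in the resulting list; the key is kept
obj <- fromJSON('{"a": 1, "b": null}')
print(names(obj))
print(is.null(obj$b))

# null in an array of objects becomes NA in the data.frame column
df <- fromJSON('[{"x": 1}, {"x": null}]')
print(df)
```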

Dates and times are the other classic snag. R has Date and POSIXct; JSON has no date type. jsonlite writes Date values as ISO 8601 strings (yyyy-mm-dd) by default, while POSIXct values default to a "yyyy-mm-dd hh:mm:ss" string; pass POSIXt = "ISO8601" to get the T-separated form most modern APIs expect. The outputs below are illustrative and assume a session running in UTC.

toJSON(Sys.Date(), auto_unbox = TRUE)
# "2026-05-23"

toJSON(Sys.time(), auto_unbox = TRUE)
# "2026-05-23 14:30:00"

# Explicit: ISO 8601; UTC = TRUE converts to UTC and appends Z
toJSON(Sys.time(), auto_unbox = TRUE, POSIXt = "ISO8601", UTC = TRUE)
# "2026-05-23T14:30:00Z"

# Epoch milliseconds (some legacy APIs want this)
toJSON(Sys.time(), auto_unbox = TRUE, POSIXt = "epoch")
# 1779546600000
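
On the way back in, fromJSON does not convert date strings; they arrive as character values. A minimal round-trip sketch, using a made-up record with fixed timestamps so the result is reproducible:

```r
library(jsonlite)

# Illustrative record with one Date and one POSIXct field
record <- list(
  day = as.Date("2026-05-23"),
  at  = as.POSIXct("2026-05-23 14:30:00", tz = "UTC")
)

json <- toJSON(record, auto_unbox = TRUE, POSIXt = "ISO8601", UTC = TRUE)
print(json)

# fromJSON returns plain strings; convert them back explicitly
back <- fromJSON(json)
print(class(back$at))

day <- as.Date(back$day)
at  <- as.POSIXct(back$at, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
print(day)
print(at)
```

Parsing with an explicit format and tz keeps the conversion independent of the session's locale and time zone.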

Logicals serialize as JSON true/false with no extra config. Factors serialize as their character levels by default; set factor = "integer" if you need the underlying integer codes for a downstream consumer that expects them.
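
A short sketch of both behaviours, using an invented status factor:

```r
library(jsonlite)

status <- factor(c("active", "banned", "active"))

# Default: factor levels are written as strings
as_labels <- toJSON(status)
print(as_labels)  # ["active","banned","active"]

# factor = "integer": the underlying integer codes are written instead
as_codes <- toJSON(status, factor = "integer")
print(as_codes)   # [1,2,1]

# Logicals map to true/false; a logical NA is written as null
flags <- toJSON(c(TRUE, FALSE, NA))
print(flags)      # [true,false,null]
```

Integer codes are only meaningful if the consumer also knows the level order, so the string form is usually the safer default.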

Reading JSON from URLs and pagination

For simple one-shot reads, fromJSON(url) works — it issues a GET and parses the body. For anything with authentication, pagination, retries, or non-GET methods, use httr2 (the modern HTTP client) with jsonlite.

library(httr2)
library(jsonlite)

# Build the request — composable pipeline
resp <- request("https://api.example.com/users") |>
  req_auth_bearer_token(Sys.getenv("API_TOKEN")) |>
  req_url_query(page = 1, limit = 100) |>
  req_headers("Accept" = "application/json") |>
  req_retry(max_tries = 3) |>
  req_perform()

# Parse the response body as JSON
data <- resp_body_json(resp, simplifyVector = TRUE)
str(data)

For POST requests with a JSON body, req_body_json() calls toJSON with auto_unbox = TRUE internally, so you pass an R list and the wire format is correct.

# POST with JSON body
resp <- request("https://api.example.com/users") |>
  req_auth_bearer_token(Sys.getenv("API_TOKEN")) |>
  req_body_json(list(
    name = "Ada",
    email = "ada@example.com",
    tags = c("admin", "beta")
  )) |>
  req_perform()

resp_body_json(resp, simplifyVector = TRUE)

Paginated APIs get the most leverage from httr2. req_perform_iterative() takes a callback that returns the next request given the last response, and loops until that callback returns NULL. You get back a list of responses to process in one pass.

# Iterate through all pages, then bind into one frame
all_pages <- request("https://api.example.com/users") |>
  req_url_query(limit = 100) |>
  req_perform_iterative(
    next_req = iterate_with_offset("page", start = 1)
  ) |>
  resps_data(\(r) resp_body_json(r, simplifyVector = TRUE)$results) |>
  dplyr::bind_rows()
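
The control flow behind that helper can be sketched without any network access. In the example below, fake_api() is a hypothetical stand-in for the remote endpoint (it is not part of httr2 or jsonlite), and the "results" field name is assumed to match the response shape used above.

```r
library(jsonlite)

# Hypothetical stand-in for the remote endpoint: serves 5 users in pages of
# `limit`, then an empty "results" array. A real client would do HTTP here.
fake_api <- function(page, limit = 2) {
  ids <- 1:5
  first <- (page - 1) * limit + 1
  page_ids <- if (first > length(ids)) {
    integer(0)
  } else {
    ids[first:min(first + limit - 1, length(ids))]
  }
  users <- data.frame(id = page_ids, name = paste0("user", page_ids))
  toJSON(list(results = users))
}

pages <- list()
page <- 1
repeat {
  body <- fromJSON(fake_api(page))
  if (length(body$results) == 0) break  # empty page: stop iterating
  pages[[page]] <- body$results
  page <- page + 1
}

# Bind the per-page data.frames into one table
all_users <- do.call(rbind, pages)
print(all_users)
```

The stop condition here is an empty page; APIs that return a total count or a next-page cursor would need a different check.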

For related patterns across other languages, see our guides on Parse JSON in Python, Parse JSON in Ruby, Parse JSON in Perl, and JSON to pandas DataFrame for the Python equivalent of the flattening workflow.

Key terms

jsonlite
The default R JSON package, maintained by Jeroen Ooms. Auto-coerces JSON arrays into R vectors and arrays-of-objects into data.frames. Used by tidyverse, httr2, plumber, and most CRAN packages that touch JSON.
fromJSON / toJSON
The two main jsonlite entry points. fromJSON reads from a file path, URL, connection, or string; toJSON serializes an R object to a JSON character string. Both accept simplification and formatting flags.
simplifyVector
The flag (default TRUE) that controls whether jsonlite collapses homogeneous JSON arrays into R atomic vectors. With it off, every array stays a list and is indexed with [[i]] instead of [i].
simplifyDataFrame
The flag (default TRUE) that turns a JSON array of objects into an R data.frame. The most useful simplification for analytics work — it is the difference between getting a table back and a list-of-lists.
jsonlite::flatten()
Function that lifts nested data.frame columns into dotted column names on the parent frame (address.city, address.zip). Operates only on nested objects; list-columns of arrays require tidyr::unnest_*.
stream_in / stream_out
jsonlite's NDJSON streaming reader and writer. Process one JSON object per line in chunks, without loading the full file into memory. The standard tool for log files, analytics events, and ML datasets that exceed RAM.
auto_unbox
The toJSON flag (default FALSE, but you almost always want it TRUE) that emits length-1 R vectors as JSON scalars instead of one-element arrays. Without it, list(id = 1) serializes as {"id":[1]}, which surprises every non-R API consumer.
NDJSON
Newline-delimited JSON — one JSON object per line, no enclosing array. The format that makes streaming and incremental processing possible. See our JSON Lines guide.

Frequently asked questions

Which R package should I use for JSON?

For new code, use jsonlite. It is maintained by Jeroen Ooms, ships with most R distributions on Linux and macOS via system packages, and is the package the tidyverse, plumber, httr2, and most CRAN packages depend on internally. Its defining feature is automatic coercion from JSON arrays to R vectors and data.frames, which is what most analysts actually want. RJSONIO is the older option from the pre-jsonlite era: still on CRAN, still maintained, but the API is closer to the JSON wire format and you do the vector/data.frame conversion yourself. rjson is the simplest of the four: pure C, very fast for small payloads, but no auto-simplification and no streaming. jsonify is a newer Rcpp-based package that benchmarks faster than jsonlite for round-trips and is worth a look when JSON parsing dominates your profile. Default to jsonlite unless a benchmark or a legacy codebase points elsewhere.

How do I convert nested JSON into a data.frame?

jsonlite::fromJSON with simplifyDataFrame = TRUE (the default) already converts top-level arrays of objects into a data.frame, but nested objects come back as nested data.frame columns. Use jsonlite::flatten() on the data.frame to lift nested object fields into dotted column names: data <- fromJSON(json); flat <- jsonlite::flatten(data). For arrays inside objects, flatten() leaves them as list-columns; you typically tidyr::unnest_longer() those next. For deeply irregular nesting, tidyjson::spread_all() spreads scalar values into columns named by their path in the document. The pattern most data work converges on is: fromJSON → flatten → tibble::as_tibble → tidyr::unnest_* for any remaining list-columns. Test on a small sample first; auto-coercion can surprise you when one object in an array has a missing field.

What does simplifyVector do in jsonlite?

simplifyVector controls whether jsonlite collapses a homogeneous JSON array into an R atomic vector. It defaults to TRUE. With it on, [1, 2, 3] becomes the integer vector c(1L, 2L, 3L), which you can index with [i]. With it off (simplifyVector = FALSE), the same JSON returns list(1L, 2L, 3L), a list you index with [[i]]. The simplified vector is almost always what you want for analytics, plotting, and arithmetic. Turn it off when you need to preserve the wire format exactly, for example when round-tripping JSON through R without altering structure, or when array elements have mixed types that R cannot represent in a single vector. The related flags simplifyDataFrame (arrays of objects → data.frame) and simplifyMatrix (nested numeric arrays → matrix) follow the same idea: on-by-default convenience that you disable when you need raw fidelity.

How do I read NDJSON / JSON Lines in R?

Use jsonlite::stream_in() with a file connection. NDJSON is one JSON object per line, a natural fit for streaming because each record can be parsed independently. stream_in reads and parses the file in chunks and accumulates the rows into a data.frame: data <- jsonlite::stream_in(file("events.ndjson")). Pair it with stream_out() to write large frames back: jsonlite::stream_out(my_df, file("output.ndjson")). stream_in also accepts a handler function for in-stream processing, which is useful when the file is too big even for the accumulated data.frame. Set pagesize on either function to control the chunk size (default 500 records). For gzipped NDJSON, wrap the connection: stream_in(gzfile("events.ndjson.gz")). See our guide on JSON Lines for the format itself and why it dominates log shipping, analytics events, and ML datasets.

How does jsonlite handle NA versus null?

R has NA (missing value) and NULL (absent object) as distinct concepts; JSON has only null. On the way out (toJSON), the defaults are class specific: NA in logical and character vectors becomes null, NA in numeric vectors becomes the string "NA" (so it stays distinguishable from NaN and Inf), and NA cells in a row-wise data.frame are omitted from that record. NULL inside a list is written as an empty object {} unless you pass null = "null". On the way in (fromJSON), a JSON null becomes NA inside vectors and data.frames, and stays NULL inside lists that were not simplified. Set na = "null" to force JSON null for every NA, or na = "string" to write the literal string "NA". For API payloads where the receiver needs the key present with a null value, use na = "null" and null = "null"; to omit a key, remove the element from the list before serializing. Test against the consumer, since some JSON APIs treat a missing key and null differently.

How do I send JSON to an API from R?

Use httr2 with jsonlite. httr2 is the current generation of HTTP client for R (httr is the older sibling, still on CRAN but in maintenance mode). Build a request with request(url), add the JSON body via req_body_json() — which calls toJSON internally with auto_unbox = TRUE so scalar fields serialize as JSON primitives instead of one-element arrays — and call req_perform(). The response object exposes resp_body_json() that parses the response back into an R list or data.frame. For authenticated endpoints, add req_auth_bearer_token() or req_headers(Authorization = "..."). For retries on transient failures, chain req_retry(max_tries = 3). For paginated APIs, req_perform_iterative() runs the loop for you with a next_req callback. The combination of httr2 plus jsonlite covers nearly every API integration scenario in R, including OAuth, file uploads, and streaming responses.

Can I stream a large JSON file in R?

Yes, but only if the file is NDJSON (one JSON object per line). A single large JSON document, for example a top-level array of a million objects in one file, cannot be streamed by jsonlite, because the whole document must be parsed as one unit. Two workarounds: first, ask the producer to emit NDJSON instead; almost every analytics pipeline can. Second, pre-split the file: use jq with the --compact-output flag and the .[] iterator to convert array-of-objects JSON into NDJSON (jq -c ".[]" big.json > big.ndjson) before reading with stream_in. For genuinely huge files, consider arrow::read_json_arrow() with as_data_frame = FALSE, which reads NDJSON into an Arrow Table held in Arrow-managed memory rather than the R heap, so you can filter and aggregate before converting anything to R objects.

What's the difference between fromJSON and read_json?

jsonlite::fromJSON is the all-in-one reader: it accepts a file path, a URL, a connection, or a raw JSON string and applies the simplifyVector/simplifyDataFrame/simplifyMatrix coercion you ask for. It is what most code should call. jsonlite::read_json is the lower-level reader: it accepts a file path or connection only (no strings, no URLs), and it returns the raw parsed structure with simplification off by default — every JSON array stays a list, every object stays a named list. Use read_json when you want full fidelity to the wire format and plan to do the structural conversion yourself (often the case when the JSON shape is irregular). Use fromJSON for ordinary analytics work where you want a data.frame back. The toJSON / write_json pair mirrors this split: toJSON returns a string, write_json writes to disk.
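
The difference is easy to see on a small file. This sketch writes an invented two-record array to a temporary file and reads it both ways:

```r
library(jsonlite)

path <- tempfile(fileext = ".json")
writeLines('[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bea"}]', path)

# read_json: no simplification by default, so a list of named lists
raw <- read_json(path)
str(raw)

# fromJSON: simplification on, so a data.frame
tab <- fromJSON(path)
print(tab)

# read_json can opt in to the same simplification
same <- read_json(path, simplifyVector = TRUE)
print(class(same))
```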

Further reading and primary sources