JSON in R: jsonlite fromJSON, toJSON, httr2 API Calls & Streaming NDJSON

Q: How do I parse a JSON file in R?

Use jsonlite::fromJSON() with a file path, URL, or JSON string. For a local file: library(jsonlite); data <- fromJSON("data.json"). For a JSON array of objects, fromJSON() automatically converts the result to an R data.frame, the most convenient format for data analysis. For nested JSON, fromJSON() returns a data.frame with nested data.frame or list columns, or a named list when the top level is a single object. If you pass simplifyVector = FALSE, you get a pure nested R list without any automatic simplification, which is useful when the JSON structure has variable shapes that do not map cleanly to a rectangle. The function accepts file paths, URLs, and JSON strings directly, so no separate file-reading step is required, unlike Python's json.load() workflow. For a 10,000-row JSON array, fromJSON() typically completes in a few hundred milliseconds on a modern machine, though timings vary with the data and hardware.

Q: What does auto_unbox do in jsonlite::toJSON?

By default, jsonlite::toJSON() serializes every R vector as a JSON array, even if it contains a single element. This is technically correct because R has no scalar type — a single number is a length-1 vector. Without auto_unbox = TRUE, toJSON(list(id = 1L)) produces {"id":[1]} — the number is wrapped in an array. With auto_unbox = TRUE, toJSON(list(id = 1L), auto_unbox = TRUE) produces {"id":1}, matching what REST APIs and most JSON consumers expect. This is the source of the most common R JSON bug: generating arrays where APIs expect scalars. Always pass auto_unbox = TRUE when serializing data for API consumption. You can also use the unbox() helper on individual values — unbox(1L) — for selective unboxing when you need arrays for some fields and scalars for others.

Q: How do I fetch JSON from a REST API in R?

The modern approach uses the httr2 package: library(httr2); req <- request("https://api.example.com/data"); resp <- req_perform(req); data <- resp_body_json(resp). resp_body_json() parses the response body as JSON and returns a nested R list. For authenticated endpoints, pipe req_auth_bearer_token(req, token) before req_perform(). For POST requests with a JSON body, use req_body_json(req, list(key = "value"), auto_unbox = TRUE). httr2 raises errors automatically for HTTP 4xx/5xx responses — use req_error(req, is_error = function(resp) FALSE) to suppress automatic errors and handle status codes manually. The older httr package (GET(), POST()) is still widely used and works, but httr2's pipe-based API is cleaner and is the current recommendation for new code.

Q: How do I convert nested JSON to a flat data.frame in R?

jsonlite::fromJSON() with its default simplifyDataFrame = TRUE converts an array of objects into a data.frame in which nested objects become nested data.frame columns and nested arrays become list-columns. To spread list-columns into ordinary columns, use tidyr::unnest_wider() or tidyr::unnest_longer() from the tidyverse. For example: library(tidyverse); df |> as_tibble() |> unnest_wider(nested_column). Alternatively, jsonlite::flatten(df) collapses nested data.frame columns (recursively by default) using dot notation for column names, so an address column holding a city field becomes address.city. For deeply nested JSON with irregular shapes, purrr::map_dfr() combined with a row-extraction function gives the most control. The tidyr approach works well when nesting is consistent; purrr::map() is better for variable-structure JSON.

Q: How do I stream large NDJSON files in R?

Use jsonlite::stream_in() with a file connection: library(jsonlite); con <- file("large.ndjson", open = "r"); df <- stream_in(con); close(con). stream_in() reads NDJSON (one JSON object per line) in batches of 500 lines by default. Called this way it still combines every batch into one data.frame, so the result must fit in memory. To process a file that does not, supply a handler: stream_in(con, handler = function(batch) { ... }) calls the function on each batch and does not accumulate results. You can control batch size with the pagesize argument: stream_in(con, pagesize = 1000). The companion stream_out() function writes a data.frame to NDJSON format for downstream consumption.
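As a minimal sketch of the handler pattern, the example below writes a small NDJSON file to a temporary path and then streams it back, keeping only running totals instead of the rows themselves. The data, the column names, and the variables batch_rows and high_scores are made up for illustration.

```r
library(jsonlite)

# Hypothetical sample data written to a temporary NDJSON file
path <- tempfile(fileext = ".ndjson")
events <- data.frame(id = 1:2000, score = rep(c(85, 95), 1000))
con <- file(path, open = "w")
stream_out(events, con, verbose = FALSE)
close(con)

# Keep only per-batch summaries; each batch is dropped after the handler returns
batch_rows  <- integer(0)
high_scores <- 0L
con <- file(path, open = "r")
stream_in(con, handler = function(batch) {
  batch_rows  <<- c(batch_rows, nrow(batch))
  high_scores <<- high_scores + sum(batch$score > 90, na.rm = TRUE)
}, pagesize = 500, verbose = FALSE)
close(con)

batch_rows    # 500 500 500 500
high_scores   # 1000
```

Because the handler only updates two small variables, memory use stays roughly constant no matter how many lines the file holds.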

Q: What is the difference between jsonlite and rjson in R?

jsonlite and rjson differ in three key areas: automatic data.frame conversion, streaming support, and parsing speed. jsonlite::fromJSON() automatically converts JSON arrays of objects to R data.frames and handles simplification of JSON arrays to R vectors — critical for data analysis workflows. rjson::fromJSON() always returns nested R lists, requiring manual conversion. rjson is 2-3× faster than jsonlite for pure parsing because it does less post-processing, which matters for workloads parsing millions of small JSON objects per second. jsonlite supports NDJSON streaming via stream_in()/stream_out() and provides prettify()/minify() utilities — rjson has none of these. RJSONIO is a third option: similar to rjson in philosophy but older and generally slower. For data science workflows involving data.frames and dplyr, jsonlite is the right choice. For high-throughput JSON parsing where you control post-processing, rjson or the jsonify package (Rcpp-based) may be faster.

Q: How do I write JSON to a file in R?

Use jsonlite::write_json() for the simplest interface: library(jsonlite); write_json(df, "output.json", auto_unbox = TRUE, pretty = TRUE). This writes a data.frame or list to a JSON file. Alternatively, use toJSON() to produce a string and then writeLines(): json_str <- toJSON(df, auto_unbox = TRUE, pretty = TRUE); writeLines(json_str, "output.json"). For NDJSON output (one JSON object per line), use stream_out(): con <- file("output.ndjson", open = "w"); stream_out(df, con); close(con). The pretty = TRUE argument adds 2-space indentation and newlines for human-readable output — omit it for compact production output. The digits argument controls number precision: digits = NA uses full R precision (15 significant digits), while digits = 6 is sufficient for most analytics use cases.
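The effect of the digits argument is easiest to see in a round trip. The sketch below writes a small, made-up data.frame to a temporary file with digits = NA and reads it back; the file path and column names are illustrative only.

```r
library(jsonlite)

df <- data.frame(id = 1:2, ratio = c(1/3, 2/3))
path <- tempfile(fileext = ".json")

# digits = NA keeps maximum precision; the default (digits = 4) rounds
write_json(df, path, digits = NA, pretty = TRUE)
back <- fromJSON(path)

all.equal(back$ratio, df$ratio)   # TRUE within floating-point tolerance
toJSON(df$ratio)                  # [0.3333,0.6667] with the default digits = 4
```

If values must survive a write and read cycle unchanged, setting digits explicitly is safer than relying on the default.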

Q: How do I handle missing or null values in R JSON?

JSON null maps to R's NA when jsonlite converts JSON arrays to data.frames, and to R's NULL when it returns nested lists. This distinction matters: NA represents a missing value within a vector (preserving the vector's type), while NULL represents the absence of an object. When serializing with toJSON(), the default NA handling is class-specific: for a data.frame written row-wise, a field whose value is NA is omitted from that record, while for bare vectors the default depends on the vector type. To write NA as an explicit null, pass na = "null": toJSON(df, na = "null", auto_unbox = TRUE). The other option, na = "string", writes the string "NA" instead. Lists do not drop NA elements automatically, so to omit them remove the elements before serialization, for example with purrr::discard(). When reading JSON with missing fields, fromJSON() fills missing values with NA in data.frame output, maintaining rectangular structure across records.
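The following sketch shows how a missing value travels in both directions. It uses a one-column data.frame invented for the example; the output comments reflect jsonlite's documented data.frame mapping, in which NA fields are left out of a record unless na is set explicitly.

```r
library(jsonlite)

df_na <- data.frame(x = c(1, NA, 3))

# Default for row-wise data.frame output: the NA field is left out of its record
toJSON(df_na)                # [{"x":1},{},{"x":3}]

# Explicit na = "null" keeps the key and writes null
toJSON(df_na, na = "null")   # [{"x":1},{"x":null},{"x":3}]

# Reading either form back yields NA in the data.frame column
fromJSON('[{"x":1},{},{"x":3}]')$x            # 1 NA 3
fromJSON('[{"x":1},{"x":null},{"x":3}]')$x    # 1 NA 3

# A null inside a single JSON object stays NULL in the resulting list
is.null(fromJSON('{"name":"Alice","age":null}')$age)   # TRUE
```

Which form to send depends on the consumer: some APIs treat an absent key and an explicit null differently.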

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 28, 2026

R handles JSON primarily through the jsonlite package: jsonlite::fromJSON("url_or_path") parses JSON from a URL, file path, or string directly into an R data.frame (for JSON arrays of objects) or nested list. jsonlite::toJSON(data, auto_unbox = TRUE) serializes R objects to JSON; the auto_unbox option is critical because without it, single scalar values are wrapped in JSON arrays. For HTTP API calls, httr2::req_perform(request) |> httr2::resp_body_json() fetches and parses JSON in one pipeline. For large NDJSON files, jsonlite::stream_in(file("data.ndjson")) reads the file in batches, and a handler function lets you process each batch without holding the full file in memory.

This guide covers fromJSON/toJSON, the auto_unbox parameter, reading JSON APIs with httr2, working with nested JSON lists, streaming NDJSON, and converting JSON to data.frames for dplyr analysis.

fromJSON: Parsing JSON to data.frame and List

jsonlite::fromJSON() is R's primary JSON parser. Its defining feature is automatic simplification: a JSON array of flat objects becomes a data.frame, a JSON array of scalars becomes a vector, and a deeply nested JSON object becomes a nested list. Pass simplifyVector = FALSE to disable all simplification and get a pure nested list regardless of structure.

library(jsonlite)

# ── Parse from a local file ───────────────────────────────────────
df <- fromJSON("data.json")
# JSON array of objects → data.frame automatically

# ── Parse from a URL ──────────────────────────────────────────────
users <- fromJSON("https://jsonplaceholder.typicode.com/users")
class(users)     # "data.frame"
nrow(users)      # 10 (one row per JSON object)
names(users)     # c("id", "name", "username", "email", ...)

# ── Parse from a JSON string ──────────────────────────────────────
json_str <- '[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]'
people <- fromJSON(json_str)
#   id  name
# 1  1 Alice
# 2  2   Bob

# ── Scalar array → R vector ──────────────────────────────────────
scores <- fromJSON("[10, 20, 30]")
class(scores)  # "integer" (not a list — simplified automatically)

# ── JSON object → named list ──────────────────────────────────────
config <- fromJSON('{"host":"localhost","port":5432,"ssl":true}')
config$host   # "localhost"
config$port   # 5432 (integer)
config$ssl    # TRUE (logical)

# ── Disable simplification — always get a list ────────────────────
raw_list <- fromJSON("data.json", simplifyVector = FALSE)
class(raw_list)       # "list"
class(raw_list[[1]])  # "list" — each element stays as a list

# ── Key simplification parameters ─────────────────────────────────
fromJSON(json_str,
  simplifyVector    = TRUE,  # arrays → atomic vectors (default)
  simplifyDataFrame = TRUE,  # arrays of objects → data.frame (default)
  simplifyMatrix    = TRUE,  # arrays of equal-length arrays → matrix (default)
  flatten           = FALSE  # flatten nested data.frames (default FALSE)
)

# ── Check what fromJSON returns ───────────────────────────────────
result <- fromJSON("https://api.github.com/repos/tidyverse/ggplot2")
class(result)       # "list" (single object, not an array)
result$stargazers_count  # integer — star count
result$owner$login       # "tidyverse" — nested field access

The flatten = TRUE argument merges nested data.frame columns into the parent using dot notation, recursing through deeper levels. A column named address containing a sub-data.frame with city and zip becomes two columns: address.city and address.zip. This is equivalent to calling jsonlite::flatten(df) on the result after parsing, and is the most direct path from JSON to a rectangular data.frame without additional dplyr or tidyr steps. Nested arrays remain list-columns.
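To make the flattening behaviour concrete, here is a small sketch using an inline JSON string, so it runs without a network connection. The records and field names are invented for illustration.

```r
library(jsonlite)

json <- '[
  {"id": 1, "address": {"city": "NYC",    "geo": {"lat": 40.7, "lng": -74.0}}},
  {"id": 2, "address": {"city": "Boston", "geo": {"lat": 42.4, "lng": -71.1}}}
]'

# Default: the nested object becomes a data.frame column
nested <- fromJSON(json)
names(nested)            # "id" "address"
class(nested$address)    # "data.frame"

# flatten = TRUE: nested columns are merged with dot-separated names
flat <- fromJSON(json, flatten = TRUE)
names(flat)              # "id" "address.city" "address.geo.lat" "address.geo.lng"

# The post-hoc route gives the same column names
names(flatten(nested))
```

Note that the two-level address.geo object is flattened as well, which is why the dotted names can have more than two parts.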

toJSON: Serializing R Objects with auto_unbox

jsonlite::toJSON() converts R objects to JSON strings. The most important parameter is auto_unbox = TRUE, which prevents length-1 R vectors from being serialized as JSON arrays. R has no scalar type — every value is a vector — so without this option, list(id = 1L) becomes {"id":[1]} instead of {"id":1}. This mismatch causes silent bugs when calling REST APIs.

library(jsonlite)

# ── The auto_unbox problem ────────────────────────────────────────
obj <- list(id = 1L, name = "Alice", active = TRUE)

toJSON(obj)
# {"id":[1],"name":["Alice"],"active":[true]}  ← arrays everywhere!

toJSON(obj, auto_unbox = TRUE)
# {"id":1,"name":"Alice","active":true}  ← correct for most APIs

# ── Serialize a data.frame ────────────────────────────────────────
df <- data.frame(
  id    = 1:3,
  name  = c("Alice", "Bob", "Carol"),
  score = c(95.5, 87.2, 91.8)
)

toJSON(df, auto_unbox = TRUE)
# [{"id":1,"name":"Alice","score":95.5},{"id":2,"name":"Bob","score":87.2}, ...]

# ── Pretty printing ───────────────────────────────────────────────
toJSON(df, auto_unbox = TRUE, pretty = TRUE)
# [
#   {
#     "id": 1,
#     "name": "Alice",
#     "score": 95.5
#   },
#   ...
# ]

# ── Write to file ─────────────────────────────────────────────────
write_json(df, "output.json", auto_unbox = TRUE, pretty = TRUE)
# Equivalent to: writeLines(toJSON(df, auto_unbox=TRUE, pretty=TRUE), "output.json")

# ── Control number precision ──────────────────────────────────────
toJSON(list(pi = pi), digits = 4, auto_unbox = TRUE)
# {"pi":3.1416}

toJSON(list(pi = pi), digits = NA, auto_unbox = TRUE)
# {"pi":3.14159265358979}  ← full R double precision

# ── Selective unboxing with unbox() ──────────────────────────────
# When you need arrays for some fields and scalars for others:
obj2 <- list(
  id    = unbox(42L),           # force scalar
  tags  = c("r", "json"),       # keep as array
  meta  = list(v = unbox(1L))   # unbox() takes atomic values, so unbox inside the list
)
toJSON(obj2)
# {"id":42,"tags":["r","json"],"meta":{"v":1}}

# ── NA handling ───────────────────────────────────────────────────
df_na <- data.frame(x = c(1, NA, 3))
toJSON(df_na)
# [{"x":1},{},{"x":3}]  ← default for data.frames: the NA field is omitted

toJSON(df_na, na = "null")
# [{"x":1},{"x":null},{"x":3}]  ← explicit null

# Lists keep NA elements; to exclude them, filter before serializing:
obj_na <- list(id = 1L, name = NA_character_, active = TRUE)
clean_list <- Filter(Negate(is.na), obj_na)
toJSON(clean_list, auto_unbox = TRUE)
# {"id":1,"active":true}

# ── Formatting utilities ──────────────────────────────────────────
raw_json <- '{"b":2,"a":1}'
jsonlite::prettify(raw_json)       # adds indentation + newlines
jsonlite::minify(raw_json)         # removes whitespace → {"b":2,"a":1}

When building JSON for API requests, inspect the exact string before sending, for example with cat(toJSON(obj, auto_unbox = TRUE)). cat() writes the raw JSON characters to the console exactly as they will be sent. A common debugging pattern is jsonlite::prettify(toJSON(body, auto_unbox = TRUE)), which gives formatted, human-readable output without writing a file.
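A quick inspection step along those lines might look like the sketch below. The body list is a made-up request payload; validate() is jsonlite's well-formedness check.

```r
library(jsonlite)

# Hypothetical request body
body <- list(title = "New Post", tags = c("r", "json"), userId = 1L)

json <- toJSON(body, auto_unbox = TRUE)
cat(json, "\n")     # {"title":"New Post","tags":["r","json"],"userId":1}
prettify(json)      # same content, indented for reading
validate(json)      # TRUE when the string is well-formed JSON
```

Checking the string this way catches the usual problem, an unexpected array wrapper, before the request leaves your machine.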

Fetching JSON APIs with httr2

The httr2 package provides a modern pipe-based interface for HTTP requests. The core workflow is: build a request with request(), configure it with req_*() functions, execute with req_perform(), and parse the JSON response with resp_body_json(). httr2 automatically throws errors on HTTP 4xx/5xx responses and handles redirects transparently.

library(httr2)

# ── GET request ───────────────────────────────────────────────────
resp <- request("https://jsonplaceholder.typicode.com/posts/1") |>
  req_perform()

post <- resp_body_json(resp)
post$title   # "sunt aut facere repellat provident occaecati..."
post$userId  # 1

# ── GET with query parameters ─────────────────────────────────────
resp <- request("https://jsonplaceholder.typicode.com/posts") |>
  req_url_query(userId = 1, `_limit` = 5) |>  # backticks: _limit is not a syntactic R name
  req_perform()

posts <- resp_body_json(resp, simplifyVector = TRUE)
# simplifyVector = TRUE passes through to jsonlite for auto data.frame

# ── Authenticated GET (Bearer token) ─────────────────────────────
resp <- request("https://api.github.com/user") |>
  req_auth_bearer_token(Sys.getenv("GITHUB_TOKEN")) |>
  req_headers(Accept = "application/vnd.github+json") |>
  req_perform()

user <- resp_body_json(resp)
user$login   # your GitHub username

# ── POST with a JSON body ─────────────────────────────────────────
body <- list(
  title  = "New Post",
  body   = "Post content here.",
  userId = 1L
)

resp <- request("https://jsonplaceholder.typicode.com/posts") |>
  req_body_json(body, auto_unbox = TRUE) |>
  req_perform()

resp_status(resp)        # 201
created <- resp_body_json(resp)
created$id               # 101 (auto-assigned ID)

# ── Error handling ────────────────────────────────────────────────
resp <- request("https://api.example.com/data") |>
  req_error(is_error = function(resp) FALSE) |>  # don't throw on errors
  req_perform()

if (resp_status(resp) >= 400) {
  err <- resp_body_json(resp)
  message("API error: ", err$message)
} else {
  data <- resp_body_json(resp)
}

# ── Retry with exponential backoff ────────────────────────────────
resp <- request("https://api.example.com/unstable") |>
  req_retry(max_tries = 3, backoff = function(attempt) 2 ^ attempt) |>
  req_perform()

# ── Older httr package (still widely used) ────────────────────────
# library(httr)
# resp <- GET("https://api.example.com/data",
#             add_headers(Authorization = paste("Bearer", token)))
# data <- content(resp, as = "parsed")  # auto-detects JSON

Pass simplifyVector = TRUE to resp_body_json() to enable jsonlite's automatic data.frame conversion — httr2 passes this argument through to jsonlite internally. Without it, resp_body_json() returns a nested R list. The req_throttle() function limits requests per second to avoid rate-limiting: req_throttle(req, rate = 10) allows at most 10 requests per second.

Working with Nested JSON Lists

When fromJSON() returns a nested list (either because simplifyVector = FALSE or because the JSON has irregular structure), the tidyverse provides the most ergonomic tools for extraction and transformation. purrr::map() applies a function over list elements; tidyr::unnest_wider() spreads list-columns into separate columns.

library(jsonlite)
library(purrr)
library(tidyr)
library(dplyr)

# ── Nested JSON from a real API (GitHub repos) ────────────────────
json_text <- '{
  "name": "ggplot2",
  "owner": {"login": "tidyverse", "id": 22032646},
  "topics": ["r", "visualization", "ggplot2"],
  "license": {"key": "mit", "name": "MIT License"}
}'

repo <- fromJSON(json_text)

# Direct nested access with $ and [[]]
repo$owner$login     # "tidyverse"
repo$license[["key"]] # "mit"
repo$topics[2]        # "visualization" (1-indexed in R)

# ── Safe deep access with purrr::pluck() ─────────────────────────
# pluck() returns NULL (not an error) if path doesn't exist
pluck(repo, "owner", "login")   # "tidyverse"
pluck(repo, "missing", "field") # NULL — no error

# ── Extract fields from a list of nested objects ──────────────────
# Simulated list of repos (fromJSON with simplifyVector = FALSE)
repos <- fromJSON("https://api.github.com/orgs/tidyverse/repos",
                  simplifyVector = FALSE)

# Extract a single field from every element
repo_names <- map_chr(repos, "name")
# c("ggplot2", "dplyr", "tidyr", ...)

star_counts <- map_int(repos, "stargazers_count")
# integer vector of star counts, one per repo

# ── Build a data.frame from nested lists ─────────────────────────
repo_df <- map_dfr(repos, function(r) {
  data.frame(
    name    = r$name %||% NA_character_,
    stars   = r$stargazers_count %||% NA_integer_,
    lang    = r$language %||% NA_character_,
    license = pluck(r, "license", "key", .default = NA_character_),
    stringsAsFactors = FALSE
  )
})

# ── Unnest list-columns from fromJSON output ──────────────────────
# When fromJSON auto-creates list-columns for nested objects:
users_df <- fromJSON("https://jsonplaceholder.typicode.com/users")
class(users_df$address)   # "data.frame" (nested data.frame column!)

# The nested address data.frame is already accessible:
users_df$address$city     # c("Gwenborough", "Wisokyburgh", ...)

# To fully flatten into a single wide data.frame:
flat_users <- users_df |>
  jsonlite::flatten() |>       # dot-notation column names
  as_tibble()

grep("^address", names(flat_users), value = TRUE)
# "address.street" "address.suite" "address.city" "address.zipcode"
# "address.geo.lat" "address.geo.lng"

# ── tidyr unnesting for list-columns ─────────────────────────────
library(tibble)
df <- tibble(
  id   = 1:3,
  meta = list(
    list(tags = c("a", "b"), score = 90),
    list(tags = c("c"),      score = 85),
    list(tags = c("d", "e"), score = 92)
  )
)

# Widen list-column (each key becomes a column)
df |> unnest_wider(meta)
# id tags      score
#  1 <chr [2]>  90
#  2 <chr [1]>  85
#  3 <chr [2]>  92

The %||% operator (exported by rlang and purrr, built into base R from 4.4.0, or definable as `%||%` <- function(a, b) if (!is.null(a)) a else b) is invaluable for handling missing fields in nested lists. It returns the left-hand side unless that is NULL, in which case it falls back to the right-hand side. This is the R counterpart of JavaScript's nullish coalescing operator for list traversal.
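A short sketch of the fallback behaviour, using a hand-built record rather than a real API response (the field names are illustrative):

```r
# Define the operator only if the session does not already provide one
if (!exists("%||%")) `%||%` <- function(a, b) if (!is.null(a)) a else b

# Hypothetical parsed record with one null field and one absent field
record <- list(name = "ggplot2", language = NULL, license = list(key = "mit"))

name     <- record$name         %||% NA_character_   # "ggplot2"
language <- record$language     %||% NA_character_   # NA: field was null
license  <- record$license$key  %||% NA_character_   # "mit"
homepage <- record$homepage$url %||% NA_character_   # NA: `$` on NULL returns NULL
```

Falling back to a typed NA such as NA_character_ keeps every extracted value the same type, which matters when the values are later combined into a data.frame column.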

stream_in: Processing Large NDJSON Files

NDJSON (Newline-Delimited JSON) stores one JSON object per line, enabling line-by-line streaming. jsonlite::stream_in() reads NDJSON files in batches. Called without a handler it still returns every record as one data.frame; supplying a handler lets you process each batch and discard it, which is what makes files larger than available RAM workable. The companion stream_out() writes a data.frame to NDJSON format.

library(jsonlite)

# ── Read an NDJSON file ────────────────────────────────────────────
# Each line is a separate JSON object:
# {"id":1,"name":"Alice","score":95}
# {"id":2,"name":"Bob","score":87}

con <- file("large.ndjson", open = "r")
df <- stream_in(con)
close(con)
# df is a data.frame with one row per line

# ── Control batch size ────────────────────────────────────────────
con <- file("large.ndjson", open = "r")
df <- stream_in(con, pagesize = 1000)  # process 1000 lines at a time
close(con)

# ── Process without accumulating — use a handler function ─────────
# For truly large files where you don't want the full data.frame in RAM:
results <- list()
con <- file("large.ndjson", open = "r")
stream_in(con, handler = function(batch) {
  # batch is a data.frame of pagesize rows
  filtered <- batch[batch$score > 90, ]
  results[[length(results) + 1]] <<- filtered
}, pagesize = 500)
close(con)

final_df <- do.call(rbind, results)  # combine all filtered batches

# ── Write NDJSON ──────────────────────────────────────────────────
df_to_write <- data.frame(
  id    = 1:5,
  value = c(10.1, 20.2, 30.3, 40.4, 50.5)
)

con <- file("output.ndjson", open = "w")
stream_out(df_to_write, con)
close(con)
# Produces:
# {"id":1,"value":10.1}
# {"id":2,"value":20.2}
# ...

# ── Read gzip-compressed NDJSON ───────────────────────────────────
# Gunzip on the fly with gzcon():
con <- gzcon(file("large.ndjson.gz", open = "rb"))
df <- stream_in(con)
close(con)

# ── Combine with dplyr for analysis ──────────────────────────────
library(dplyr)
con <- file("events.ndjson", open = "r")
summary_df <- stream_in(con, pagesize = 5000) |>
  group_by(event_type) |>
  summarise(count = n(), avg_value = mean(value, na.rm = TRUE))
close(con)

# ── Performance note ──────────────────────────────────────────────
# stream_in() parses each batch and builds a data.frame from it, which
# adds overhead on very large inputs. For very large files (>5 GB), consider:
# 1. arrow::read_json_arrow() from the {arrow} package
# 2. Pre-filtering with jq at the shell level before importing to R

For production pipelines processing files larger than a few gigabytes, consider the arrow package's read_json_arrow() function, which uses Apache Arrow's C++ JSON reader and is typically much faster than jsonlite's batch-wise streaming. With as_data_frame = FALSE it returns an Arrow Table, which accepts dplyr verbs directly; evaluation is deferred until collect() converts the result to an R data.frame.

Converting JSON to data.frame for dplyr

The most common R JSON workflow is parsing a JSON API response into a tidy data.frame and then analyzing it with dplyr. fromJSON()'s automatic simplification handles flat JSON arrays; for nested structures, jsonlite::flatten() and tidyr::unnest_*() are the standard tools.

library(jsonlite)
library(dplyr)
library(tidyr)
library(httr2)

# ── JSON API → data.frame → dplyr ────────────────────────────────
posts <- fromJSON("https://jsonplaceholder.typicode.com/posts")
# Automatically a data.frame: 100 rows × 4 columns (userId, id, title, body)

posts |>
  filter(userId == 1) |>
  select(id, title) |>
  arrange(id)

# ── Parse a local JSON array file ─────────────────────────────────
# sales.json: [{"region":"North","q1":120,"q2":95},{"region":"South","q1":88,"q2":103},...]
sales <- fromJSON("sales.json")

sales |>
  mutate(total = q1 + q2) |>
  arrange(desc(total)) |>
  head(5)

# ── Flatten nested JSON for dplyr ────────────────────────────────
users <- fromJSON("https://jsonplaceholder.typicode.com/users",
                  flatten = TRUE)
# flatten = TRUE collapses address/geo/company sub-objects
# columns: id, name, username, email, address.street, address.city,
#          address.geo.lat, address.geo.lng, company.name, ...

users |>
  select(name, address.city, company.name) |>
  filter(!is.na(company.name))

# ── Handle API responses with pagination ─────────────────────────
fetch_all_pages <- function(base_url, per_page = 100) {
  all_data <- list()
  page <- 1
  repeat {
    resp <- request(base_url) |>
      req_url_query(page = page, per_page = per_page) |>
      req_perform()

    batch <- resp_body_json(resp, simplifyVector = TRUE)
    if (length(batch) == 0) break

    all_data[[page]] <- batch
    page <- page + 1
    if (nrow(batch) < per_page) break
  }
  bind_rows(all_data)
}

# ── JSON string column → nested data ─────────────────────────────
# Sometimes APIs return JSON-as-strings inside a data.frame column:
df <- data.frame(
  id       = 1:3,
  metadata = c('{"tags":["r","json"]}',
               '{"tags":["python"]}',
               '{"tags":["r","tidyverse","dplyr"]}'),
  stringsAsFactors = FALSE
)

# Parse each JSON string and extract fields:
df |>
  mutate(
    parsed_meta = lapply(metadata, fromJSON),
    tags        = sapply(parsed_meta, function(m) paste(m$tags, collapse = ", "))
  ) |>
  select(id, tags)

# ── Wide JSON → long format (for ggplot2) ────────────────────────
# JSON: [{"year":2023,"q1":100,"q2":110,"q3":105,"q4":120}, ...]
quarterly <- fromJSON("quarterly_sales.json")

quarterly_long <- quarterly |>
  tidyr::pivot_longer(
    cols      = starts_with("q"),
    names_to  = "quarter",
    values_to = "sales"
  )
# Ready for ggplot2: geom_line(aes(x = quarter, y = sales, group = year, color = factor(year)))

The bind_rows() function from dplyr handles binding lists of data.frames with different column sets — missing columns are filled with NA. This makes it ideal for combining paginated API responses or NDJSON batches where different records may have different optional fields. Combine with janitor::clean_names() to automatically convert camelCase API field names (like userId, createdAt) to snake_case for R conventions.
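As a small illustration of that fill behaviour, the sketch below combines two hypothetical "pages" whose records do not share all fields. The JSON strings stand in for paginated API responses.

```r
library(jsonlite)
library(dplyr)

# Two hypothetical pages; only the second has a "team" field
page1 <- fromJSON('[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]')
page2 <- fromJSON('[{"id":3,"name":"Carol","team":"ops"}]')

combined <- bind_rows(page1, page2)
names(combined)   # "id" "name" "team"
combined$team     # NA NA "ops"
```

Base R's rbind() would stop with an error here because the column sets differ, which is the main reason to prefer bind_rows() for API data.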

jsonlite vs rjson vs RJSONIO: Performance Comparison

Three packages dominate R JSON parsing: jsonlite (the ecosystem default), rjson (fastest pure parser), and RJSONIO (older, from the Omegahat project). A fourth option, jsonify, uses Rcpp for C-level speed. Choose based on whether you need data.frame conversion, streaming, or raw throughput.

# ── Package comparison ────────────────────────────────────────────

# jsonlite — the recommended default
library(jsonlite)
df   <- fromJSON("data.json")                  # → data.frame
json <- toJSON(df, auto_unbox = TRUE)          # → JSON string
stream_in(file("large.ndjson"))               # NDJSON streaming ✓
prettify(json)                                 # format utilities ✓
# fromJSON can accept: file path, URL, JSON string, connection

# rjson — fastest pure parser, always returns R lists
library(rjson)
lst  <- rjson::fromJSON(file = "data.json")   # → nested list always
json <- rjson::toJSON(lst)                    # → JSON string
# No data.frame conversion, no streaming, no prettify
# 2-3× faster than jsonlite for pure list output
# Best for: parsing many small JSON objects at high throughput

# RJSONIO — similar to rjson, older API
library(RJSONIO)
lst  <- RJSONIO::fromJSON("data.json")        # → list
json <- RJSONIO::toJSON(lst, pretty = TRUE)   # → JSON string
# Slightly different null handling (NULL not NA for missing values)
# Less maintained; generally prefer jsonlite or rjson

# jsonify — Rcpp-based, fastest for certain workloads
# install.packages("jsonify")
library(jsonify)
lst  <- from_json("data.json")               # → simplified R object (simplify = TRUE by default)
json <- to_json(df)                          # → JSON string
# Competitive with rjson on speed; no streaming

# ── Benchmark (approximate, varies by machine and data) ───────────
# For parsing a 10,000-row JSON array to a data.frame:
# jsonlite::fromJSON():  ~250 ms (includes data.frame conversion)
# rjson::fromJSON():     ~90 ms  (returns list, no conversion)
# jsonify::from_json():  ~80 ms  (Rcpp-based)
# arrow::read_json_arrow(): ~20 ms (Apache Arrow C++ reader)

# ── Decision matrix ───────────────────────────────────────────────
# Use jsonlite when:
#   - You need automatic data.frame output
#   - Working with URLs or NDJSON streaming
#   - Using tidyverse packages (they assume jsonlite)
#
# Use rjson when:
#   - Pure parsing speed matters most
#   - You'll manually convert list → data.frame anyway
#   - Parsing many small objects (e.g., log processing)
#
# Use arrow::read_json_arrow() when:
#   - Files are >1 GB
#   - You want lazy evaluation via Arrow / dplyr
#   - Memory is constrained

# ── Checking installed packages ───────────────────────────────────
pkgs <- c("jsonlite", "rjson", "RJSONIO", "jsonify")
ip   <- installed.packages()
ip[intersect(pkgs, rownames(ip)), "Version"]  # skips packages that are not installed

# ── NULL vs NA: a key difference between packages ─────────────────
# In an array of objects, jsonlite turns null into NA to keep columns rectangular:
jsonlite::fromJSON('[{"name":"Alice","age":null},{"name":"Bob","age":30}]')$age  # NA 30
# In a single object, both packages return NULL for a null value:
jsonlite::fromJSON('{"name":"Alice","age":null}')$age  # NULL
rjson::fromJSON('{"name":"Alice","age":null}')$age     # NULL

For most R users and data science workflows, jsonlite is the correct choice: httr, httr2, and plumber all use it internally. If you are building a high-throughput data pipeline that parses millions of JSON objects per second and does not need automatic data.frame conversion, rjson or jsonify may provide measurable speedups. The arrow package's JSON reader is usually the better fit once files reach the gigabyte range.
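Because the timings quoted in this section depend heavily on hardware and data shape, it is worth measuring on your own machine. The sketch below times jsonlite alone on generated data, with and without simplification; the data.frame and its columns are made up, and no timing output is shown because it will differ between runs.

```r
library(jsonlite)

# Hypothetical test data: a 10,000-row array of flat objects
n <- 10000
df <- data.frame(
  id    = seq_len(n),
  name  = paste0("user", seq_len(n)),
  score = round(runif(n, 0, 100), 1)
)
json <- toJSON(df)

# Time parsing with and without simplification
t_df   <- system.time(as_df   <- fromJSON(json))["elapsed"]
t_list <- system.time(as_list <- fromJSON(json, simplifyVector = FALSE))["elapsed"]

c(data_frame = t_df, nested_list = t_list)   # seconds; varies by machine
dim(as_df)        # 10000 3
length(as_list)   # 10000
```

The same pattern extends to rjson, jsonify, or arrow if they are installed: wrap each parser call in system.time() against the same input and compare.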

Key Terms

fromJSON: jsonlite::fromJSON(txt, simplifyVector = TRUE, simplifyDataFrame = TRUE, simplifyMatrix = TRUE, flatten = FALSE) is R's primary JSON parser. It accepts a file path, URL, JSON string, or connection object. Its signature feature is automatic simplification: JSON arrays of objects become data.frame, JSON arrays of scalars become R vectors, and regular JSON objects become named lists. Setting simplifyVector = FALSE disables all simplification, returning a pure nested list regardless of structure. The flatten = TRUE argument further merges nested data.frame columns using dot notation.
toJSON / auto_unbox: jsonlite::toJSON(x, auto_unbox = FALSE, pretty = FALSE, digits = 4, na = c("null", "string")) serializes R objects to JSON strings. The auto_unbox parameter is critical: without it, every R vector (including length-1 scalars) is serialized as a JSON array, because R has no scalar type. auto_unbox = TRUE automatically strips the array wrapper from length-1 vectors, producing "id":1 instead of "id":[1]. Use unbox(value) for selective per-value control. When na is not set, the NA handling is class-specific. write_json() is a convenience wrapper that writes directly to a file path.
stream_in / stream_out: jsonlite::stream_in(con, handler = NULL, pagesize = 500) reads NDJSON (one JSON object per line) from a connection in batches. Without a handler it returns all records as one data.frame; with a handler, each batch is passed to the function as a data.frame and nothing is accumulated, which is the standard R approach for NDJSON files larger than available RAM. stream_out(x, con) is the inverse operation and writes a data.frame to NDJSON format. Both functions take a connection object rather than a path string; a connection that is not yet open is opened automatically.
NDJSON: NDJSON (Newline-Delimited JSON, also called JSON Lines or .jsonl) is a text format where each line is a valid, self-contained JSON value — typically a JSON object. Unlike a JSON array, NDJSON can be read line-by-line without parsing the entire document, enabling O(1) memory streaming. It is the standard format for large datasets (OpenAI training data, Elasticsearch bulk API, log files) because it supports append operations and parallel processing. R's jsonlite::stream_in() and stream_out() are the primary tools for NDJSON in R.
httr2: httr2 is the current recommended R package for HTTP requests, replacing the older httr package. It uses a pipe-based API: request(url) |> req_*(options) |> req_perform(). Key functions include req_auth_bearer_token() for authentication, req_body_json() for JSON POST bodies, req_url_query() for query parameters, and resp_body_json() for parsing JSON responses. httr2 raises errors automatically on 4xx/5xx responses unless disabled with req_error(), and supports retries with req_retry() and rate limiting with req_throttle().
simplifyVector / flatten: Two related but distinct concepts in jsonlite. simplifyVector (default TRUE) controls whether fromJSON() converts JSON arrays into R atomic vectors and data.frames, the core of jsonlite's convenience over rjson. flatten (default FALSE) is a post-processing step that expands nested data.frame columns (produced by nested JSON objects) into top-level columns named with dot notation. Setting flatten = TRUE in fromJSON() is equivalent to calling jsonlite::flatten(df) on the result: a nested address column becomes columns such as address.city and address.zipcode. jsonlite::flatten() is recursive by default (recursive = TRUE), and it leaves list-columns produced by nested JSON arrays unchanged.

FAQ

How do I parse a JSON file in R?

Use jsonlite::fromJSON() with a file path: library(jsonlite); data <- fromJSON("data.json"). For a JSON array of objects, fromJSON() automatically converts the result to an R data.frame, the most convenient format for data analysis. For nested JSON, nested objects become nested data.frame columns and nested arrays become list-columns. If you pass simplifyVector = FALSE, you get a pure nested R list without any automatic simplification, which is useful when the JSON structure has variable shapes that do not map cleanly to a rectangle. The function accepts file paths, URLs, and JSON strings directly, so no separate file-reading step is required, unlike Python's json.load() workflow. For a 10,000-row JSON array, fromJSON() typically completes in under 200 ms on a modern machine.
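
As a minimal sketch of the two parsing modes described above, the following parses an inline JSON string first with the defaults and then with simplifyVector = FALSE. The sample records are made up for illustration; a file path or URL could be passed in place of the string.

```r
library(jsonlite)

# Hypothetical sample data: a JSON array of two flat objects.
json_string <- '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'

# Default behaviour: the array of objects is simplified to a data.frame.
df <- fromJSON(json_string)
str(df)

# simplifyVector = FALSE: a plain nested list, one element per record.
records <- fromJSON(json_string, simplifyVector = FALSE)
records[[1]]$name
```

The list form is more verbose to work with, but it never guesses at a rectangular shape, which is the safer choice when records do not share the same fields.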

What does auto_unbox do in jsonlite::toJSON?

By default, jsonlite::toJSON() serializes every R vector as a JSON array, even if it contains a single element. This is technically correct because R has no scalar type — a single number is a length-1 vector. Without auto_unbox = TRUE, toJSON(list(id = 1L)) produces {"id":[1]} — the number is wrapped in an array. With auto_unbox = TRUE, toJSON(list(id = 1L), auto_unbox = TRUE) produces {"id":1}, matching what REST APIs and most JSON consumers expect. This is the source of the most common R JSON bug: generating arrays where APIs expect scalars. Always pass auto_unbox = TRUE when serializing data for API consumption. You can also use the unbox() helper on individual values — unbox(1L) — for selective unboxing when you need arrays for some fields and scalars for others.
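
The difference is easiest to see side by side. This short sketch serializes the same list three ways; the payload itself is a made-up example.

```r
library(jsonlite)

# Hypothetical payload: one length-1 field and one genuine multi-value field.
payload <- list(id = 1L, tags = c("a", "b"))

# Default: every vector, including length-1 ones, becomes a JSON array.
boxed <- toJSON(payload)
boxed
# {"id":[1],"tags":["a","b"]}

# auto_unbox = TRUE: length-1 vectors become JSON scalars.
unboxed <- toJSON(payload, auto_unbox = TRUE)
unboxed
# {"id":1,"tags":["a","b"]}

# Selective control: unbox() one value, keep a length-1 vector as an array.
selective <- toJSON(list(id = unbox(1L), tags = "a"))
selective
# {"id":1,"tags":["a"]}
```

The selective form matters when an API expects a field to be an array even if it currently holds only one element.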

How do I fetch JSON from a REST API in R?

The modern approach uses the httr2 package: library(httr2); req <- request("https://api.example.com/data"); resp <- req_perform(req); data <- resp_body_json(resp). resp_body_json() parses the response body as JSON and returns a nested R list. For authenticated endpoints, pipe req_auth_bearer_token(req, token) before req_perform(). For POST requests with a JSON body, use req_body_json(req, list(key = "value"), auto_unbox = TRUE). httr2 raises errors automatically for HTTP 4xx/5xx responses — use req_error(req, is_error = function(resp) FALSE) to suppress automatic errors and handle status codes manually. The older httr package (GET(), POST()) is still widely used and works well, but httr2's pipe-based API is cleaner and is the current recommendation for new code.
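
The pieces above can be combined into one piped request. This is a sketch only: the URL is the placeholder used in this article, the token is a dummy value, and fetch_json() is a hypothetical helper that is defined here but not called, so nothing is sent over the network.

```r
library(httr2)

# Placeholder token for illustration; real credentials belong in the environment.
token <- "example-token"

# Build the request step by step. Nothing is sent until req_perform() is called.
req <- request("https://api.example.com/data") |>
  req_url_query(page = 1) |>
  req_auth_bearer_token(token) |>
  req_body_json(list(key = "value")) |>        # adding a body makes the request a POST
  req_error(is_error = function(resp) FALSE)   # do not raise on 4xx/5xx responses

# Inspect the URL that would be requested.
req$url

# Hypothetical helper: perform the request and check the status code manually,
# since automatic errors were switched off above.
fetch_json <- function(req) {
  resp <- req_perform(req)
  if (resp_status(resp) >= 400) {
    stop("Request failed with HTTP status ", resp_status(resp))
  }
  resp_body_json(resp)
}
```

Calling fetch_json(req) against a real endpoint would return the parsed body as a nested list.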

How do I convert nested JSON to a flat data.frame in R?

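With its default simplifyDataFrame = TRUE, jsonlite::fromJSON() converts an array of objects to a data.frame, but nesting is preserved rather than flattened: nested objects become nested data.frame columns and nested arrays become list-columns. jsonlite::flatten(df) expands the nested data.frame columns into regular columns using dot notation, so an address object with a city field becomes the column address.city. Call it with the jsonlite:: prefix when the tidyverse is attached, because purrr also exports a function named flatten. For list-columns, use tidyr::unnest_wider() or tidyr::unnest_longer(). For example, where nested_column is a list-column: library(jsonlite); library(tidyverse); df <- fromJSON(json_string) |> as_tibble() |> unnest_wider(nested_column). Note that library(tidyverse) does not attach jsonlite, so it must be loaded separately. For deeply nested JSON with irregular shapes, purrr::map_dfr() combined with a row-extraction function gives the most control (map_dfr() is superseded as of purrr 1.0.0 in favor of map() followed by list_rbind()). The tidyr approach works well when nesting is consistent; purrr::map() is better for variable-structure JSON.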
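
A small worked example may help separate the two kinds of nesting. The sketch below uses made-up records with one nested object (address) and one nested array (tags), and assumes the tidyr package is installed.

```r
library(jsonlite)
library(tidyr)

# Hypothetical sample data: a nested object and a variable-length array per record.
json_string <- '[
  {"id": 1, "address": {"city": "NYC", "zipcode": "10001"}, "tags": ["a", "b"]},
  {"id": 2, "address": {"city": "Boston", "zipcode": "02108"}, "tags": ["c"]}
]'

# address arrives as a nested data.frame column, tags as a list-column.
df <- fromJSON(json_string)

# Expand the nested data.frame column into address.city and address.zipcode.
flat <- jsonlite::flatten(df)
names(flat)

# Expand the list-column into one row per tag.
long <- unnest_longer(flat, tags)
nrow(long)
```

flatten() deals only with the nested object here; the tags list-column needs the separate tidyr step.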

How do I stream large NDJSON files in R?

Use jsonlite::stream_in() with a file connection: library(jsonlite); con <- file("large.ndjson", open = "r"); df <- stream_in(con); close(con). stream_in() reads NDJSON (one JSON object per line) and parses records in batches of 500 by default. Called this way, without a handler, it still combines every batch into one data.frame, so the result must fit in memory. For files larger than available memory, pass a handler: stream_in(con, handler = function(batch) { ... }) calls the function with each batch as a data.frame and accumulates nothing, so memory use is bounded by the batch size. You can control batch size with the pagesize argument: stream_in(con, pagesize = 1000). The companion stream_out() function writes a data.frame to NDJSON format for downstream consumption.
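
The handler pattern can be sketched as follows. To stay self-contained, the example first writes a tiny NDJSON file to a temporary path (the data is invented), then computes a running total batch by batch instead of collecting all rows.

```r
library(jsonlite)

# Write a small NDJSON file to a temporary path so the sketch is self-contained.
path <- tempfile(fileext = ".ndjson")
stream_out(data.frame(id = 1:10, value = (1:10) * 1.5), file(path), verbose = FALSE)

# Running aggregates, updated once per batch.
total <- 0
rows <- 0

# With a handler, each batch is passed to the function and then discarded.
stream_in(
  file(path),
  handler = function(batch) {
    total <<- total + sum(batch$value)
    rows <<- rows + nrow(batch)
  },
  pagesize = 4,
  verbose = FALSE
)

total
rows
```

A pagesize of 4 is used here only to force several batches on a ten-line file; real workloads usually keep the default or go larger.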

What is the difference between jsonlite and rjson in R?

jsonlite and rjson differ in three key areas: automatic data.frame conversion, streaming support, and parsing speed. jsonlite::fromJSON() automatically converts JSON arrays of objects to R data.frames and handles simplification of JSON arrays to R vectors, which is critical for data analysis workflows. rjson::fromJSON() always returns nested R lists, requiring manual conversion. rjson does less post-processing and can be faster for pure parsing, which matters for workloads parsing millions of small JSON objects; the 2–3× figure is workload-dependent, so benchmark on your own data before switching. jsonlite supports NDJSON streaming via stream_in()/stream_out() and provides prettify()/minify() utilities; rjson has none of these. RJSONIO is a third option, similar to rjson in philosophy but older and generally slower. For data science workflows involving data.frames and dplyr, jsonlite is the right choice. For high-throughput JSON parsing where you control post-processing, rjson or the jsonify package (Rcpp-based) may be faster.

How do I write JSON to a file in R?

Use jsonlite::write_json() for the simplest interface: library(jsonlite); write_json(df, "output.json", auto_unbox = TRUE, pretty = TRUE). This writes a data.frame or list to a JSON file. Alternatively, use toJSON() to produce a string and then writeLines(): json_str <- toJSON(df, auto_unbox = TRUE, pretty = TRUE); writeLines(json_str, "output.json"). For NDJSON output (one JSON object per line), use stream_out(): con <- file("output.ndjson", open = "w"); stream_out(df, con); close(con). The pretty = TRUE argument adds 2-space indentation and newlines for human-readable output; omit it for compact production output. The digits argument sets the maximum number of decimal digits written for numeric values. The default is 4, so values are rounded on output unless you raise it: digits = NA uses maximum precision (up to 15 significant digits), while digits = 6 is sufficient for most analytics use cases.
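
Both output paths can be tried against temporary files. This sketch uses an invented two-row data.frame and passes digits = NA so the values survive a round trip unrounded.

```r
library(jsonlite)

# Hypothetical data with more decimals than the default digits = 4 would keep.
df <- data.frame(id = 1:2, score = c(0.123456789, 2.5))

# Regular JSON: one pretty-printed array of records, written at full precision.
json_path <- tempfile(fileext = ".json")
write_json(df, json_path, auto_unbox = TRUE, pretty = TRUE, digits = NA)

# NDJSON: one compact JSON object per line.
ndjson_path <- tempfile(fileext = ".ndjson")
con <- file(ndjson_path, open = "w")
stream_out(df, con, verbose = FALSE)
close(con)

# Read both files back to confirm what was written.
round_trip <- fromJSON(json_path)
ndjson_lines <- readLines(ndjson_path)
length(ndjson_lines)
```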

How do I handle missing or null values in R JSON?

JSON null maps to R's NA when jsonlite converts JSON arrays to vectors and data.frames, and to R's NULL when it returns nested lists. This distinction matters: NA represents a missing value within a vector (preserving the vector's type), while NULL represents the absence of an object. When serializing with toJSON(), the default treatment of NA depends on the data: in a data.frame written as rows, a field whose value is NA is omitted from that record, and a bare numeric vector writes NA as the string "NA". To serialize NA as null explicitly, use na = "null" in toJSON(): toJSON(df, na = "null", auto_unbox = TRUE). When reading JSON with missing fields, fromJSON() fills missing values with NA in data.frame output, maintaining rectangular structure across records. For list output, missing fields simply do not appear as list elements; use purrr::pluck(list, "key", .default = NA) for safe access with a fallback value.
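
The NA versus NULL mapping can be checked in a few lines. This is a minimal sketch with made-up inputs, using only jsonlite.

```r
library(jsonlite)

# JSON null inside an array becomes NA in the simplified vector.
x <- fromJSON('[1, null, 3]')
is.na(x)

# JSON null as an object value becomes NULL in list output.
obj <- fromJSON('{"a": null, "b": 2}')
is.null(obj$a)

# Passing na = "null" forces NA to be written as JSON null.
out <- toJSON(c(1, NA), na = "null")
out
# [1,null]
```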
