GraphQL JSON Response: Envelope, Custom Scalars & Persisted Queries

Q: What is the structure of a GraphQL JSON response?

A GraphQL JSON response is a single JSON object with up to three top-level keys. "data" holds the query result and is present whenever execution started: it is null if an error nulled the whole result, and it is omitted if the request failed before execution (for example, on a syntax or validation error). "errors" is an array that appears only when something went wrong; it is never empty and never null. "extensions" is an optional map for metadata like tracing or caching hints. The HTTP status code is almost always 200, even when errors is populated, because GraphQL errors are application-level, not transport-level. Example: { "data": { "user": { "id": "1", "name": "Alice" } }, "extensions": { "duration": 42 } }. When a field resolver throws, "data" will still be present (possibly with null values at the failed field), and the "errors" array describes each failure with a "message", "locations" array (line/column in the query), and "path" array (which field in the response tree failed). Partial success is the norm: a single bad resolver does not abort the entire query.

Q: Why does GraphQL return HTTP 200 even on errors?

GraphQL treats HTTP as a transport layer, not as an application protocol. The GraphQL spec defines its own error reporting mechanism — the "errors" array in the JSON response — which is richer than HTTP status codes. HTTP 200 means the request was received and processed; it does not mean the query succeeded. This design choice allows partial success: a query for 10 fields where 1 resolver fails returns HTTP 200 with 9 valid fields in "data" and 1 error in "errors". The only exceptions: HTTP 400 for malformed requests (invalid JSON body or invalid GraphQL syntax), HTTP 405 for wrong HTTP method, and HTTP 500 for truly catastrophic server failures before GraphQL execution begins.

Q: How do I handle errors in a GraphQL JSON response?

Always check both "data" and "errors" in the response. Never rely on HTTP status alone. Pattern in TypeScript: const res = await fetch("/graphql", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ query }) }); const json = await res.json(); if (json.errors?.length) { /* handle errors */ } const data = json.data;. For Apollo Client, useQuery returns { data, loading, error } — the "error" object combines all GraphQL errors into an ApolloError. Use errorPolicy: "all" to receive partial data alongside errors. Classify errors by extensions.code: UNAUTHENTICATED redirects to login, BAD_USER_INPUT shows validation messages, INTERNAL_SERVER_ERROR shows a generic error state.

Q: What is the GraphQL JSON scalar and when should I use it?

The GraphQL JSON scalar (from graphql-scalars: npm install graphql-scalars) allows any valid JSON value — object, array, string, number, boolean, or null — as a field value. Use it for genuinely dynamic data where the shape cannot be defined in advance: arbitrary metadata, plugin configuration, CMS content blocks, or user-defined settings. Schema: scalar JSON. Resolver: import { GraphQLJSON } from "graphql-scalars"; const resolvers = { JSON: GraphQLJSON }. The tradeoff is loss of type safety — clients receive opaque JSON they must validate themselves using Zod or JSON Schema. Prefer GraphQLJSONObject (restricts to objects only) over plain GraphQLJSON when the field is always an object. Treat JSON scalar fields as unknown in TypeScript, not any.

Q: How do I reduce the size of GraphQL JSON responses?

Four strategies: (1) Persisted queries: replace the full query string with its SHA-256 hash (64 hex characters), which can reduce request JSON by 80–90% when the query text is long. Use Apollo APQ or a build-time persisted document loader. Note that this shrinks the request, not the response. (2) Lean field selection: GraphQL makes it possible to avoid over-fetching, but only if client queries request just the fields the UI actually renders. (3) Response compression: enable gzip or Brotli on the server; GraphQL JSON compresses well because field names repeat across list items. (4) @defer and @stream directives: stream large lists incrementally rather than waiting for the full result. Persisted queries combined with CDN GET caching give the best results for public data queries.

Q: How does Apollo Client cache GraphQL JSON responses?

Apollo Client uses a normalized in-memory cache (InMemoryCache) where each object is stored by a cache key derived from its __typename and id (e.g., "User:1"). When two queries return the same User:1 object, the cache deduplicates — the second query reads from cache without a network request. Configure custom key fields: new InMemoryCache({ typePolicies: { Product: { keyFields: ["sku"] } } }). Cache updates after mutations: refetchQueries re-fetches specified queries; cache.modify() updates specific fields in place; cache.writeFragment() writes partial data. Apollo DevTools shows the normalized cache store for debugging.

Q: How do I use DataLoader to prevent N+1 queries in GraphQL?

DataLoader (npm install dataloader) batches multiple load(id) calls within one event-loop tick into a single batch query. Without DataLoader, 100 posts each fetching their author makes 101 DB queries. With DataLoader: const userLoader = new DataLoader(async (ids) => { const users = await db.users.findByIds(ids); return ids.map(id => users.find(u => u.id === id)); }); In the Post resolver: author: (post, _, ctx) => ctx.loaders.user.load(post.authorId). DataLoader collects all 100 load calls and fires one SELECT * FROM users WHERE id IN (...) query. Critical: create a new DataLoader per request (not per server) to prevent cross-request data leakage. The batch function must return values in the same order as input IDs.

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 20, 2026

Every GraphQL response is JSON with a data key (the query result) and an optional errors array. Unlike REST, GraphQL servers typically return HTTP 200 even when errors occur, with error details in the errors array. GraphQL eliminates over-fetching: a REST endpoint returning a 10 KB JSON payload might return only 2 KB when queried through GraphQL by specifying exact fields. The tradeoff is complexity: queries must be specified explicitly rather than accepting the full REST response. This guide covers the GraphQL JSON envelope (data/errors/extensions), custom JSON scalars for arbitrary JSON values, persisted queries for production performance, a comparison of REST vs GraphQL JSON payload sizes, and DataLoader for N+1 batching. All examples use TypeScript with graphql-js and Apollo Server.

The GraphQL JSON Envelope: data, errors, and extensions

Every GraphQL response follows the same three-key JSON structure: data holds the query result, errors is an optional array of error objects, and extensions is an optional map for metadata. The critical distinction from REST is that errors can be non-empty even when data is also non-empty — this partial success pattern is defined by the spec and is the norm, not an edge case.

// ── Successful GraphQL JSON response ─────────────────────────────
{
  "data": {
    "user": {
      "id": "1",
      "name": "Alice",
      "email": "alice@example.com"
    }
  }
}

// ── Partial success: data + errors co-exist ───────────────────────
// Query asks for user and their posts; posts resolver throws
{
  "data": {
    "user": {
      "id": "1",
      "name": "Alice",
      "posts": null          // null because resolver failed
    }
  },
  "errors": [
    {
      "message": "Failed to fetch posts for user 1",
      "locations": [{ "line": 4, "column": 5 }],
      "path": ["user", "posts"],
      "extensions": {
        "code": "INTERNAL_SERVER_ERROR"
      }
    }
  ]
}

// ── extensions: tracing, caching, custom metadata ─────────────────
{
  "data": { "products": [{ "id": "42", "name": "Widget" }] },
  "extensions": {
    "tracing": {
      "version": 1,
      "startTime": "2026-01-02T10:00:00.000Z",
      "endTime":   "2026-01-02T10:00:00.042Z",
      "duration":  42000000
    },
    "cacheControl": { "maxAge": 300, "scope": "PUBLIC" }
  }
}

// ── TypeScript: typed GraphQL response wrapper ────────────────────
interface GraphQLError {
  message: string;
  locations?: Array<{ line: number; column: number }>;
  path?: Array<string | number>;
  extensions?: Record<string, unknown>;
}

interface GraphQLResponse<T> {
  data?: T;
  errors?: GraphQLError[];
  extensions?: Record<string, unknown>;
}

interface UserQuery {
  user: { id: string; name: string; email: string } | null;
}

async function fetchUser(id: string): Promise<GraphQLResponse<UserQuery>> {
  const res = await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: `query GetUser($id: ID!) {
        user(id: $id) { id name email }
      }`,
      variables: { id },
    }),
  });
  return res.json() as Promise<GraphQLResponse<UserQuery>>;
}

The path array on each error object is the most useful debugging field — ["user", "posts"] means the posts field inside user failed, while data.user.name may still be valid. ["products", 2, "price"] means the price field on the third product in the list threw. Combined with locations (line/column in the query document), you can pinpoint failures without server logs. The extensions.code convention — popularized by Apollo Server with values like UNAUTHENTICATED, FORBIDDEN, and NOT_FOUND — lets clients handle error categories programmatically. See also our guide on JSON API design for REST comparison patterns.
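To make the path field easier to scan in logs, it can be rendered as an accessor string. The following is a minimal sketch; formatErrorPath and ErrorPathSegment are hypothetical names chosen for illustration and are not part of any GraphQL library.

```typescript
// Hypothetical helper (not a library API): render a GraphQL error path for logs.
type ErrorPathSegment = string | number;

function formatErrorPath(path: readonly ErrorPathSegment[] | undefined): string {
  // Errors raised before execution (parse/validation) carry no path.
  if (!path || path.length === 0) return "(no path)";
  return path
    .map((segment, index) => {
      if (typeof segment === "number") return `[${segment}]`; // list index
      return index === 0 ? segment : `.${segment}`;           // field name
    })
    .join("");
}

console.log(formatErrorPath(["user", "posts"]));        // user.posts
console.log(formatErrorPath(["products", 2, "price"])); // products[2].price
```

Numbers in the path are zero-based list indexes, so products[2] is the third product.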

Parsing GraphQL JSON Responses in the Client

Correct GraphQL response parsing handles four distinct states: loading, network error (fetch failed entirely), GraphQL errors (resolvers failed — in the errors array), and success. Collapsing these into a single if (error) check loses the distinction between a server that returned data-with-errors versus one that was unreachable. Always read both data and errors in every response.

// ── Vanilla fetch: always check both data and errors ─────────────
async function graphqlFetch<T>(
  query: string,
  variables?: Record<string, unknown>
): Promise<{ data: T | null; errors: GraphQLError[] | null }> {
  let res: Response;
  try {
    res = await fetch("/graphql", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, variables }),
    });
  } catch (networkErr) {
    // Network-level failure: server unreachable, CORS, timeout
    throw new Error(`Network error: ${(networkErr as Error).message}`);
  }

  // GraphQL almost always returns 200 — 400 for malformed requests
  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${res.statusText}`);
  }

  const json = await res.json() as { data?: T; errors?: GraphQLError[] };
  return { data: json.data ?? null, errors: json.errors ?? null };
}

// ── Usage: handle partial success ─────────────────────────────────
const { data, errors } = await graphqlFetch<UserQuery>(`
  query GetUser($id: ID!) {
    user(id: $id) { id name posts { id title } }
  }
`, { id: "1" });

if (errors) {
  const postErrors = errors.filter(e => e.path?.includes("posts"));
  if (postErrors.length) {
    console.warn("Posts failed to load:", postErrors[0].message);
    // Partial: user data may still be valid — do not blank the whole UI
  }
}
if (data?.user) {
  console.log("User:", data.user.name);  // Alice — even if posts failed
}

// ── Apollo Client: useQuery hook ──────────────────────────────────
import { useQuery, gql } from "@apollo/client";

const GET_USER = gql`
  query GetUser($id: ID!) {
    user(id: $id) { id name email posts { id title } }
  }
`;

function UserProfile({ id }: { id: string }) {
  const { loading, error, data } = useQuery(GET_USER, {
    variables: { id },
    errorPolicy: "all",  // "none" (default) | "ignore" | "all"
    // "all" returns partial data even when errors array is non-empty
  });

  if (loading) return <p>Loading...</p>;
  if (error && !data) return <p>Error: {error.message}</p>;

  return (
    <div>
      <h2>{data?.user?.name}</h2>
      {error && <p className="text-yellow-600">Some data failed to load</p>}
    </div>
  );
}

// ── urql: check for errors in the result ─────────────────────────
import { useQuery } from "urql";
const [result] = useQuery({ query: GET_USER, variables: { id: "1" } });
const { data, fetching, error } = result;
// error.graphQLErrors — array of GraphQL errors from the errors field
// error.networkError  — fetch/network failure

Apollo Client's errorPolicy: "all" is the right setting for most production applications: it surfaces both partial data and errors, letting the UI render what it has while showing error states only for the fields that failed. The default errorPolicy: "none" discards data entirely when any error is present, masking partial success and degrading UX. For TypeScript JSON types, define the GraphQLResponse<T> wrapper and use it consistently. Because data is optional on that type, the compiler (in strict mode) makes callers handle a missing data value; it cannot force them to read errors, so keep that check in a shared helper such as graphqlFetch.
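The distinction between full success, partial success, and total failure can be reduced to one small function. This is a sketch under assumptions: Envelope, ResponseState, and responseState are illustrative names, and the rule used (errors present plus non-null data means partial) is one reasonable reading of the states described above.

```typescript
// Minimal envelope shape, mirroring the fields discussed in this guide.
interface Envelope<T> {
  data?: T | null;
  errors?: Array<{ message: string; path?: Array<string | number> }>;
}

type ResponseState = "success" | "partial" | "failure";

// Hypothetical helper: decide how much of the UI can still be rendered.
function responseState<T>(response: Envelope<T>): ResponseState {
  const hasErrors = (response.errors?.length ?? 0) > 0;
  const hasData = response.data !== undefined && response.data !== null;
  if (!hasErrors) return "success";
  return hasData ? "partial" : "failure";
}

const partial = responseState({
  data: { user: { name: "Alice", posts: null } },
  errors: [{ message: "Failed to fetch posts", path: ["user", "posts"] }],
});
console.log(partial); // partial
```

Network failures are deliberately out of scope here: they never produce an envelope, so they are better handled where the fetch call is made.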

Custom JSON Scalar for Arbitrary JSON Values

GraphQL's type system requires every field to have a defined scalar or object type. When data is genuinely dynamic — plugin configuration, CMS content blocks, user-defined metadata — the custom JSON scalar from graphql-scalars allows any valid JSON value as a field value. This is the sanctioned escape hatch from the type system, not a design smell. Use it when the schema cannot reasonably enumerate all possible shapes.

// ── Install ───────────────────────────────────────────────────────
// npm install graphql-scalars

// ── schema.graphql ────────────────────────────────────────────────
scalar JSON
scalar JSONObject   # JSON restricted to objects (not arrays/primitives)

type SiteConfig {
  id: ID!
  name: String!
  settings: JSON          # any JSON value: object, array, string, number
  themeConfig: JSONObject # must be a JSON object
}

type Query {
  siteConfig(id: ID!): SiteConfig
}

// ── Apollo Server resolver setup ──────────────────────────────────
import { ApolloServer } from "@apollo/server";
import { GraphQLJSON, GraphQLJSONObject } from "graphql-scalars";
import { readFileSync } from "fs";

const typeDefs = readFileSync("schema.graphql", "utf-8");

const resolvers = {
  JSON: GraphQLJSON,
  JSONObject: GraphQLJSONObject,

  Query: {
    siteConfig: async (_: unknown, { id }: { id: string }) => ({
      id,
      name: "My Site",
      settings: {              // any shape — validated only as valid JSON
        darkMode: true,
        plugins: ["search", "analytics"],
        limits: { maxUploadMb: 50 },
      },
      themeConfig: {           // must be an object (not array or primitive)
        primaryColor: "#3b82f6",
        fonts: { heading: "Inter", body: "Georgia" },
      },
    }),
  },
};

// ── GraphQL query and JSON response ──────────────────────────────
// query { siteConfig(id: "1") { id name settings themeConfig } }
//
// Response:
// {
//   "data": {
//     "siteConfig": {
//       "id": "1",
//       "name": "My Site",
//       "settings": {
//         "darkMode": true,
//         "plugins": ["search", "analytics"],
//         "limits": { "maxUploadMb": 50 }
//       },
//       "themeConfig": {
//         "primaryColor": "#3b82f6",
//         "fonts": { "heading": "Inter", "body": "Georgia" }
//       }
//     }
//   }
// }

// ── Client-side: validate JSON scalar fields with Zod ─────────────
import { z } from "zod";

const ThemeConfigSchema = z.object({
  primaryColor: z.string().regex(/^#[0-9a-f]{6}$/i),
  fonts: z.object({ heading: z.string(), body: z.string() }),
});

// Treat the JSON scalar field as unknown — never as any
const rawTheme: unknown = data?.siteConfig?.themeConfig;
const theme = ThemeConfigSchema.safeParse(rawTheme);
if (!theme.success) {
  console.error("Invalid themeConfig shape:", theme.error.issues);
}

The discipline when using a JSON scalar is client-side validation — since the server does not enforce shape, the client must. Use JSON Schema validation or Zod to parse JSON scalar fields immediately after receiving the GraphQL response. Always type JSON scalar fields as unknown in TypeScript, never as any — this forces explicit validation before use. Prefer GraphQLJSONObject over plain GraphQLJSON when the field is always an object, as it rejects invalid inputs (arrays, strings) at the server boundary.
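Where adding a validation library is not an option, the same "unknown first, then narrow" discipline can be followed with a hand-written type guard. This is a dependency-free sketch; isJsonObject and readPrimaryColor are hypothetical helpers, and the colour pattern simply reuses the regular expression from the Zod example.

```typescript
type JsonObject = { [key: string]: unknown };

// Type guard: true only for plain JSON objects (not arrays, not null).
function isJsonObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical reader: returns the colour only if the shape is what we expect.
function readPrimaryColor(themeConfig: unknown): string | null {
  if (!isJsonObject(themeConfig)) return null;
  const color = themeConfig["primaryColor"];
  return typeof color === "string" && /^#[0-9a-f]{6}$/i.test(color) ? color : null;
}

console.log(readPrimaryColor({ primaryColor: "#3b82f6" })); // #3b82f6
console.log(readPrimaryColor(["not", "an", "object"]));     // null
```

A guard like this scales poorly beyond a few fields, which is the main argument for a schema library once the JSON shape grows.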

Error Handling in GraphQL JSON Responses

Error handling in GraphQL requires a different mental model than REST: field-level errors do not abort the response. A query touching 10 resolvers where 1 throws still returns 9 fields in data alongside 1 entry in errors. Production error handling must classify errors by type, distinguish partial from total failure, and avoid leaking internal details to clients via formatError.

// ── Apollo Server: custom error classes ──────────────────────────
import { GraphQLError } from "graphql";

class AuthenticationError extends GraphQLError {
  constructor(message: string) {
    super(message, {
      extensions: { code: "UNAUTHENTICATED", http: { status: 401 } },
    });
  }
}

class ForbiddenError extends GraphQLError {
  constructor(message: string) {
    super(message, {
      extensions: { code: "FORBIDDEN", http: { status: 403 } },
    });
  }
}

class UserInputError extends GraphQLError {
  constructor(message: string, field: string) {
    super(message, {
      extensions: { code: "BAD_USER_INPUT", field },
    });
  }
}

// ── Resolver with error handling ──────────────────────────────────
const resolvers = {
  Query: {
    user: async (_: unknown, { id }: { id: string }, ctx: Context) => {
      if (!ctx.user) throw new AuthenticationError("You must be logged in");
      if (!ctx.user.canRead("users")) throw new ForbiddenError("Access denied");
      if (!id.match(/^\d+$/)) throw new UserInputError("id must be numeric", "id");

      const user = await db.users.findById(id);
      // Return null for "not found" — do NOT throw (not an error, just absence)
      return user ?? null;
    },
  },
};

// ── Apollo Server: mask internal errors in production ────────────
const server = new ApolloServer({
  typeDefs,
  resolvers,
  formatError: (formattedError) => {
    if (process.env.NODE_ENV === "production") {
      const code = formattedError.extensions?.code as string | undefined;
      const safeMessages: Record<string, string> = {
        UNAUTHENTICATED: "Authentication required",
        FORBIDDEN: "Access denied",
        BAD_USER_INPUT: formattedError.message, // safe to surface
      };
      return {
        ...formattedError,
        message: safeMessages[code ?? ""] ?? "An internal error occurred",
      };
    }
    return formattedError; // full detail in development
  },
});

// ── Client: classify errors from the errors array ─────────────────
function classifyGraphQLErrors(errors: GraphQLError[]) {
  return {
    authErrors:   errors.filter(e => e.extensions?.code === "UNAUTHENTICATED"),
    forbidden:    errors.filter(e => e.extensions?.code === "FORBIDDEN"),
    inputErrors:  errors.filter(e => e.extensions?.code === "BAD_USER_INPUT"),
    serverErrors: errors.filter(e => e.extensions?.code === "INTERNAL_SERVER_ERROR"),
  };
}

// ── Error response shape ──────────────────────────────────────────
// {
//   "data": { "user": null },
//   "errors": [
//     {
//       "message": "Authentication required",
//       "locations": [{ "line": 2, "column": 3 }],
//       "path": ["user"],
//       "extensions": { "code": "UNAUTHENTICATED" }
//     }
//   ]
// }

The formatError callback is the production firewall for error details — always strip stack traces and DB error messages before they reach clients. A common mistake is throwing a raw Error from a resolver and relying on default formatting, which may expose internal details. The extensions.code convention lets clients make programmatic decisions without string-matching on error messages, which break when messages change. Return null from a resolver for not-found cases rather than throwing — null in data.user is the correct representation of "this entity does not exist" and does not populate the errors array.
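On the client, the "null means absent, error means failed" rule can be checked by comparing a field's path against the error paths. The sketch below is illustrative only: fieldStatus and its types are hypothetical names. It treats an error at or below the field as a failure, because a failing non-nullable child nulls its nearest nullable parent.

```typescript
interface FieldError {
  message: string;
  path?: Array<string | number>;
}

type FieldStatus = "ok" | "absent" | "failed";

// Hypothetical helper: explain why a field in `data` is null.
function fieldStatus(
  value: unknown,
  errors: readonly FieldError[] | undefined,
  fieldPath: ReadonlyArray<string | number>
): FieldStatus {
  if (value !== null && value !== undefined) return "ok";
  const failed = (errors ?? []).some((error) => {
    const errorPath = error.path;
    // Match when the error path starts with the field path.
    return (
      errorPath !== undefined &&
      fieldPath.every((segment, index) => errorPath[index] === segment)
    );
  });
  return failed ? "failed" : "absent";
}

// Resolver returned null and no error was recorded: the user does not exist.
console.log(fieldStatus(null, undefined, ["user"])); // absent
// Resolver threw: same null in data, but an error points at the field.
console.log(fieldStatus(null, [{ message: "Authentication required", path: ["user"] }], ["user"])); // failed
```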

Persisted Queries: Reducing GraphQL JSON Payload Size

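Persisted queries shrink request size by sending the query's SHA-256 hash (64 hex characters) instead of the full query string, which matters most when the query text runs to a kilobyte or more. There are two flavors. With registered (safelisted) persisted queries, the client's query documents are registered with the server at build time and the server can refuse anything not on the list, which also prevents arbitrary query execution on public APIs. With Automatic Persisted Queries (APQ), hashes are registered at runtime on first use; APQ saves bandwidth but does not by itself restrict which queries can run. For long query documents either approach can cut request JSON payload by 80–90%, and both enable CDN-cacheable GET requests for public queries.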

// ── Automatic Persisted Queries (APQ) — Apollo Client ────────────
// npm install @apollo/client crypto-hash

import { ApolloClient, InMemoryCache, createHttpLink } from "@apollo/client";
import { createPersistedQueryLink } from "@apollo/client/link/persisted-queries";
import { sha256 } from "crypto-hash";

const persistedQueriesLink = createPersistedQueryLink({ sha256 });
const httpLink = createHttpLink({ uri: "/graphql" });

const client = new ApolloClient({
  link: persistedQueriesLink.concat(httpLink),
  cache: new InMemoryCache(),
});

// ── APQ request flow ──────────────────────────────────────────────
// First request (cache miss on server):
// POST /graphql
// { "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "abc123..." } } }
//
// Server: hash unknown — responds with:
// { "errors": [{ "message": "PersistedQueryNotFound" }] }
//
// Client retries with full query:
// POST /graphql
// { "query": "query GetUser($id: ID!) { user(id: $id) { id name } }",
//   "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "abc123..." } } }
//
// Server stores hash -> query mapping, returns data.
//
// All subsequent requests (cache hit):
// POST /graphql
// { "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "abc123..." } } }
// -> Server looks up hash, returns data directly

// ── Payload size comparison ───────────────────────────────────────
// Standard request:
// { "query": "query GetProductList($category: String!, $limit: Int!) { products(category: $category, limit: $limit) { id name price imageUrl inStock } }", "variables": { "category": "electronics", "limit": 20 } }
// ~220 bytes

// Persisted query (hash only):
// { "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "e3b0c442..." } }, "variables": { "category": "electronics", "limit": 20 } }
// ~140 bytes as printed with the hash abbreviated; ~195 bytes with the full 64-character hash.
// For a query this short the saving is small; it approaches 80-90% only when the query text is long.

// ── Apollo Server: APQ cache setup ───────────────────────────────
import { ApolloServer } from "@apollo/server";

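// APQ is enabled by default in Apollo Server 4.
// Provide a Redis cache for distributed/multi-node deployments.
// Note: apollo-server-cache-redis dates from Apollo Server 3; the Apollo Server 4
// docs recommend KeyvAdapter from @apollo/utils.keyvadapter for new projects.
import { BaseRedisCache } from "apollo-server-cache-redis";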
import Redis from "ioredis";

const server = new ApolloServer({
  typeDefs,
  resolvers,
  cache: new BaseRedisCache({ client: new Redis() }),
});

// ── GET request with persisted query (CDN cacheable) ─────────────
// GET /graphql?extensions={"persistedQuery":{"version":1,"sha256Hash":"abc..."}}&variables={"id":"1"}
// This GET request can be cached by Cloudflare, Fastly, or any CDN
// for anonymous/public data — turning GraphQL into a static-equivalent

Persisted queries also unlock GET requests — send the hash as a query parameter, and any CDN can cache the response for public data (product listings, content pages, navigation). This is significant: standard GraphQL POST requests are not cacheable at the HTTP layer. Combining APQ with GET requests gives GraphQL the same CDN cacheability as REST GET endpoints. For JSON performance at scale, this combination is the highest-leverage optimization for anonymous user traffic.
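For illustration, the hash and the hash-only GET URL can be computed by hand with Node's built-in crypto module. This is a sketch, not Apollo Client's implementation: sha256Hex and persistedQueryUrl are hypothetical names. The hash covers the exact query text, so any whitespace difference produces a different hash; a real client hashes its own printed form of the document.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the exact query text: always 64 characters.
function sha256Hex(queryText: string): string {
  return createHash("sha256").update(queryText, "utf8").digest("hex");
}

// Hypothetical helper: build a hash-only GET URL in the shape shown above.
function persistedQueryUrl(
  endpoint: string,
  queryText: string,
  variables: Record<string, unknown>
): string {
  const params = new URLSearchParams({
    extensions: JSON.stringify({
      persistedQuery: { version: 1, sha256Hash: sha256Hex(queryText) },
    }),
    variables: JSON.stringify(variables),
  });
  return `${endpoint}?${params.toString()}`;
}

const url = persistedQueryUrl(
  "/graphql",
  "query GetUser($id: ID!) { user(id: $id) { id name } }",
  { id: "1" }
);
console.log(url.length, url.slice(0, 40));
```

URLSearchParams percent-encodes the JSON, so the URL on the wire is longer than the unencoded form printed in the request-flow comments.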

REST vs GraphQL JSON: Payload Size Comparison

The core GraphQL value proposition is eliminating over-fetching — requesting exactly the fields the UI needs rather than accepting whatever the REST endpoint returns. The savings are measurable but require deliberate query design. A GraphQL query that selects all 20 fields of a type delivers the same payload as REST — the savings only materialize when clients write lean queries.

// ── REST endpoint: GET /api/users/1 ──────────────────────────────
// Returns ALL fields — client cannot control what it receives:
{
  "id": 1,
  "username": "alice",
  "email": "alice@example.com",
  "firstName": "Alice",
  "lastName": "Smith",
  "avatarUrl": "https://cdn.example.com/avatars/1.jpg",
  "bio": "Software engineer. Coffee enthusiast. 500+ words...",
  "createdAt": "2024-01-15T08:30:00Z",
  "updatedAt": "2026-01-02T14:22:00Z",
  "lastLoginAt": "2026-01-02T14:22:00Z",
  "role": "ADMIN",
  "preferences": { "theme": "dark", "language": "en", "notifications": {} },
  "socialLinks": { "twitter": "@alice", "github": "alice" },
  "stats": { "postsCount": 42, "followersCount": 1200 }
}
// Total: ~1200 bytes

// ── GraphQL: request only what the UI card needs ──────────────────
// query { user(id: "1") { id username avatarUrl } }
//
// Response — 3 fields only:
{
  "data": {
    "user": {
      "id": "1",
      "username": "alice",
      "avatarUrl": "https://cdn.example.com/avatars/1.jpg"
    }
  }
}
// Total: ~120 bytes — 90% reduction

// ── REST under-fetching: N round trips ────────────────────────────
// Display a post with author name requires 2 REST calls:
// 1. GET /api/posts/99    -> { id: 99, title: "...", authorId: 1 }
// 2. GET /api/users/1     -> { id: 1, username: "alice", ... }

// GraphQL: single round trip, exact fields
// query {
//   post(id: "99") {
//     id title body publishedAt
//     author { id username avatarUrl }
//     tags { id name }
//   }
// }

// ── Payload benchmark: list of 20 products ────────────────────────
// REST: GET /api/products?category=electronics&limit=20
// All product fields, 20 items: ~28,000 bytes

// GraphQL — card display (5 fields per item):
// query ProductList($category: String!, $limit: Int!) {
//   products(category: $category, limit: $limit) {
//     id name price imageUrl inStock
//   }
// }
// Response: ~4,200 bytes — 85% smaller

// ── When REST is simpler ──────────────────────────────────────────
// Simple CRUD APIs with one client         -> REST is lower overhead
// File uploads                             -> REST multipart is easier
// Public APIs with diverse external users  -> REST + OpenAPI is more approachable
// Streaming / server-sent events           -> REST SSE is simpler
// Heavy HTTP caching requirements          -> REST GET is natively cacheable

For JSON API design decisions: choose GraphQL when you have multiple clients with different data needs (web, mobile, third-party integrations), because query flexibility is most valuable when consumers are heterogeneous. Choose REST when you have one server and one client, simple CRUD semantics, or a public API where broad tooling support matters more than payload efficiency. REST's HTTP-layer caching is a genuine advantage that GraphQL needs APQ + GET requests to match.
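Rather than trusting round-number estimates, the saving for a given view can be measured directly. The sketch below compares serialized sizes for a full record and a three-field selection; jsonByteSize and pickFields are hypothetical helpers, and restUser is an abbreviated, made-up record modelled on the REST example above, so the printed numbers apply only to this sample.

```typescript
// Size in bytes of a value once serialized as UTF-8 JSON.
function jsonByteSize(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Copy only the named fields, the way a lean selection set would.
function pickFields<T extends object, K extends keyof T>(
  record: T,
  fields: readonly K[]
): Pick<T, K> {
  const picked = {} as Pick<T, K>;
  for (const field of fields) picked[field] = record[field];
  return picked;
}

// Hypothetical REST payload (abbreviated sample data).
const restUser = {
  id: 1,
  username: "alice",
  email: "alice@example.com",
  firstName: "Alice",
  lastName: "Smith",
  avatarUrl: "https://cdn.example.com/avatars/1.jpg",
  bio: "Software engineer. Coffee enthusiast.",
  role: "ADMIN",
  preferences: { theme: "dark", language: "en" },
  stats: { postsCount: 42, followersCount: 1200 },
};

// GraphQL-style envelope containing only what a profile card renders.
const cardResponse = {
  data: { user: pickFields(restUser, ["id", "username", "avatarUrl"]) },
};

const restBytes = jsonByteSize(restUser);
const cardBytes = jsonByteSize(cardResponse);
console.log(`REST: ${restBytes} B, GraphQL card: ${cardBytes} B`);
console.log(`Saved: ${Math.round((1 - cardBytes / restBytes) * 100)}%`);
```

Measuring before compression overstates the difference on the wire, since gzip and Brotli already remove much of the redundancy in the larger payload.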

DataLoader and N+1 Batching in GraphQL JSON

The N+1 problem is the most common GraphQL performance issue: resolving a list of N items where each item triggers 1 additional DB query produces N+1 total queries. DataLoader solves this by batching all individual load(id) calls within a single event-loop tick into one batch query. Without DataLoader, a query for 100 posts with their authors fires 101 DB queries — DataLoader collapses it to 2.

// ── The N+1 problem — no DataLoader ──────────────────────────────
// query { posts { id title author { id name } } }
//
// posts resolver:  SELECT * FROM posts LIMIT 100          (1 query)
// author resolver (called once per post):
//   SELECT * FROM users WHERE id = 1                      (1 query)
//   SELECT * FROM users WHERE id = 2                      (1 query)
//   ...
// Total: 101 DB queries

// ── DataLoader: batch into one query ──────────────────────────────
// npm install dataloader

import DataLoader from "dataloader";

// Batch function: receives array of IDs, must return same-length, same-order array
async function batchUsers(ids: readonly string[]) {
  const users = await db.users.findByIds([...ids]);
  const userMap = new Map(users.map(u => [u.id, u]));
  // Return values in the SAME ORDER as input ids (required by DataLoader contract)
  return ids.map(id => userMap.get(id) ?? new Error(`User ${id} not found`));
}

// ── Create loaders per request — NEVER per server ─────────────────
interface Context {
  user: AuthUser | null;
  loaders: {
    user: DataLoader<string, User>;
    tags: DataLoader<string, Tag[]>;
  };
}

function createContext(): Context {
  return {
    user: null,
    loaders: {
      user: new DataLoader(batchUsers),          // fresh per request
      tags: new DataLoader(batchTagsByPostId),   // fresh per request
    },
  };
}

// ── Resolver using DataLoader ─────────────────────────────────────
const resolvers = {
  Query: {
    posts: () => db.posts.findAll({ limit: 100 }),  // 1 query
  },
  Post: {
    // Called 100 times — DataLoader batches all calls into 1 query
    author: (post: Post, _: unknown, ctx: Context) =>
      ctx.loaders.user.load(post.authorId),

    tags: (post: Post, _: unknown, ctx: Context) =>
      ctx.loaders.tags.load(post.id),
  },
};

// Result: 201 queries (1 + 100 author lookups + 100 tag lookups) collapsed to 3:
// 1. SELECT * FROM posts LIMIT 100
// 2. SELECT * FROM users WHERE id IN (1, 2, 3, ..., 100)   <- batch!
// 3. SELECT * FROM post_tags WHERE post_id IN (1, ..., 100) <- batch!

// ── Per-request caching: same ID twice = one DB call ─────────────
const user1a = await ctx.loaders.user.load("1");
const user1b = await ctx.loaders.user.load("1");  // cache hit
// user1a === user1b — same reference, no extra query

// ── DataLoader options ────────────────────────────────────────────
const loader = new DataLoader(batchUsers, {
  maxBatchSize: 100,              // limit IDs per batch call
  cache: true,                   // per-request cache (default: true)
  cacheKeyFn: (key) => String(key), // normalize mixed ID types
});

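// ── Apollo Server context factory ────────────────────────────────
// Assumes an existing Express `app` and that `await server.start()` has run.
// The middleware must be mounted with app.use() to take effect.
import express from "express";
import { expressMiddleware } from "@apollo/server/express4";

app.use("/graphql", express.json(), expressMiddleware(server, {
  context: async ({ req }) => ({
    user: await authenticate(req),
    loaders: {
      user: new DataLoader(batchUsers),  // new instance per request
      tags: new DataLoader(batchTagsByPostId),
    },
  }),
}));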

The single most important DataLoader rule is creating new instances per request — never share a DataLoader across requests. A shared instance caches results from one user's request and may serve stale or unauthorized data to another, causing data leakage. The batch function contract is strict: the returned array must be exactly the same length as the input IDs, in the same order — return new Error(...) for any ID that was not found, rather than omitting it. DataLoader's per-request cache also handles diamond patterns: if the same author appears in multiple posts in one query, the second load(authorId) returns the cached value from the first, with no additional DB round-trip.
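The ordering contract is easy to get wrong, because databases return rows in whatever order they like and silently skip missing IDs. A small generic helper can enforce it. This is a sketch: alignToKeys is a hypothetical name, not part of the dataloader package.

```typescript
// Hypothetical helper: align unordered rows to the requested keys,
// returning one entry per key, in key order, with an Error for misses.
function alignToKeys<K, V>(
  keys: readonly K[],
  rows: readonly V[],
  keyOf: (row: V) => K,
  label = "Record"
): Array<V | Error> {
  const rowsByKey = new Map<K, V>();
  for (const row of rows) rowsByKey.set(keyOf(row), row);
  return keys.map((key) =>
    rowsByKey.has(key)
      ? (rowsByKey.get(key) as V)
      : new Error(`${label} ${String(key)} not found`)
  );
}

// The database returned rows out of order and without id "3".
const rows = [
  { id: "2", name: "Bob" },
  { id: "1", name: "Alice" },
];
const aligned = alignToKeys(["1", "3", "2"], rows, (row) => row.id, "User");
console.log(aligned.map((entry) => (entry instanceof Error ? entry.message : entry.name)));
// [ 'Alice', 'User 3 not found', 'Bob' ]
```

A batch function can then end with a single return alignToKeys(ids, users, (u) => u.id, "User") and satisfy both the length and the ordering requirement.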

Key Terms

GraphQL envelope: The top-level JSON structure that wraps every GraphQL response. It contains a data key whenever execution began (the query result, or null on total failure; the key is omitted when the request fails before execution, such as on a parse or validation error), an optional errors array (populated when a resolver throws or the request itself is invalid), and an optional extensions map for metadata. The envelope is defined by the GraphQL specification and is consistent across all servers and clients, unlike REST, where response shape varies by API. Clients must always inspect both data and errors because partial success (data present alongside errors) is a valid and common state. The HTTP status code is almost always 200 regardless of the errors content.
data field: The primary result key in a GraphQL JSON response. Its shape mirrors the query structure exactly — if the query requests user { id name }, then data will be { "user": { "id": "1", "name": "Alice" } }. When a resolver returns null for a nullable field, data contains null at that path without an error entry — null in data means the resolver intentionally returned nothing (not found). When a resolver throws, the failed field becomes null in data and an entry is added to errors. For non-nullable fields (String!), a resolver failure propagates up to the nearest nullable parent, potentially nulling a larger portion of the response tree.
errors array: The optional array in the GraphQL JSON envelope that contains error objects when one or more resolvers fail. Each error object has a message string, a locations array indicating which line/column of the query caused the error, a path array indicating which field in the response tree failed, and an optional extensions map for metadata (commonly extensions.code for error classification). The presence of errors does not mean data is absent — partial success is the norm. An empty errors array is invalid; the key is either absent or contains at least one error object. Always check json.errors?.length > 0, not just json.errors.
extensions: The optional top-level key in the GraphQL JSON envelope for arbitrary metadata about the response. The GraphQL spec reserves it for tooling — it is intentionally untyped (Record<string, unknown>). Common uses: Apollo Tracing (extensions.tracing) with per-resolver timing, cache control hints (extensions.cacheControl.maxAge), persisted query metadata (extensions.persistedQuery), and custom debug data in development. The extensions field also appears on individual error objects within the errors array — error.extensions.code is the standard place for machine-readable error codes like UNAUTHENTICATED or BAD_USER_INPUT.
custom scalar: A GraphQL type that extends the built-in scalars (String, Int, Float, Boolean, ID) with custom serialization, parsing, and validation logic. The JSON scalar from graphql-scalars accepts any valid JSON value — bypassing GraphQL's type system for that field. Other common custom scalars: DateTime (ISO 8601 strings), URL, EmailAddress, and PositiveInt. Custom scalars require both a schema declaration (scalar JSON) and a resolver implementation that handles serialization (output) and parsing (input from query variables and inline values).
persisted query: A performance optimization where GraphQL query strings are registered server-side by their SHA-256 hash, so clients send only the hash at runtime instead of the full query text. Apollo's Automatic Persisted Queries (APQ) implements a two-phase protocol: the client sends the hash; on a cache miss, the server responds with PersistedQueryNotFound; the client resends with the full query; the server stores the hash-to-query mapping. Subsequent requests need only the hash, reducing request JSON payload by 80–90%. Persisted queries also enable CDN-cacheable GET requests for public queries.
DataLoader: A utility library (npm install dataloader) that batches multiple asynchronous load(key) calls made within a single event-loop tick into one batch request. In GraphQL resolvers, it solves the N+1 problem: instead of each resolver firing an individual DB query, DataLoader collects all keys requested in one tick and executes a single batch query. The batch function receives an array of keys and must return an array of values in exactly the same order. DataLoader also caches results within a single request — loading the same key twice returns the cached value without an extra DB call. Always create a new DataLoader instance per HTTP request to prevent cross-request data leakage.
N+1 problem: A database performance anti-pattern where fetching a list of N items triggers N additional queries for associated data — 1 query for the list plus N for the associations, totalling N+1. In GraphQL, this occurs naturally when a list resolver returns N objects and a field resolver on each makes an individual DB call: 100 posts each fetching their author results in 101 DB queries. DataLoader is the standard solution: it batches all 100 author ID lookups into one SELECT * FROM users WHERE id IN (...) query, reducing total DB queries from 101 to 2.

FAQ

What is the structure of a GraphQL JSON response?

A GraphQL JSON response has a data key containing the query result (or null on catastrophic error) whenever the query was executed, an optional errors array when something went wrong, and an optional extensions key for metadata like tracing or caching hints. If the request is rejected before execution (a syntax or validation error), the specification requires data to be omitted and only errors is returned. The HTTP status code is almost always 200, even when errors is populated, because GraphQL errors are application-level, not transport-level. Example: { "data": { "user": { "id": "1", "name": "Alice" } } }. When a field resolver throws, data is still present (with null at the failed field), and errors describes each failure with a message, locations array (line/column in the query), and path array (which field in the response tree failed). Partial success is the norm: a single bad resolver does not abort the entire query.
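As a rough sketch of the envelope described above, the types and helper below distinguish full success, partial success, and failure. The names GraphQLErrorEntry, GraphQLEnvelope, and outcomeOf are illustrative, and the sample response (including the "Posts service unavailable" message) is invented for the example.

```typescript
// One entry of the "errors" array.
interface GraphQLErrorEntry {
  message: string;
  locations?: Array<{ line: number; column: number }>;
  path?: Array<string | number>;
  extensions?: Record<string, unknown>;
}

// The top-level response envelope. "data" is optional here because it is
// absent when a request is rejected before execution.
interface GraphQLEnvelope<TData> {
  data?: TData | null;
  errors?: GraphQLErrorEntry[];
  extensions?: Record<string, unknown>;
}

type Outcome = "success" | "partial" | "failure";

// Classifies a response by looking at both keys, never at one alone.
function outcomeOf<TData>(envelope: GraphQLEnvelope<TData>): Outcome {
  const hasErrors = (envelope.errors?.length ?? 0) > 0;
  const hasData = envelope.data !== null && envelope.data !== undefined;
  if (!hasErrors) return "success";
  return hasData ? "partial" : "failure";
}

// Invented sample: the posts resolver failed, the rest of user resolved.
const raw = `{
  "data": { "user": { "id": "1", "name": "Alice", "posts": null } },
  "errors": [
    {
      "message": "Posts service unavailable",
      "locations": [{ "line": 1, "column": 20 }],
      "path": ["user", "posts"]
    }
  ]
}`;

type UserData = { user: { id: string; name: string; posts: unknown } };
const envelope = JSON.parse(raw) as GraphQLEnvelope<UserData>;
const outcome = outcomeOf(envelope);
```

Here outcome is "partial": data.user.name is usable even though the posts field came back null with a matching errors entry.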

Why does GraphQL return HTTP 200 even on errors?

GraphQL treats HTTP as a transport layer, not an application protocol. The GraphQL spec defines its own error reporting in the errors array, which is richer than HTTP status codes. HTTP 200 means the server received and processed the request; it does not mean the query succeeded. This allows partial success: a query for 10 fields where 1 resolver fails returns HTTP 200 with 9 valid fields in data and 1 error in errors. Exceptions: HTTP 400 for malformed requests (invalid JSON body or invalid GraphQL syntax), HTTP 405 for wrong HTTP method, and HTTP 500 for catastrophic failures before GraphQL execution begins. Clients must always inspect the errors array: HTTP 200 signals only that the transport succeeded, never that the query did.

How do I handle errors in a GraphQL JSON response?

Always check both data and errors — never rely on HTTP status alone. Vanilla fetch: const json = await res.json(); if (json.errors?.length) { /* classify and handle */ } const data = json.data;. For Apollo Client, set errorPolicy: "all" on useQuery to receive partial data alongside errors instead of discarding data when any error is present. Classify errors by extensions.code: UNAUTHENTICATED redirects to login, BAD_USER_INPUT shows form validation errors, INTERNAL_SERVER_ERROR shows a generic error. Use the path array on each error to identify which field failed — ["user", "posts"] means only the posts subfield is broken, and data.user.name may still be valid. Apply formatError on the server to strip internal details before they reach clients.
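The classification step above might look like the following sketch. The action names and the helper functions actionFor and failedPaths are hypothetical; only the three extensions.code values come from the text.

```typescript
// Minimal error shape needed for classification.
interface ClassifiableError {
  message: string;
  path?: Array<string | number>;
  extensions?: { code?: string; [key: string]: unknown };
}

// Hypothetical UI actions; a real app would map these to its own handlers.
type ErrorAction = "redirect-to-login" | "show-validation" | "show-generic";

// Maps extensions.code to an action. Unknown or missing codes fall back to
// the generic error state, the same as INTERNAL_SERVER_ERROR.
function actionFor(error: ClassifiableError): ErrorAction {
  switch (error.extensions?.code) {
    case "UNAUTHENTICATED":
      return "redirect-to-login";
    case "BAD_USER_INPUT":
      return "show-validation";
    default:
      return "show-generic";
  }
}

// Renders each error's path as a dotted string, e.g. "user.posts", so the
// caller can tell which parts of data are still trustworthy.
function failedPaths(errors: readonly ClassifiableError[]): string[] {
  return errors.map((error) => (error.path ?? []).join("."));
}

const sampleErrors: ClassifiableError[] = [
  { message: "Not logged in", extensions: { code: "UNAUTHENTICATED" } },
  { message: "Posts failed", path: ["user", "posts"], extensions: { code: "INTERNAL_SERVER_ERROR" } },
];

const actions = sampleErrors.map(actionFor);
const paths = failedPaths(sampleErrors);
```

Defaulting unknown codes to the generic state is a deliberate choice in this sketch: servers can add codes over time, and a client that throws on an unrecognized code would turn a recoverable error into a crash.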

What is the GraphQL JSON scalar and when should I use it?

The GraphQL JSON scalar (npm install graphql-scalars, import GraphQLJSON) allows any valid JSON value — object, array, string, number, boolean, null — as a field value, bypassing the type system for that field. Use it for genuinely dynamic data shapes: plugin configuration, CMS content blocks, user-defined metadata, or arbitrary key-value stores. Schema: scalar JSON. Resolver: { JSON: GraphQLJSON }. The tradeoff is loss of server-side type enforcement — clients receive opaque JSON they must validate with Zod or JSON Schema. Prefer GraphQLJSONObject (objects only) over plain GraphQLJSON when the field is always an object. Type JSON scalar fields as unknown in TypeScript, never any — this forces explicit validation before use. See our guide on JSON Schema validation for client-side validation patterns.
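To illustrate why unknown is preferable to any, here is a minimal hand-written validator for a JSON scalar field. It stands in for a Zod or JSON Schema check so the example has no dependencies; the ContentBlock shape and parseContentBlock function are assumptions made up for this sketch.

```typescript
// Hypothetical shape for a CMS content block delivered through a JSON scalar.
interface ContentBlock {
  type: string;
  props: Record<string, unknown>;
}

// Narrows unknown to a plain JSON object (not null, not an array).
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates an opaque JSON value before the rest of the app touches it.
function parseContentBlock(value: unknown): ContentBlock {
  if (!isPlainObject(value)) {
    throw new TypeError("content block must be a JSON object");
  }
  const { type, props } = value;
  if (typeof type !== "string") {
    throw new TypeError('content block needs a string "type"');
  }
  if (!isPlainObject(props)) {
    throw new TypeError('content block needs an object "props"');
  }
  return { type, props };
}

// The field arrives typed as unknown, so it cannot be used until validated.
const fromServer: unknown = JSON.parse('{"type":"hero","props":{"title":"Hello"}}');
const block = parseContentBlock(fromServer);
```

Had fromServer been typed as any, block.type would compile without the check and fail only at runtime when the server sends an unexpected shape.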

How do I reduce the size of GraphQL JSON responses?

Four strategies: (1) Persisted queries: replace the full query string with its SHA-256 hash (32 bytes, sent as 64 hex characters), which can cut request JSON by roughly 80–90% for large queries. Use Apollo APQ or a build-time persisted document loader. Note that this shrinks the request, not the response. (2) Lean field selection: GraphQL returns only the fields a query asks for, so enforce query discipline on the client and request only the fields the UI renders. (3) Response compression: enable gzip or Brotli on the server; GraphQL JSON compresses well because field names repeat across list items. (4) @defer and @stream: incremental-delivery directives that are still at the proposal stage rather than in a released edition of the GraphQL specification, though several servers and clients support them. They let the server send slow fields or large lists in later chunks, reducing time to first result. Combine persisted queries with CDN GET caching for public queries to achieve REST-equivalent CDN cacheability. See the JSON performance guide for transport-level compression details.
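The hash-only request used by persisted queries can be sketched as follows, assuming a Node.js runtime for node:crypto. The extensions.persistedQuery shape follows Apollo's APQ convention; the sample query and the helper names sha256Hex, hashOnlyBody, and fullBody are illustrative.

```typescript
import { createHash } from "node:crypto";

// Sample query text, invented for the example.
const query = "query GetUser($id: ID!) { user(id: $id) { id name } }";

// SHA-256 digest as 64 lowercase hex characters.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// First attempt: send only the hash and variables, no query text.
function hashOnlyBody(queryText: string, variables: Record<string, unknown>) {
  return {
    variables,
    extensions: {
      persistedQuery: { version: 1, sha256Hash: sha256Hex(queryText) },
    },
  };
}

// Retry after a PersistedQueryNotFound response: same hash plus full text,
// so the server can store the hash-to-query mapping.
function fullBody(queryText: string, variables: Record<string, unknown>) {
  return { query: queryText, ...hashOnlyBody(queryText, variables) };
}

const firstAttempt = JSON.stringify(hashOnlyBody(query, { id: "1" }));
const retry = JSON.stringify(fullBody(query, { id: "1" }));
```

The hash-only body has a fixed size regardless of query length, so the savings grow with the size of the query. For a query as short as the sample above, the hash and its wrapper can be longer than the query text itself.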

How does Apollo Client cache GraphQL JSON responses?

Apollo Client uses a normalized in-memory cache (InMemoryCache) where each object is stored by a cache key derived from its __typename and id (e.g., User:1). When two queries return the same User:1, the cache stores it once and merges their fields; a later query can be served from cache without a network request if every field it selects is already cached. Normalization requires an id on every object: Apollo Client adds __typename to queries automatically, but id must be selected explicitly. Configure custom key fields: new InMemoryCache({ typePolicies: { Product: { keyFields: ["sku"] } } }). Cache update strategies after mutations: refetchQueries re-fetches specified queries, cache.modify() updates specific fields in place, cache.writeFragment() writes partial data. Objects without a stable ID are not normalized; Apollo stores them nested inside their parent object, so they are addressed by the parent field path. Apollo DevTools shows the full normalized cache store for debugging.
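The idea of normalization can be illustrated with a deliberately simplified store. This is not Apollo's implementation: cacheIdOf, writeEntity, and keyFieldsByType are hypothetical, and Apollo's real cache IDs for custom key fields use a different format than the plain typename:value strings produced here.

```typescript
// Any object that carries a __typename plus arbitrary fields.
interface Entity {
  __typename: string;
  [field: string]: unknown;
}

// Hypothetical per-type key fields, mirroring the keyFields idea.
const keyFieldsByType: Record<string, string[] | undefined> = {
  Product: ["sku"],
};

// Derives a cache ID such as "User:1", or undefined if a key field is missing.
function cacheIdOf(obj: Entity): string | undefined {
  const fields = keyFieldsByType[obj.__typename] ?? ["id"];
  const parts = fields.map((field) => obj[field]);
  if (parts.some((part) => part === undefined || part === null)) {
    return undefined;
  }
  return `${obj.__typename}:${parts.map(String).join(":")}`;
}

// Flat store: one entry per entity, no matter how many queries returned it.
const store = new Map<string, Entity>();

// Merges the incoming fields into the existing entry for the same ID.
function writeEntity(obj: Entity): void {
  const id = cacheIdOf(obj);
  if (id === undefined) return; // not normalizable in this sketch
  store.set(id, { ...(store.get(id) ?? {}), ...obj });
}

// Two different queries return overlapping views of the same user.
writeEntity({ __typename: "User", id: "1", name: "Alice" });
writeEntity({ __typename: "User", id: "1", email: "alice@example.com" });
writeEntity({ __typename: "Product", sku: "ABC-1", title: "Widget" });
writeEntity({ __typename: "Comment", body: "no id selected" });
```

After these writes the store holds two entries, User:1 (with both name and email) and Product:ABC-1. The Comment is skipped because it has no id, which is the case where a real cache would keep it nested under its parent instead.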

What is the difference between a GraphQL and REST JSON response?

REST returns a fixed JSON structure defined by the endpoint, so clients get all fields the server includes regardless of what they need. GraphQL returns exactly the fields requested. REST signals success/failure with HTTP status codes (200, 404, 500); GraphQL almost always returns HTTP 200 and reports failures in the errors array. REST responses have no standard envelope, and the payload is often the resource directly. GraphQL responses always use the data/errors/extensions envelope. REST GET requests are cacheable at the HTTP/CDN layer by default; GraphQL is typically POST-only, requiring APQ + GET for CDN caching. For payload size, compare a REST endpoint returning 20 fields per product for a list of 20 items (~28 KB) with a GraphQL query for the 5 fields a product card needs (~4 KB, roughly 85% smaller). The savings require disciplined query writing. See our JSON API design guide for REST vs GraphQL selection criteria.

How do I use DataLoader to prevent N+1 queries in GraphQL?

Install dataloader (npm install dataloader), then define a batch function that receives an array of IDs and returns an array of values in the same order: const userLoader = new DataLoader(async (ids) => { const users = await db.users.findByIds(ids); return ids.map(id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`)); }). In the Post resolver: author: (post, _, ctx) => ctx.loaders.user.load(post.authorId). DataLoader collects all 100 load(authorId) calls from 100 post resolvers and fires one SELECT * FROM users WHERE id IN (...) — reducing 101 DB queries to 2. Critical rules: (1) always create a new DataLoader per request to prevent cross-request cache leakage; (2) the batch function must return an array of exactly the same length and order as the input IDs; (3) return new Error() for not-found IDs, never undefined. DataLoader also caches within the request — loading the same ID twice makes only one DB call.
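To show how batching and per-request caching collapse many load calls into one batch, here is a simplified stand-in for DataLoader. TinyLoader is a hypothetical class written for this sketch, not the dataloader library: it only batches calls made synchronously before the next microtask, whereas the real library schedules its batches more carefully.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<Array<V | Error>>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: Error) => void;
}

class TinyLoader<K, V> {
  private batchFn: BatchFn<K, V>;
  private queue: Array<Pending<K, V>> = [];
  private cache = new Map<K, Promise<V>>();

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    // Repeated keys reuse the first promise, so they never reach the batch.
    const cached = this.cache.get(key);
    if (cached) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first queued key schedules one flush for the whole batch.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const results = await this.batchFn(batch.map((item) => item.key));
      if (results.length !== batch.length) {
        throw new Error("batch function must return one result per key");
      }
      batch.forEach((item, index) => {
        const result = results[index];
        if (result instanceof Error) item.reject(result);
        else item.resolve(result as V);
      });
    } catch (err) {
      const reason = err instanceof Error ? err : new Error(String(err));
      batch.forEach((item) => item.reject(reason));
    }
  }
}

// Simulated batch query that records how often it runs and with how many keys.
let batchCalls = 0;
let lastBatchSize = 0;
const authorLoader = new TinyLoader<string, { id: string }>(async (ids) => {
  batchCalls += 1;
  lastBatchSize = ids.length;
  return ids.map((id) => ({ id }));
});

// 100 posts written by 10 distinct authors, all resolved in the same tick.
const authorIds = Array.from({ length: 100 }, (_, i) => `u${i % 10}`);
const allAuthors = Promise.all(authorIds.map((id) => authorLoader.load(id)));
```

In this run the 100 load calls produce a single batch call containing the 10 distinct author IDs, which mirrors the reduction from 101 queries to 2 described above. As with the real library, a TinyLoader would need to be created per request for the cache to be safe.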