JSON Data Modeling: Nested vs References, Polymorphism & Schema Design
JSON data modeling decisions — whether to embed or reference related data, how to handle objects of different shapes, whether to use flat or deeply nested structures, and what null vs a missing field means — have cascading effects on query performance, update complexity, and API compatibility. Unlike relational database design, which has decades of normalization theory, JSON modeling decisions are context-dependent and often misunderstood. This guide covers the core JSON-specific modeling decisions: embedding vs referencing (with a decision matrix), relationship modeling patterns, polymorphic types and discriminated unions, flat vs hierarchical trade-offs, null vs missing field semantics (including JSON Merge Patch implications), array ordering and reorderability, and extensibility strategies. Every pattern includes concrete examples and TypeScript types.
Nested vs Reference-Based JSON Design
The most consequential JSON modeling decision is whether to embed related data directly in the parent document or to store only a reference (an ID) and retrieve the related data separately. Embedding eliminates extra round trips: a single read returns everything. Referencing keeps documents small and avoids duplication but requires additional lookups. Neither is universally better; the right choice depends on three factors: how often the data is accessed together, how frequently the nested data is updated, and the expected size of the nested collection.
// ── Embedding: nested data lives inside the parent document ──────────
// Good when: accessed together 80%+ of the time, updated infrequently,
// collection is small and bounded
const orderWithEmbeddedAddress = {
id: "ord-1001",
status: "paid",
amount: 149.99,
// Address is embedded — read together with order in every display, receipt, etc.
// Address changes rarely; updating it rewrites the whole order document, which is fine
shippingAddress: {
street: "123 Main St",
city: "Austin",
state: "TX",
zip: "78701",
},
// Line items: small, bounded, always displayed with the order — ideal for embedding
items: [
{ sku: "WDG-01", name: "Widget", qty: 2, unitPrice: 49.99 },
{ sku: "GDG-05", name: "Gadget", qty: 1, unitPrice: 49.99 },
],
};
// ── Referencing: store only IDs, fetch related data separately ────────
// Good when: related entity is shared, updated independently, or large
const orderWithReferences = {
id: "ord-1001",
status: "paid",
amount: 149.99,
// Customer is referenced — same customer object is shared across thousands of orders.
// Embedding would duplicate customer data and create an update anomaly:
// changing the customer's email would require rewriting every order document.
customerId: "cust-42",
// Product catalog is referenced — products update independently (price, description, image)
// Embedding would duplicate product data across all orders containing that product.
items: [
{ productId: "WDG-01", qty: 2, unitPrice: 49.99 },
{ productId: "GDG-05", qty: 1, unitPrice: 49.99 },
],
};
// ── Decision matrix ───────────────────────────────────────────────────
// Factor            Embed                     Reference
// ─────────────────────────────────────────────────────────────────────
// Access together   80%+ of time              Rarely together
// Update frequency  Low (parent rewrite ok)   High (updated independently)
// Collection size   Small, bounded            Large or unbounded (100+)
// Data sharing      Not shared                Shared across many parents
// Consistency need  Eventual ok               Must be consistent globally
// Read latency      Single read               Extra lookup per access
// Write cost        Full doc                  Update in place
//
// Rule of thumb (MongoDB):
// 1-to-few (~1–10) → embed
// 1-to-many (~10–100) → embed with caution or reference
// 1-to-many (100+) → reference
// 1-to-squillions → always reference (child-side parent_id)
// ── Hybrid: embed a snapshot, reference the source ────────────────────
// Common pattern: embed the data you need at the time of the event,
// reference the live entity for current state.
const orderSnapshot = {
id: "ord-1001",
// Reference for current customer data
customerId: "cust-42",
// Snapshot of product price at time of purchase — embedded intentionally.
// The live product price may change; the order must record what was charged.
items: [
{ productId: "WDG-01", priceAtPurchase: 49.99, qty: 2 },
],
};
A common mistake is embedding aggressively for performance without measuring access patterns first. In stores that rewrite the whole document on update, embedding a large nested array means every update to any field in the parent rewrites the entire array, so a single-field update becomes proportionally more expensive as the embedded collection grows. If you find yourself updating the nested data frequently but reading it rarely, that is a strong signal to switch to referencing. See our JSON MongoDB guide for document design patterns specific to MongoDB.
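The decision matrix can be turned into a small helper to make the trade-offs explicit during design reviews. This is a minimal sketch, not a definitive rule: `RelationshipProfile`, `recommendStrategy`, and the thresholds (0.8 and 100) are hypothetical names and illustrative defaults taken from the matrix, and the "snapshot-plus-reference" branch is one assumed way to map the hybrid pattern onto it.

```typescript
// Hypothetical helper: thresholds mirror the decision matrix and are
// illustrative defaults, not measured values.
type RelationshipProfile = {
  readTogetherRatio: number;     // fraction of parent reads that also need the child data (0..1)
  expectedChildCount: number;    // typical size of the nested collection
  unbounded: boolean;            // true if the collection can grow without limit
  sharedAcrossParents: boolean;  // true if many parents point at the same entity
  updatedIndependently: boolean; // true if the child changes on its own schedule
};

type Strategy = "embed" | "reference" | "snapshot-plus-reference";

function recommendStrategy(p: RelationshipProfile): Strategy {
  // Shared or independently updated entities: embedding would duplicate data.
  if (p.sharedAcrossParents || p.updatedIndependently) {
    // If almost every read still needs some of that data, keep a snapshot as well.
    return p.readTogetherRatio >= 0.8 ? "snapshot-plus-reference" : "reference";
  }
  // Large or unbounded collections make every parent write more expensive.
  if (p.unbounded || p.expectedChildCount > 100) return "reference";
  // Small, bounded, and read together: the classic embed case.
  return p.readTogetherRatio >= 0.8 ? "embed" : "reference";
}

// Shipping address: one per order, always displayed with it, never shared.
const addressChoice = recommendStrategy({
  readTogetherRatio: 0.95, expectedChildCount: 1, unbounded: false,
  sharedAcrossParents: false, updatedIndependently: false,
});

// Customer: shared across many orders and edited on its own.
const customerChoice = recommendStrategy({
  readTogetherRatio: 0.3, expectedChildCount: 1, unbounded: false,
  sharedAcrossParents: true, updatedIndependently: true,
});

// Product on an order line: shared, but the price at purchase is needed on every read.
const productChoice = recommendStrategy({
  readTogetherRatio: 0.95, expectedChildCount: 1, unbounded: false,
  sharedAcrossParents: true, updatedIndependently: true,
});

console.log(addressChoice, customerChoice, productChoice);
```

Treat the output as a starting point for discussion; measured access patterns should override it.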
Modeling Relationships in JSON
JSON supports one-to-one, one-to-many, and many-to-many relationships, each with distinct modeling approaches. Unlike relational databases where foreign keys are a single universal mechanism, JSON gives you a choice of embedding or referencing for every relationship, and the right choice differs by relationship type and direction.
// ── One-to-one: embedding vs reference ───────────────────────────────
// Embed when the related object has no independent identity:
const user = {
id: "usr-42",
email: "alice@example.com",
// Profile has no life outside the user — embed it
profile: {
displayName: "Alice",
bio: "Engineer",
avatarUrl: "https://cdn.example.com/avatars/42.jpg",
},
};
// Reference when both entities are first-class and accessed independently:
const employee = {
id: "emp-10",
name: "Alice",
departmentId: "dept-3", // Department is a first-class entity with its own lifecycle
};
// ── One-to-many: three approaches ─────────────────────────────────────
// Approach 1: Embedded array (parent-contains-children)
// Best for small, bounded collections accessed with the parent
const blogPost = {
id: "post-55",
title: "JSON Modeling Guide",
// Tags: small set, always read with the post, rarely updated independently
tags: ["json", "modeling", "api"],
// Comments: can grow unboundedly — better to reference (see approach 3)
};
// Approach 2: Parent-side ID array (parent-lists-child-IDs)
// Best when you need to enumerate children from the parent,
// and children are independently managed
const playlist = {
id: "pl-99",
name: "Morning Mix",
// Track IDs: ordered list of references. Tracks are independent entities.
// The array preserves order (important for playlists).
trackIds: ["trk-1", "trk-5", "trk-2"],
};
// Approach 3: Child-side reference (children point to parent)
// Best for large collections and when children are added frequently
// without needing to update the parent document
const comment = {
id: "cmt-201",
postId: "post-55", // Foreign key on the child
authorId: "usr-42",
body: "Great article!",
createdAt: "2026-02-09T10:00:00Z",
};
// Query: GET /posts/post-55/comments → filter comments by postId
// Adding a comment does NOT modify the post document — avoids write contention
// ── Many-to-many: junction arrays ─────────────────────────────────────
// Pattern 1: Symmetric — both sides store the other's IDs
const student = {
id: "stu-1",
name: "Bob",
enrolledCourseIds: ["crs-101", "crs-202"],
};
const course = {
id: "crs-101",
title: "Data Structures",
enrolledStudentIds: ["stu-1", "stu-7", "stu-12"],
};
// Downside: must update both sides atomically — update anomaly risk
// Pattern 2: Junction document (preferred for large many-to-many)
const enrollment = {
studentId: "stu-1",
courseId: "crs-101",
enrolledAt: "2026-01-15T09:00:00Z",
grade: null, // Relationship-specific metadata lives on the junction
};
// Query students in a course: filter enrollments by courseId
// Query courses for a student: filter enrollments by studentId
// No update anomaly: a single document represents each relationship
The junction document pattern for many-to-many is underused in JSON API design. It cleanly separates the relationship from the entities, allows relationship-specific metadata (enrollment date, grade, role), and avoids the dual-write update anomaly of the symmetric ID-array pattern. For more on JSON API design conventions, see our JSON API design guide.
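To show how both directions of a many-to-many relationship can be answered from junction documents alone, here is a minimal in-memory sketch. The function names (`courseIdsForStudent`, `studentIdsInCourse`, `enroll`) are hypothetical, and the extra sample rows are made up for illustration; a real store would run the same filters as indexed queries.

```typescript
type Enrollment = {
  studentId: string;
  courseId: string;
  enrolledAt: string;   // ISO 8601 timestamp
  grade: string | null; // relationship-specific metadata lives on the junction
};

// Sample data (illustrative only).
const enrollments: Enrollment[] = [
  { studentId: "stu-1", courseId: "crs-101", enrolledAt: "2026-01-15T09:00:00Z", grade: null },
  { studentId: "stu-1", courseId: "crs-202", enrolledAt: "2026-01-16T09:00:00Z", grade: null },
  { studentId: "stu-7", courseId: "crs-101", enrolledAt: "2026-01-17T09:00:00Z", grade: "A" },
];

// Courses for a student: filter by studentId.
function courseIdsForStudent(all: Enrollment[], studentId: string): string[] {
  return all.filter((e) => e.studentId === studentId).map((e) => e.courseId);
}

// Students in a course: filter by courseId.
function studentIdsInCourse(all: Enrollment[], courseId: string): string[] {
  return all.filter((e) => e.courseId === courseId).map((e) => e.studentId);
}

// Enrolling adds one junction document; neither the student nor the course is modified.
// Returns the input unchanged if the pair already exists.
function enroll(
  all: Enrollment[],
  studentId: string,
  courseId: string,
  enrolledAt: string,
): Enrollment[] {
  const exists = all.some((e) => e.studentId === studentId && e.courseId === courseId);
  return exists ? all : [...all, { studentId, courseId, enrolledAt, grade: null }];
}

console.log(courseIdsForStudent(enrollments, "stu-1"));
console.log(studentIdsInCourse(enrollments, "crs-101"));
```

Because each relationship is a single document, there is no second copy to fall out of sync when an enrollment is added or removed.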
Polymorphic JSON Types
Polymorphism in JSON occurs when a field or array element can hold objects of different shapes. The canonical example is a shapes array that contains circles, rectangles, and triangles — all shapes, but each with different fields. Without a clear modeling strategy, polymorphic collections require ad-hoc field inspection and produce fragile consumer code. The standard solution is the type discriminator pattern.
// ── Type discriminator pattern ────────────────────────────────────────
// Each variant carries a "type" field that identifies its shape.
// The consumer reads "type" first and branches accordingly.
const shapes = [
{ type: "circle", radius: 5 },
{ type: "rect", width: 10, height: 5 },
{ type: "polygon", vertices: [[0,0],[5,0],[2.5,4.3]] },
];
// ── TypeScript discriminated union ────────────────────────────────────
// TypeScript narrows the type automatically in switch/if blocks.
type Circle = { type: "circle"; radius: number };
type Rect = { type: "rect"; width: number; height: number };
type Polygon = { type: "polygon"; vertices: [number, number][] };
type Shape = Circle | Rect | Polygon;
function area(shape: Shape): number {
switch (shape.type) {
case "circle": return Math.PI * shape.radius ** 2;
case "rect": return shape.width * shape.height;
case "polygon": return 0; // Shoelace formula omitted for brevity
}
}
// With strict compiler settings the switch must be exhaustive: adding a new variant
// without handling it leaves a code path that returns nothing, which is a compile error.
// ── JSON Schema: oneOf with discriminator ─────────────────────────────
const shapeSchema = {
oneOf: [
{
type: "object",
required: ["type", "radius"],
properties: {
type: { type: "string", const: "circle" },
radius: { type: "number", minimum: 0 },
},
additionalProperties: false,
},
{
type: "object",
required: ["type", "width", "height"],
properties: {
type: { type: "string", const: "rect" },
width: { type: "number", minimum: 0 },
height: { type: "number", minimum: 0 },
},
additionalProperties: false,
},
],
};
// OpenAPI discriminator (available in 3.0 and 3.1): a hint that tells tooling which
// subschema applies based on the value of one field, so it need not try every oneOf branch
const openApiShapeSchema = {
oneOf: [
{ '$ref': '#/components/schemas/Circle' },
{ '$ref': '#/components/schemas/Rect' },
],
discriminator: {
propertyName: "type",
mapping: {
circle: '#/components/schemas/Circle',
rect: '#/components/schemas/Rect',
},
},
};
// ── JSON-LD: @type for semantic polymorphism ──────────────────────────
// JSON-LD uses "@type" as the discriminator; the "@context" that links it to a vocabulary is omitted here.
const product = { "@type": "Product", "name": "Widget", "price": 49.99 };
const event = { "@type": "Event", "name": "Conference", "startDate": "2026-03-01" };
// ── Envelope pattern: wrap variants in a typed container ──────────────
// Alternative to inline discriminator — useful for heterogeneous event streams
type EventEnvelope =
| { event: "user.created"; payload: { userId: string; email: string } }
| { event: "order.placed"; payload: { orderId: string; amount: number } }
| { event: "item.shipped"; payload: { orderId: string; trackingId: string } };
const webhookEvent: EventEnvelope = {
event: "order.placed",
payload: { orderId: "ord-1001", amount: 149.99 },
};
// Consumer: switch(event.event) { case "order.placed": ... }
// ── Anti-pattern: implicit type detection (avoid) ─────────────────────
// Do NOT detect type by checking which fields are present:
function badArea(shape: Record<string, unknown>): number {
if ("radius" in shape) return Math.PI * (shape.radius as number) ** 2;
if ("width" in shape) return (shape.width as number) * (shape.height as number);
throw new Error("Unknown shape");
}
// Problems: breaks if a new variant has overlapping fields;
// no compile-time safety; hard to document in JSON Schema
The most important rule of polymorphic JSON design concerns the discriminator field: it must be present on every object in the polymorphic collection, it must be non-nullable, and its values must be stable identifiers (not display strings that might be localized or reformatted). Choose a consistent name (type, kind, or objectType) and use it everywhere in your schema. For JSON Schema oneOf vs anyOf trade-offs, see our oneOf vs anyOf guide.
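TypeScript types disappear at runtime, so data arriving from JSON.parse still needs a check that the discriminator is present and valid. The sketch below reads the discriminator first and then validates only the fields that variant needs; `parseShape`, `isRecord`, and `assertNever` are hypothetical helper names, and only two variants are covered to keep it short.

```typescript
type Circle = { type: "circle"; radius: number };
type Rect = { type: "rect"; width: number; height: number };
type Shape = Circle | Rect;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Read the discriminator first, then validate the fields for that variant only.
function parseShape(value: unknown): Shape {
  if (!isRecord(value)) throw new Error("shape must be a JSON object");
  const kind = value.type;
  if (typeof kind !== "string") throw new Error('missing or non-string discriminator "type"');

  switch (kind) {
    case "circle": {
      const radius = value.radius;
      if (typeof radius !== "number") throw new Error("circle requires a numeric radius");
      return { type: "circle", radius };
    }
    case "rect": {
      const width = value.width;
      const height = value.height;
      if (typeof width !== "number" || typeof height !== "number") {
        throw new Error("rect requires numeric width and height");
      }
      return { type: "rect", width, height };
    }
    default:
      throw new Error(`unknown shape type: ${kind}`);
  }
}

// Compile-time exhaustiveness guard: if a variant is added to Shape and not
// handled below, `shape` is no longer `never` here and the call fails to compile.
function assertNever(x: never): never {
  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
}

function area(shape: Shape): number {
  switch (shape.type) {
    case "circle": return Math.PI * shape.radius ** 2;
    case "rect": return shape.width * shape.height;
    default: return assertNever(shape);
  }
}

const parsed = parseShape(JSON.parse('{"type":"rect","width":10,"height":5}'));
const rectArea = area(parsed);
console.log(rectArea);
```

Unknown discriminator values are rejected here; a forward-compatible consumer might instead skip or log them, which is a design choice rather than a rule.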
Flat vs Hierarchical JSON Structures
Deeply nested JSON feels natural to model — an order has a customer, who has an address, which has a city — but it creates practical problems for querying, updating, and diffing. Flat structures are easier to index, patch, and validate; deeply nested structures are more self-documenting but harder to maintain. Understanding the trade-offs helps you choose the right depth for each part of your schema.
// ── Deeply nested: expressive but hard to update ─────────────────────
const deeplyNested = {
order: {
customer: {
address: {
shipping: {
street: "123 Main St", // Path: order.customer.address.shipping.street
city: "Austin",
},
},
},
},
};
// Problems:
// 1. Updating a single field (city) requires constructing the full path
// 2. JSON Pointer for RFC 6902 patch: "/order/customer/address/shipping/city"
// — verbose and fragile if hierarchy changes
// 3. jsonb GIN index in PostgreSQL must index every nested key
// 4. TypeScript access: order.customer?.address?.shipping?.city
// — three optional chain operators for one field
// 5. Hard to diff: tools must recurse 5 levels to compare two documents
// ── Flat: easy to update, index, and validate ─────────────────────────
const flat = {
orderId: "ord-1001",
customerId: "cust-42",
shippingStreet: "123 Main St",
shippingCity: "Austin",
shippingState: "TX",
shippingZip: "78701",
};
// Pros: every field is a direct key, easy to index (B-tree on shippingCity),
// easy to patch (JSON Merge Patch: {"shippingCity": "Dallas"}),
// TypeScript access: order.shippingCity — no optional chaining
// Cons: field names become long; logical grouping is lost; harder to reuse
// address structure across shipping/billing
// ── Normalized: group logically related fields into sub-objects ────────
// Sweet spot for most APIs: one level of grouping, then flat within groups
const normalized = {
id: "ord-1001",
customerId: "cust-42",
shipping: { // One level of nesting for the address group
street: "123 Main St", // Within the group, fields are flat
city: "Austin",
state: "TX",
zip: "78701",
},
billing: {
street: "456 Oak Ave",
city: "Austin",
state: "TX",
zip: "78702",
},
items: [
{ sku: "WDG-01", qty: 2, unitPrice: 49.99 },
],
};
// Access: order.shipping.city — single level, readable
// Patch: {"shipping": {"city": "Dallas"}} — targets the sub-object
// TypeScript: address object can be reused: type Address = { street: string; city: string; ... }
// ── MongoDB document design: depth recommendations ─────────────────────
// MongoDB can query nested fields natively: db.orders.find({"shipping.city": "Austin"})
// Index on nested field: db.orders.createIndex({"shipping.city": 1})
// But: queries on very deeply nested paths (4+ levels) become harder to index
// and maintain. MongoDB enforces a 16 MB maximum document size, and its docs
// advise against unbounded arrays of embedded documents.
// ── Normalization for JSON API responses ──────────────────────────────
// When the same entity (e.g., User) appears in multiple response fields,
// normalize to avoid inconsistency:
// Denormalized (duplicated — update anomaly risk):
const postDenormalized = {
id: "post-55",
author: { id: "usr-42", name: "Alice", avatar: "alice.jpg" },
comments: [
{ id: "cmt-1", author: { id: "usr-42", name: "Alice", avatar: "alice.jpg" }, body: "..." },
{ id: "cmt-2", author: { id: "usr-7", name: "Bob", avatar: "bob.jpg" }, body: "..." },
],
};
// Problem: Alice's name appears 2+ times — must update all copies consistently
// Normalized (entities separated, referenced by ID; similar to Redux normalized state.
// JSON:API follows the same idea but uses an "included" array of resource objects):
const postNormalized = {
data: {
id: "post-55",
authorId: "usr-42",
commentIds: ["cmt-1", "cmt-2"],
},
included: {
users: { "usr-42": { name: "Alice", avatar: "alice.jpg" },
"usr-7": { name: "Bob", avatar: "bob.jpg" } },
comments: { "cmt-1": { authorId: "usr-42", body: "..." },
"cmt-2": { authorId: "usr-7", body: "..." } },
},
};
// Each user appears once: no duplication, no update anomaly
A practical guideline: limit nesting depth to two or three levels for most schemas. Use a sub-object when the fields form a logical group that might be reused (an address type shared between shipping and billing), but keep the fields within each sub-object flat. Avoid nesting for the sake of mirroring a class hierarchy, since JSON consumers rarely benefit from that structure. For flattening deeply nested JSON programmatically, see our JSON flatten guide.
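The step from the denormalized post to the normalized shape can be automated. The following is a minimal sketch under the assumption that entities are keyed by their id field; `normalizePost` and the type names are hypothetical, and the output layout mirrors the postNormalized example rather than any particular library.

```typescript
type User = { id: string; name: string; avatar: string };
type NestedComment = { id: string; author: User; body: string };
type NestedPost = { id: string; author: User; comments: NestedComment[] };

type NormalizedPost = {
  data: { id: string; authorId: string; commentIds: string[] };
  included: {
    users: Record<string, { name: string; avatar: string }>;
    comments: Record<string, { authorId: string; body: string }>;
  };
};

function normalizePost(post: NestedPost): NormalizedPost {
  const users: NormalizedPost["included"]["users"] = {};
  const comments: NormalizedPost["included"]["comments"] = {};

  // Store each user once, keyed by ID; a later copy overwrites an earlier one.
  const addUser = (u: User): void => {
    users[u.id] = { name: u.name, avatar: u.avatar };
  };

  addUser(post.author);
  for (const c of post.comments) {
    addUser(c.author);
    comments[c.id] = { authorId: c.author.id, body: c.body };
  }

  return {
    data: {
      id: post.id,
      authorId: post.author.id,
      commentIds: post.comments.map((c) => c.id), // array keeps comment order
    },
    included: { users, comments },
  };
}

const alice: User = { id: "usr-42", name: "Alice", avatar: "alice.jpg" };
const bob: User = { id: "usr-7", name: "Bob", avatar: "bob.jpg" };

const normalizedPost = normalizePost({
  id: "post-55",
  author: alice,
  comments: [
    { id: "cmt-1", author: alice, body: "..." },
    { id: "cmt-2", author: bob, body: "..." },
  ],
});

console.log(Object.keys(normalizedPost.included.users));
```

Note that commentIds stays an array so the original comment order survives, while the entity maps are keyed lookups with no meaningful order.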
Optional and Nullable Fields Design
The decision of whether an absent value should be represented as null or as a missing field is not a stylistic preference — it has concrete effects on API behavior, JSON Merge Patch semantics, and TypeScript type inference. Getting this wrong leads to subtle bugs in PATCH endpoints, validation logic, and client-side default handling.
// ── null vs missing field: semantic difference ────────────────────────
// null = "I know about this field and its value is explicitly empty"
// missing = "I chose not to include this field in this message"
// Example: user profile completion
const completeProfile = {
userId: "usr-42",
displayName: "Alice",
bio: "Engineer at Acme",
phoneNumber: null, // Field is present — user explicitly cleared the phone number
};
const incompleteProfile = {
userId: "usr-42",
displayName: "Alice",
// bio is missing — user hasn't filled it in yet (unknown, not explicitly empty)
// phoneNumber is missing — same reason
};
// ── TypeScript: null vs undefined ─────────────────────────────────────
type UserProfile = {
userId: string;
displayName: string;
bio?: string; // Optional (? = can be missing from the object)
phoneNumber: string | null; // Nullable (present but may be null)
};
// bio?: string → value type is string | undefined
// phoneNumber: string | null → value is always present; null = explicit empty
// ── JSON.stringify strips undefined ───────────────────────────────────
const profile = { displayName: "Alice", bio: undefined, phoneNumber: null };
JSON.stringify(profile);
// '{"displayName":"Alice","phoneNumber":null}'
// bio is gone — JSON has no undefined; undefined fields are silently dropped
// phoneNumber: null is preserved — null is a valid JSON value
// This asymmetry is critical: never use undefined to represent "clear this field"
// in a JSON payload — the field simply disappears.
// ── JSON Merge Patch (RFC 7396) semantics ─────────────────────────────
// PATCH endpoint that uses JSON Merge Patch:
// null value → delete the field from the target document
// missing key → leave the target field unchanged
// Target document:
const target = { name: "Alice", bio: "Engineer", phone: "555-1234" };
// Merge patch:
const patch = { bio: "Senior Engineer", phone: null };
// Result: { name: "Alice", bio: "Senior Engineer" }
// name: unchanged (missing from patch)
// bio: updated to "Senior Engineer"
// phone: DELETED (null means delete in Merge Patch)
// CRITICAL: You cannot use JSON Merge Patch to SET a field to null.
// null always means delete. Use JSON Patch (RFC 6902) for set-to-null:
const jsonPatch = [
{ op: "replace", path: "/phone", value: null } // Sets phone to null (not delete)
];
// ── Best practices for nullable field design ───────────────────────────
// 1. Decide per field: is null a valid domain value? If yes, make it nullable.
// If no, omit the field when there is no value.
// 2. Document the distinction: in your API docs, explicitly state whether
// null and missing are equivalent for each field.
// 3. For PATCH endpoints, choose one semantics and document it clearly:
// - JSON Merge Patch (RFC 7396): null = delete, missing = unchanged
// - Custom PATCH: define your own mapping, document explicitly
// - JSON Patch (RFC 6902): explicit operations, no ambiguity
// 4. Avoid mixing: do not have some fields where null === missing
// and others where they differ — it creates confusion for API consumers.
// ── JSON Schema: expressing nullability ───────────────────────────────
const fieldSchemas = {
// Required, non-nullable
name: { type: "string" },
// Optional: leave the key out of the schema's "required" array so it can be absent.
// If the key is present, its value must be a string.
bio: { type: "string" },
// Nullable: can be present as either string or null
phoneNumber: { type: ["string", "null"] },
// Optional AND nullable (can be missing, or present as null or string)
// Achieved by not including in required AND using type: ["string", "null"]
middleName: { type: ["string", "null"] },
};
The JSON Merge Patch trap catches many developers: when designing a PATCH API, if you use Merge Patch semantics, you lose the ability to set any field to null, because null always means delete. If your domain requires null as a valid field value (e.g., a task with no due date, dueDate: null, vs a task where the due date has not been considered, dueDate missing), use JSON Patch (RFC 6902) with explicit operations. See our JSON Merge Patch guide for full PATCH API design patterns.
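The Merge Patch rules are short enough to implement directly, which makes the null-means-delete behavior easy to see. This is a sketch that follows the algorithm described in RFC 7396; the `Json` type and the `mergePatch` name are assumptions for illustration, and the function returns a new value instead of mutating the target.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// RFC 7396 semantics:
//   - a non-object patch (scalar, array, null) replaces the target wholesale
//   - inside an object patch, null deletes the key, anything else merges recursively
//   - keys missing from the patch are left untouched
function mergePatch(target: Json | undefined, patch: Json): Json {
  if (!isObject(patch)) return patch;

  // A non-object target is discarded and treated as an empty object.
  const result: { [key: string]: Json } = isObject(target) ? { ...target } : {};

  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key];
    } else {
      result[key] = mergePatch(result[key], value);
    }
  }
  return result;
}

const patchedProfile = mergePatch(
  { name: "Alice", bio: "Engineer", phone: "555-1234" },
  { bio: "Senior Engineer", phone: null },
);

// Nested objects merge key by key; arrays are replaced, never merged.
const patchedOrder = mergePatch(
  { shipping: { city: "Austin", zip: "78701" }, tags: ["a", "b"] },
  { shipping: { city: "Dallas" }, tags: ["c"] },
);

console.log(JSON.stringify(patchedProfile));
console.log(JSON.stringify(patchedOrder));
```

There is no input to this function that leaves phone present with a null value, which is exactly the limitation described above. The sketch is also not hardened against hostile keys such as `__proto__`; a production version would need to guard against them.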
JSON Arrays: Ordered vs Unordered Collections
JSON arrays are ordered by definition: RFC 8259 defines an array as an ordered sequence of values, and parsers such as JSON.parse preserve element order. But that guarantee only covers the wire format. When you use a JSON array in your data model, you must decide whether the order is semantically significant (a playlist where the sequence matters) or incidental (a set of tags where order is arbitrary). This distinction shapes how you model, update, and diff the collection.
// ── Unordered collections: sets ───────────────────────────────────────
// When order does not matter, treat the array as a set.
// Operations: add item, remove item, check membership — no position.
const article = {
id: "art-55",
title: "JSON Modeling",
// Tags are a set — {"json", "api", "design"} in any order is equivalent
tags: ["json", "api", "design"],
};
// Patching: add a tag → append to array (order does not matter)
// Comparing: sort both arrays before comparing to avoid false diffs
// JSON Schema: use "uniqueItems": true to enforce set semantics
const tagSchema = { type: "array", items: { type: "string" }, uniqueItems: true };
// ── Ordered collections: sequences ────────────────────────────────────
// When order matters, array index carries meaning.
const playlist = {
id: "pl-99",
name: "Morning Mix",
// Tracks are ordered — index 0 plays first. Order is part of the data.
trackIds: ["trk-3", "trk-1", "trk-5"],
};
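// Problem: with JSON Merge Patch, inserting "trk-7" at position 1 means sending the whole array
// (JSON Patch can add at an index, but that index may be stale by the time it is applied).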
// Concurrent edits: two clients insert at position 1 simultaneously — conflict.
// ── Reorderable items: position field ────────────────────────────────
// Avoid using array index as the position signal.
// Instead, use an explicit position (rank) field with sparse numbering.
const taskList = {
id: "tl-10",
name: "Sprint Tasks",
tasks: [
{ id: "task-A", title: "Design API", position: 1000 },
{ id: "task-B", title: "Write tests", position: 2000 },
{ id: "task-C", title: "Deploy", position: 3000 },
],
};
// Insert between A and B: assign position = 1500
// No other tasks need updating — single document change.
// Renormalize periodically when gaps become too small:
// find min gap → if < threshold, reassign positions 1000, 2000, 3000, ...
// Read sorted: always sort by position on read
const sorted = taskList.tasks.slice().sort((a, b) => a.position - b.position);
// ── Fractional indexing: string-based positions ───────────────────────
// Alternative for collaborative editors (e.g., Linear, Figma).
// Positions are lexicographically sortable strings:
// "a0" < "a1" < "b0" < "b1" — can always be bisected.
const listItems = [
{ id: "item-1", rank: "a0" },
{ id: "item-2", rank: "a2" }, // Insert item-3 between: rank = "a1"
{ id: "item-4", rank: "b0" },
];
// Advantage: no renormalization needed; infinite bisection.
// Disadvantage: strings grow longer with many insertions between the same pair.
// ── When to use which approach ────────────────────────────────────────
// Set (unordered): tags, permissions, categories, feature flags
// → sort for canonical comparison; uniqueItems in schema
// Sequence (ordered): steps in a workflow, messages in a thread, fields in a form
// → array index is authoritative; replace whole array to reorder
// Reorderable (position): kanban cards, playlist tracks, dashboard widgets
// → explicit position field; single-item updates for reordering
// Priority (ranked): search results, recommendations
// → sort descending by score field; never store score as index
// ── Pagination considerations ──────────────────────────────────────────
// Do not use array index as a cursor for paginated ordered collections.
// If an item is inserted before the cursor position, the cursor shifts.
// Use a stable cursor: the last item's ID or position value.
const page = {
items: [
{ id: "task-A", position: 1000 },
{ id: "task-B", position: 2000 },
],
nextCursor: "2000", // Cursor is the last position value, not the array index
};
The position-field pattern is the most practical solution for reorderable items in production systems. Use gaps of 1000 between initial positions: with integer positions, that allows about ten consecutive insertions into the same gap before it is exhausted, since each insertion halves it. When the gap between two neighbors can no longer be bisected (for integers, a gap of 1), run a background job to redistribute positions evenly. Never use array index as a cursor in paginated APIs, because index-based cursors break whenever items are inserted before the cursor position. For JSON array manipulation methods, see our JSON array methods guide.
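The insert-then-renormalize cycle can be sketched with two small functions. This assumes integer positions and an initial gap of 1000; `positionBetween`, `renormalize`, and `STEP` are hypothetical names, and signaling an exhausted gap by returning null is a design choice made for this example.

```typescript
type Ranked = { id: string; position: number };

const STEP = 1000; // initial gap between neighbors

// Returns an integer strictly between the two neighbors, or null when the gap
// is exhausted and the list must be renormalized first.
// prev === null means "insert at the start"; next === null means "insert at the end".
function positionBetween(prev: number | null, next: number | null): number | null {
  if (prev === null) return next === null ? STEP : next - STEP;
  if (next === null) return prev + STEP;
  const mid = Math.floor((prev + next) / 2);
  return mid > prev && mid < next ? mid : null;
}

// Reassigns positions STEP, 2*STEP, 3*STEP, ... while preserving the current order.
function renormalize<T extends Ranked>(items: T[]): T[] {
  return items
    .slice()
    .sort((a, b) => a.position - b.position)
    .map((item, index) => ({ ...item, position: (index + 1) * STEP }));
}

const betweenAandB = positionBetween(1000, 2000); // midpoint of the gap
const exhausted = positionBetween(1000, 1001);    // no integer fits between these

const respaced = renormalize([
  { id: "task-B", position: 2000 },
  { id: "task-A", position: 1000 },
  { id: "task-D", position: 1001 },
]);

console.log(betweenAandB, exhausted, respaced.map((t) => `${t.id}:${t.position}`));
```

After renormalizing, the insertion that previously returned null can be retried against the new, wider gaps.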
Versioning and Extensibility in JSON Models
A JSON model that cannot evolve without breaking existing clients is a liability. Good extensibility design means clients written against version 1 of your schema continue to work correctly when you add new fields in version 2, and deprecation gives clients time to migrate before removal. The key principles: additive changes are non-breaking; removing or renaming fields is breaking; changing the type of a field is breaking.
// ── Version field: explicit schema version in the document ───────────
// Useful for stored documents, event streams, and file formats
// where the reader must know how to parse the document.
const configV1 = {
version: 1,
databaseUrl: "postgres://localhost/mydb",
maxConnections: 10,
};
const configV2 = {
version: 2,
database: { // Restructured — grouped fields; version signals the change
url: "postgres://localhost/mydb",
maxConnections: 10,
readReplicas: [], // New field in v2
},
};
// Consumer handles both versions:
function parseConfig(config: Record<string, unknown>) {
if (config.version === 2) {
return config.database; // New structure
}
// Default: v1 structure (backwards compatible read)
return { url: config.databaseUrl, maxConnections: config.maxConnections };
}
// ── Non-breaking additive changes ────────────────────────────────────
// Adding a new optional field is safe as long as existing clients ignore unknown fields
const userV1 = { id: "usr-42", email: "alice@example.com" };
const userV2 = { id: "usr-42", email: "alice@example.com", displayName: "Alice" };
// v1 clients: read id and email, ignore displayName — still works
// v2 clients: read all three fields
// ── Breaking changes (require version bump) ───────────────────────────
// 1. Removing a field — clients that read it receive undefined
// 2. Renaming a field — clients that read the old name receive undefined
// 3. Changing a field's type — clients cast to wrong type, runtime error
// 4. Making an optional field required — clients not sending it fail validation
// 5. Narrowing a field's allowed values — clients sending previously-valid values fail
// ── Deprecation pattern: parallel fields ─────────────────────────────
// Add the new field, keep the old field, document the deprecation.
const productV2 = {
id: "prod-1",
// Old field — deprecated, will be removed in v3
// Clients: migrate to use priceInCents
price: 49.99,
// New field: integer cents are exact, so there are no floating-point rounding issues
priceInCents: 4999,
};
// After all clients migrate to priceInCents, remove price in v3.
// ── additionalProperties: open vs closed schema ────────────────────────
// Open schema (additionalProperties: true or omitted) — clients tolerate new fields
const openSchema = {
type: "object",
required: ["id", "email"],
properties: {
id: { type: "string" },
email: { type: "string" },
},
// additionalProperties omitted = true by default
// New fields pass validation without schema changes
};
// Closed schema (additionalProperties: false) — strict; any unknown field fails
// Use only for security-sensitive contexts where you must reject unexpected fields
const closedSchema = {
type: "object",
required: ["id", "email"],
properties: {
id: { type: "string" },
email: { type: "string" },
},
additionalProperties: false, // Any unknown field is a validation error
};
// ── Extension point: reserved namespace ──────────────────────────────
// Reserve a field (e.g., "extensions" or "x-*") for future additions.
// Consumers ignore it; future versions populate it.
const event = {
type: "order.placed",
orderId: "ord-1001",
amount: 149.99,
extensions: { // Extension point — consumers ignore unknown extensions
loyaltyPoints: 150, // Added in a later version without schema change
fraudScore: 0.02,
},
};
// ── Feature flags in JSON config ─────────────────────────────────────
// Use a features object to enable new behavior progressively
// without shipping a new API version.
const appConfig = {
version: 3,
features: {
newCheckoutFlow: true, // Enabled for all users
betaDashboard: false, // Disabled (dark launch: the code is deployed, the feature is off)
aiRecommendations: true,
},
// Feature-specific config lives under the feature key
aiRecommendations: {
modelVersion: "v2",
maxResults: 5,
},
};
// Consumer reads feature flag before activating new code path:
// if (config.features.newCheckoutFlow) { renderNewCheckout() }
// else { renderLegacyCheckout() }
// ── Zod: permissive parsing for extensibility ─────────────────────────
import { z } from "zod";
const UserSchema = z.object({
id: z.string(),
email: z.string().email(),
}).passthrough(); // Allow unknown fields to pass through instead of stripping/rejecting
// Without .passthrough(), Zod strips unknown fields by default
// With .passthrough(), extra fields are preserved, which gives forward-compatible parsing
The robustness principle (be conservative in what you send, liberal in what you accept) is the foundation of extensible JSON design. Unless a context calls for a closed schema, parse with additionalProperties: true (or .passthrough() in Zod) to tolerate new fields without validation errors. Use the version field for major structural changes; additive fields alone do not require a version bump. The feature flag pattern allows shipping code for a new behavior without activating it, enabling controlled rollouts and instant rollbacks without API deployments. For JSON Schema versioning in detail, see our JSON Schema versioning guide.
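A tolerant reader can combine both ideas: upgrade older versions to the current shape and ignore fields it does not recognize. The sketch below uses no validation library; `readConfig` and `ConfigV2` are hypothetical names, and treating a document without a version field as v1 is an assumption made for this example.

```typescript
type ConfigV2 = {
  version: 2;
  database: { url: string; maxConnections: number; readReplicas: string[] };
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Upgrades any supported stored version to the v2 shape.
// Unknown extra fields are ignored rather than rejected.
function readConfig(raw: unknown): ConfigV2 {
  if (!isRecord(raw)) throw new Error("config must be a JSON object");

  // Assumption: documents written before the version field existed are v1.
  const version = raw.version ?? 1;

  if (version === 1) {
    const url = raw.databaseUrl;
    const maxConnections = raw.maxConnections;
    if (typeof url !== "string" || typeof maxConnections !== "number") {
      throw new Error("invalid v1 config");
    }
    return { version: 2, database: { url, maxConnections, readReplicas: [] } };
  }

  if (version === 2) {
    const db = raw.database;
    if (!isRecord(db)) throw new Error("invalid v2 config: database must be an object");
    const url = db.url;
    const maxConnections = db.maxConnections;
    if (typeof url !== "string" || typeof maxConnections !== "number") {
      throw new Error("invalid v2 config");
    }
    const rawReplicas = db.readReplicas;
    const readReplicas = Array.isArray(rawReplicas)
      ? rawReplicas.filter((r): r is string => typeof r === "string")
      : [];
    return { version: 2, database: { url, maxConnections, readReplicas } };
  }

  // A newer major version may have changed structure; refuse instead of guessing.
  throw new Error(`unsupported config version: ${String(version)}`);
}

const upgraded = readConfig({
  version: 1,
  databaseUrl: "postgres://localhost/mydb",
  maxConnections: 10,
  someFutureField: true, // unknown field: tolerated
});

console.log(JSON.stringify(upgraded));
```

Rejecting unknown versions while tolerating unknown fields keeps the reader liberal about additive changes and strict about structural ones.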
Key Terms
- Embedding
- A JSON document design pattern where related data is stored directly inside the parent document as a nested object or array, rather than storing a reference and retrieving the related data separately. Embedding eliminates extra read operations — a single document fetch returns all related data in one round trip. It is optimal when the embedded data is accessed together with the parent document 80% or more of the time and is updated infrequently (since updating embedded data requires rewriting the entire parent document). Embedding is appropriate for 1-to-few relationships where the nested collection is small and bounded. The main risk is update anomalies: if the same data is embedded in multiple parent documents, every instance must be updated when the data changes. MongoDB recommends embedding for 1-to-few relationships; PostgreSQL jsonb columns often use embedding for document fields that vary per row while using relational columns for consistent, indexed fields.
- Reference (by ID)
- A JSON document design pattern where a related entity is represented only by its identifier (an ID field) rather than embedding the full entity inline. The consumer must perform a separate lookup to retrieve the related entity. Referencing is appropriate when the related entity is shared across many parent documents (embedding would duplicate it and create update anomalies), when the related entity has its own independent lifecycle and is frequently updated on its own, or when the related collection is large or potentially unbounded. There are two reference directions: parent-side (the parent stores an array of child IDs — useful for enumeration) and child-side (the child stores the parent's ID — a foreign key pattern that scales to large collections without modifying the parent document). Many JSON API designs use a combination: parent-side for small, bounded collections and child-side for large, growing collections.
- Polymorphism
- A property of a JSON field or collection where the value can be one of several different object shapes, each with its own set of fields. For example, a notifications array might contain email notification objects (with a toAddress field), SMS notification objects (with a phoneNumber field), and push notification objects (with a deviceToken field): all are notifications, but each has a different structure. Polymorphism in JSON requires a strategy for the consumer to determine which shape each object is, so it can process it correctly. Without a clear polymorphism strategy, consumers resort to ad-hoc field inspection (checking which fields are present), which is fragile and does not compose well with type systems. The standard solution is the type discriminator pattern. JSON Schema models polymorphism with oneOf and anyOf, and OpenAPI adds the discriminator keyword.
- Type Discriminator
- A field added to every object in a polymorphic collection whose value identifies which variant (shape) the object is. The discriminator allows the consumer to read one field and immediately know how to parse the rest of the object, without inspecting which fields are present. Common discriminator field names include type, kind, objectType, and @type (in JSON-LD). The discriminator value is typically a string constant that maps to a specific schema: {"type": "circle"} means the object has a radius field; {"type": "rect"} means it has width and height. In TypeScript, type discriminators enable discriminated unions where the compiler narrows the type automatically in switch statements based on the discriminator field's value. The discriminator must be present and non-nullable on every object in the polymorphic collection: a missing discriminator breaks the consumer's branching logic and creates ambiguity.
- JSON Merge Patch
- A standard (RFC 7396) for describing partial updates to a JSON document. A Merge Patch is a JSON document where each key-value pair describes a change to the target: if a key is present with a non-null value, the corresponding field in the target is set to that value (object values are merged recursively); if a key is present with a null value, the corresponding field is deleted from the target; if a key is absent from the patch, the corresponding field in the target is left unchanged. The critical implication: null always means "delete this field" in a Merge Patch, so you cannot use Merge Patch to set a field's value to null. For APIs where null is a meaningful domain value and must be settable, use JSON Patch (RFC 6902) instead, which uses explicit operation objects ({"op": "replace", "path": "/field", "value": null}) and has no ambiguity about null semantics.
- Normalized vs Denormalized
- In database theory, normalization means organizing data to minimize redundancy — each piece of information appears exactly once. In JSON document design, normalization means storing related entities in separate documents and referencing them by ID, analogous to relational foreign keys. Denormalization means embedding related data directly in the parent document, accepting redundancy in exchange for read performance (fewer round trips). A denormalized JSON document might embed a user's name and avatar inside every comment object, even though the same user data is stored elsewhere. This is faster to read but creates an update anomaly: if the user changes their display name, every embedded copy must be updated consistently. Most JSON API designs are strategically denormalized — they embed data that changes rarely and is always read together, while referencing data that changes frequently or is shared widely. The term is also used in the context of JSON:API and Redux state normalization, where response data is transformed into a flat structure with entities indexed by ID to eliminate duplication in client-side state.
- Nullable Field
- A field in a JSON document whose value may be null, as opposed to a field that is simply optional (may be missing entirely). A nullable field is always present in the document but can have the value null to represent an explicitly empty state. This is semantically distinct from an absent field, which represents "not provided" or "unknown." In JSON Schema, a nullable field is defined with {"type": ["string", "null"]} (allowing both string and null values). In TypeScript, field: string | null is nullable (must be present, can be null) while field?: string is optional (can be missing). The choice between nullable and optional for a given field should be driven by domain semantics: if the field's "empty" state is a meaningful signal that must be distinguished from "not provided," make it nullable. If the absence of the field is sufficient to communicate the empty state, make it optional. Mixing the two conventions within a schema without documentation causes confusion for API consumers.
- Discriminated Union
- A TypeScript (and functional programming) type construct where a union type uses a common literal field, the discriminant, to distinguish between its members. TypeScript's type narrowing automatically reduces the type to a specific member inside a switch or if block that checks the discriminant. For example: type Shape = {type: "circle"; radius: number} | {type: "rect"; width: number; height: number}. Inside case "circle":, TypeScript knows shape.radius is a valid number; inside case "rect":, TypeScript knows shape.width and shape.height are valid. Accessing shape.radius outside the narrowed branch is a type error. Discriminated unions map directly to the JSON type discriminator pattern: the discriminant field in TypeScript corresponds to the type field in the JSON document. When using Zod for runtime validation, discriminated unions are modeled with z.discriminatedUnion("type", [...]), which validates the discriminant first for better performance and error messages.
FAQ
Should I embed related data in JSON or use references (IDs)?
Embed when the related data is accessed together with the parent document 80% or more of the time, updated infrequently, and the nested collection is small and bounded (roughly 10 to 100 items at most). Examples of good embedding candidates: a shipping address inside an order document, a product's dimension specs, a user's profile sub-object. Reference by ID when the related entity is shared across many parent documents (embedding would duplicate it and create update anomalies when the shared data changes), when the related entity has an independent lifecycle and is frequently updated, or when the nested collection is large or potentially unbounded. A practical rule of thumb from MongoDB's schema design guidance: embed 1-to-few relationships (up to about 10 items), reference 1-to-many (around 100 items or more), and always reference 1-to-squillions (unbounded collections like log entries or activity streams); between those thresholds, let the access pattern decide. The hybrid snapshot pattern is also common: embed a snapshot of the data at the time of the event (e.g., the price at time of purchase in an order line item) while maintaining a reference to the live entity (the product document) for current data. This avoids update anomalies while preserving the historical record.
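The hybrid snapshot pattern described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the Product and OrderLine shapes, the toOrderLine helper, the in-memory catalog, and the use of integer cents are all hypothetical choices made for the example.

```typescript
// Hypothetical live entity, stored once and updated independently.
interface Product {
  id: string;
  name: string;
  priceCents: number;
}

// Order line: a reference to the live product plus a snapshot taken at purchase time.
interface OrderLine {
  productId: string;      // reference: follow it for current data
  nameAtPurchase: string; // snapshot: never rewritten after the order is placed
  unitPriceCents: number; // snapshot
  quantity: number;
}

function toOrderLine(product: Product, quantity: number): OrderLine {
  return {
    productId: product.id,
    nameAtPurchase: product.name,
    unitPriceCents: product.priceCents,
    quantity,
  };
}

// In-memory stand-in for a product collection (assumption for the example).
const catalog = new Map<string, Product>([
  ["prod-7", { id: "prod-7", name: "Desk lamp", priceCents: 2499 }],
]);

const lamp = catalog.get("prod-7");
if (!lamp) throw new Error("unknown product");
const orderLine = toOrderLine(lamp, 2);

// The live price changes later; the order line keeps the historical price.
catalog.set("prod-7", { ...lamp, priceCents: 2999 });

const lineTotalCents = orderLine.unitPriceCents * orderLine.quantity;
const livePriceCents = catalog.get(orderLine.productId)?.priceCents;
console.log(lineTotalCents, livePriceCents);
```

Because the snapshot fields are historical facts rather than copies of current state, a later price change does not need to be propagated to existing orders.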
How do I model a one-to-many relationship in JSON?
There are three main approaches. First, embedded array: store the child objects inline in the parent, e.g., {"order": {"items": [...]}}. Best for small, bounded collections that are always read with the parent and rarely updated independently. Second, parent-side ID array: store only the child IDs in the parent, e.g., {"playlist": {"trackIds": ["trk-1", "trk-5"]}}. Useful when you need to enumerate children from the parent and children are independently managed entities. Third, child-side reference: put the parent ID on each child document, e.g., {"comment": {"postId": "post-55", "body": "..."}}. This mirrors the relational foreign key pattern, allows unlimited children without ever modifying the parent document, and avoids write contention on the parent when many children are created concurrently. Use child-side reference for large collections (orders for a user, comments on a post, events in a log) and embedded arrays for small, bounded collections. For ordered one-to-many relationships where the sequence matters, a parent-side ID array preserves order; for unordered large collections, child-side reference scales better.
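The three approaches can be written down as TypeScript types with small lookup helpers. This is a sketch only: the interface names, the resolveTracks and commentsForPost helpers, and the in-memory Map and array standing in for a datastore are assumptions made for illustration.

```typescript
// 1. Embedded array: children live inside the parent.
interface Order {
  id: string;
  items: { sku: string; quantity: number }[];
}

// 2. Parent-side ID array: the parent lists child IDs, and array order is meaningful.
interface Track {
  id: string;
  title: string;
}
interface Playlist {
  id: string;
  trackIds: string[];
}

// 3. Child-side reference: each child points at its parent (foreign key style).
interface PostComment {
  id: string;
  postId: string;
  body: string;
}

// Resolve parent-side IDs in playlist order, skipping IDs that no longer exist.
function resolveTracks(playlist: Playlist, tracks: Map<string, Track>): Track[] {
  return playlist.trackIds
    .map((id) => tracks.get(id))
    .filter((track): track is Track => track !== undefined);
}

// Child-side lookup: the parent document is never read or modified.
function commentsForPost(postId: string, comments: PostComment[]): PostComment[] {
  return comments.filter((comment) => comment.postId === postId);
}

const sampleOrder: Order = { id: "ord-1", items: [{ sku: "sku-1", quantity: 2 }] };
const trackIndex = new Map<string, Track>([
  ["trk-1", { id: "trk-1", title: "Intro" }],
  ["trk-5", { id: "trk-5", title: "Finale" }],
]);
const samplePlaylist: Playlist = { id: "pl-1", trackIds: ["trk-5", "trk-1"] };
const orderedTitles = resolveTracks(samplePlaylist, trackIndex).map((t) => t.title);

const allComments: PostComment[] = [
  { id: "c-1", postId: "post-55", body: "First" },
  { id: "c-2", postId: "post-56", body: "Elsewhere" },
];
const post55Comments = commentsForPost("post-55", allComments);
console.log(sampleOrder.items.length, orderedTitles, post55Comments.length);
```

In a real system the child-side lookup would be an indexed query on postId rather than an array filter; the point here is only that adding a comment never touches the post document.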
How do I model polymorphic types in JSON (objects that can be different shapes)?
Use the type discriminator pattern: add a type field to every object in the polymorphic collection whose value identifies which variant the object is. For example: {"type": "circle", "radius": 5} and {"type": "rect", "width": 10, "height": 5}. The consumer reads the discriminator first and then knows how to parse the rest of the object, instead of guessing from which fields happen to be present. In JSON Schema, describe the variants with oneOf (or anyOf), one subschema per variant; OpenAPI adds the discriminator keyword to name the discriminating field. In TypeScript, model the variants as a discriminated union so the compiler narrows the type in switch statements, and for runtime validation with Zod use z.discriminatedUnion("type", [...]). The discriminator must be present and non-nullable on every object: a missing discriminator breaks the consumer's branching logic and creates ambiguity.
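At runtime, incoming JSON is untyped, so the discriminator has to be checked before the rest of the object is trusted. The sketch below uses the notification variants from the Key Terms section; the names NotificationMessage, requireString, parseNotification, and summarize are hypothetical, and the hand-written checks stand in for whatever validation library a project actually uses.

```typescript
type NotificationMessage =
  | { type: "email"; toAddress: string }
  | { type: "sms"; phoneNumber: string }
  | { type: "push"; deviceToken: string };

function requireString(obj: Record<string, unknown>, key: string): string {
  const value = obj[key];
  if (typeof value !== "string") throw new Error(`${key} must be a string`);
  return value;
}

// Read the discriminator first, then validate only the fields of that variant.
function parseNotification(value: unknown): NotificationMessage {
  if (typeof value !== "object" || value === null) throw new Error("expected an object");
  const obj = value as Record<string, unknown>;
  switch (obj["type"]) {
    case "email":
      return { type: "email", toAddress: requireString(obj, "toAddress") };
    case "sms":
      return { type: "sms", phoneNumber: requireString(obj, "phoneNumber") };
    case "push":
      return { type: "push", deviceToken: requireString(obj, "deviceToken") };
    default:
      throw new Error(`missing or unknown discriminator: ${String(obj["type"])}`);
  }
}

// After parsing, the compiler narrows each branch by the discriminator.
function summarize(message: NotificationMessage): string {
  switch (message.type) {
    case "email":
      return `email to ${message.toAddress}`;
    case "sms":
      return `sms to ${message.phoneNumber}`;
    case "push":
      return `push to ${message.deviceToken}`;
  }
}

const incoming: unknown = JSON.parse(
  '[{"type":"sms","phoneNumber":"+15550100"},{"type":"push","deviceToken":"tok-1"}]'
);
const summaries = Array.isArray(incoming) ? incoming.map(parseNotification).map(summarize) : [];
console.log(summaries);
```

This sketch rejects unknown discriminator values. A consumer that must tolerate variants added by newer producers could instead skip them and log a warning; which behavior is right depends on the API contract.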
What is a discriminated union?
A discriminated union is a TypeScript (and functional programming) type construct where a union type uses a common literal field, the discriminant, to distinguish between its members. TypeScript's type narrowing automatically reduces the type to a specific member inside a switch or if block that checks the discriminant. For example: type Shape = {type: "circle"; radius: number} | {type: "rect"; width: number; height: number}. Inside case "circle":, TypeScript knows shape.radius is a valid number; inside case "rect":, TypeScript knows shape.width and shape.height are valid. Accessing shape.radius outside the narrowed branch is a type error. Discriminated unions map directly to the JSON type discriminator pattern: the discriminant field in TypeScript corresponds to the type field in the JSON document. When using Zod for runtime validation, discriminated unions are modeled with z.discriminatedUnion("type", [...]), which validates the discriminant first for better performance and error messages.
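The Shape example from the answer above, written out as a small sketch. The area function and the never-typed exhaustiveness check are additions for illustration: the check makes the compiler flag any variant that is added to the union later but not handled in the switch.

```typescript
type Shape =
  | { type: "circle"; radius: number }
  | { type: "rect"; width: number; height: number };

function area(shape: Shape): number {
  switch (shape.type) {
    case "circle":
      // Narrowed to the circle member: `radius` is available here.
      return Math.PI * shape.radius ** 2;
    case "rect":
      // Narrowed to the rect member: `width` and `height` are available here.
      return shape.width * shape.height;
    default: {
      // If a new variant is added to Shape, this assignment stops compiling.
      const unhandled: never = shape;
      throw new Error(`unhandled shape: ${JSON.stringify(unhandled)}`);
    }
  }
}

const rectArea = area({ type: "rect", width: 10, height: 5 });
const circleArea = area({ type: "circle", radius: 1 });
console.log(rectArea, circleArea);
```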
What is the difference between null and a missing field in JSON?
null means the field is explicitly present with an empty value — the sender is deliberately communicating "this field has no value." A missing field means the sender chose not to include it — it could mean unknown, not applicable, or use the default. In TypeScript: null maps to the null type (explicitly empty); missing maps to undefined (absent). JSON has no undefined — JSON.stringify silently drops fields set to undefined. This creates a trap: setting a field to undefined in JavaScript and serializing it makes the field disappear from the JSON output, as if it was never included. The distinction is most critical for PATCH API design using JSON Merge Patch (RFC 7396): in a Merge Patch document, a field present with a null value means delete this field from the target; a missing field means leave the target field unchanged. This means you cannot use JSON Merge Patch to set a field to null — null always triggers deletion. Use JSON Patch (RFC 6902) with explicit replace operations if your domain requires setting fields to null.
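Both behaviors can be demonstrated in a few lines. The mergePatch function below is a minimal sketch that follows the algorithm given in RFC 7396; the Json type, the isJsonObject helper, and the sample profile document are assumptions made for this example, and a production API would normally rely on a tested library.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isJsonObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// RFC 7396: a non-object patch replaces the target; an object patch is merged key by key.
function mergePatch(target: Json | undefined, patch: Json): Json {
  if (!isJsonObject(patch)) return patch;
  const result: { [key: string]: Json } = isJsonObject(target) ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key]; // null in the patch means "remove this field"
    } else {
      result[key] = mergePatch(result[key], value); // nested objects merge recursively
    }
  }
  return result;
}

const profile: Json = {
  displayName: "Ada",
  bio: "Engineer",
  settings: { theme: "dark", locale: "en" },
};

// `bio: null` deletes bio; keys missing from the patch are left unchanged.
const patched = mergePatch(profile, { bio: null, settings: { theme: "light" } });
console.log(JSON.stringify(patched));

// The serialization trap: undefined disappears, null survives.
const serialized = JSON.stringify({ nickname: undefined, middleName: null });
console.log(serialized); // {"middleName":null}
```

Note that the patched document no longer has a bio key at all, which is why Merge Patch cannot express "set bio to null".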
How do I design JSON for reorderable list items?
Do not use array index as the position signal: array index is fragile because inserting an item at any position requires renumbering every item after it, and concurrent insertions conflict. Instead, use an explicit position (or rank) field on each item with sparse numeric values: [{"id": "a", "position": 1000}, {"id": "b", "position": 2000}, {"id": "c", "position": 3000}]. When an item is inserted or moved, compute its new position by averaging the positions of its neighbors: placing an item between a (1000) and b (2000) assigns position: 1500, so only one document changes and the O(n) renumbering cost is avoided. Use gaps of 1000 between initial positions to leave room for many insertions. When gaps become too small (the minimum gap falls below 1), run a background renormalization job that redistributes positions evenly. Always sort by position on read: items.slice().sort((a, b) => a.position - b.position). For a UI reorder operation, optimistically update the client-side array, compute the new position, and send a single PATCH request with the updated position field. For collaborative editors with high insertion frequency, use fractional indexing with lexicographically sortable strings, which can always be bisected without renormalization. Never use array index as a pagination cursor in ordered list APIs: index-based cursors break when items are inserted before the cursor.
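The sparse-position scheme can be sketched in a few helpers. The names ListItem, positionBetween, sortByPosition, minGap, and renormalize are hypothetical, and the gap of 1000 and the "minimum gap below 1" trigger are taken from the answer above rather than from any standard.

```typescript
interface ListItem {
  id: string;
  position: number;
}

const GAP = 1000; // initial spacing between neighbors

// New position for an item placed between two neighbors (either may be absent).
function positionBetween(before?: ListItem, after?: ListItem): number {
  if (before && after) return (before.position + after.position) / 2;
  if (before) return before.position + GAP; // append at the end
  if (after) return after.position - GAP;   // insert at the front
  return GAP;                               // first item in an empty list
}

function sortByPosition(items: ListItem[]): ListItem[] {
  return items.slice().sort((a, b) => a.position - b.position);
}

// Smallest distance between adjacent items; Infinity for lists with fewer than two items.
function minGap(items: ListItem[]): number {
  let smallest = Infinity;
  let previous: ListItem | undefined;
  for (const item of sortByPosition(items)) {
    if (previous) smallest = Math.min(smallest, item.position - previous.position);
    previous = item;
  }
  return smallest;
}

// Redistribute positions evenly while keeping the current order.
function renormalize(items: ListItem[]): ListItem[] {
  return sortByPosition(items).map((item, index) => ({ ...item, position: (index + 1) * GAP }));
}

const itemA: ListItem = { id: "a", position: 1000 };
const itemB: ListItem = { id: "b", position: 2000 };
const itemC: ListItem = { id: "c", position: 3000 };

// Move c between a and b: only c's position changes.
const movedC: ListItem = { ...itemC, position: positionBetween(itemA, itemB) };
const reordered = sortByPosition([itemA, itemB, movedC]);
const orderAfterMove = reordered.map((item) => item.id).join(",");
const respaced = minGap(reordered) < 1 ? renormalize(reordered) : reordered;
console.log(orderAfterMove, movedC.position, respaced.length);
```

Repeated bisection of the same gap halves it each time, so a gap of 1000 drops below 1 after about ten insertions in the same spot; that is the case the renormalization step (or string-based fractional indexing) exists for.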
How does JSON data modeling differ between MongoDB documents and relational database JSON columns?
In MongoDB, the document is the primary storage unit: the entire object graph for an entity is designed to live in one document, and MongoDB's query engine (including the aggregation pipeline and native indexes on nested fields) is built around this assumption. MongoDB encourages embedding aggressively for data accessed together (the working set), with a hard limit of 16 MB per document. Referencing in MongoDB uses manual ID fields and requires separate queries or $lookup in aggregation pipelines (which is typically slower than an indexed relational join). The modeling philosophy: denormalize into documents for read performance; accept duplication for the entities that are accessed together. In relational databases (PostgreSQL jsonb, MySQL JSON), a JSON column stores semi-structured or variable-schema data as a complement to fully-typed relational columns. The design philosophy is different: use JSON for the fields that vary per row (e.g., a product's custom attributes differ by category: electronics have voltage specs, clothing has size charts), and use regular typed columns for the fields that are consistent, indexed, and joined. Relationships between entities are handled by relational joins, not by embedding. The key difference: MongoDB encourages using documents as the primary modeling unit with embedding; relational databases use JSON for the genuinely variable parts of a row while keeping consistent fields in proper typed columns.
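On the application side, the relational approach often shows up as a row type with strict fields for the typed columns and a loosely typed bag for the JSON column. The sketch below is hypothetical: ProductRow, its column names, and the voltageOf helper are invented for illustration and do not correspond to any particular schema or ORM.

```typescript
// Hypothetical row of a `products` table: typed columns plus one JSON column.
interface ProductRow {
  id: string;                          // typed, indexed column
  sku: string;                         // typed, indexed column
  priceCents: number;                  // typed column
  category: "electronics" | "clothing";
  attributes: Record<string, unknown>; // JSON column: shape varies by category
}

// Fields inside the JSON column are checked at the point of use.
function voltageOf(row: ProductRow): number | undefined {
  const voltage = row.attributes["voltage"];
  return row.category === "electronics" && typeof voltage === "number" ? voltage : undefined;
}

const kettle: ProductRow = {
  id: "p-1",
  sku: "KET-230",
  priceCents: 3499,
  category: "electronics",
  attributes: { voltage: 230, plug: "type-G" },
};
const tShirt: ProductRow = {
  id: "p-2",
  sku: "TEE-M",
  priceCents: 1599,
  category: "clothing",
  attributes: { sizes: ["S", "M", "L"] },
};
console.log(voltageOf(kettle), voltageOf(tShirt));
```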
How do I design JSON models to be extensible without breaking existing clients?
Follow the robustness principle: be conservative in what you send and liberal in what you accept. On the producer side: never remove or rename existing fields (a breaking change, since clients reading the old field name get undefined); never change a field's type (breaking, since clients parse the value as the wrong type); never make an optional field required (breaking, since old clients that do not send it fail validation). Adding new optional fields is generally safe, because well-written clients ignore fields they do not recognize. When you must make a breaking change, use a version field or API version URL to signal the schema change and support both versions during a migration window. On the consumer side: parse permissively by using additionalProperties: true in JSON Schema (or .passthrough() in Zod) so that unknown fields from future versions do not cause validation failures. Reserve an extensions object in your document structure as an explicit extension point for future optional data, separated from core fields. Use feature flags in JSON config ({"features": {"newCheckout": true}}) to enable new behavior progressively without shipping a new API version, which allows gradual rollouts and instant rollbacks. Document the deprecation timeline for fields you plan to remove: add the replacement field alongside the deprecated field, announce the deprecation, wait for clients to migrate, then remove it.
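Two of these consumer-side habits are easy to show in code: treating a missing feature flag as "off", and reading a replacement field with a fallback to the deprecated one during the migration window. The sketch below is illustrative only; ClientConfig, isEnabled, UserPayload, readDisplayName, and the name-to-displayName migration are hypothetical.

```typescript
interface ClientConfig {
  features?: Record<string, boolean>;   // feature flags
  extensions?: Record<string, unknown>; // reserved extension point
}

// A flag that is missing (older config) or not strictly true counts as disabled.
function isEnabled(config: ClientConfig, flag: string): boolean {
  return config.features?.[flag] === true;
}

// Hypothetical migration: `name` is deprecated in favor of `displayName`.
interface UserPayload {
  displayName?: string;
  name?: string; // deprecated, still sent during the migration window
}

// Prefer the replacement field; fall back to the deprecated one for older producers.
function readDisplayName(payload: UserPayload): string | undefined {
  return payload.displayName ?? payload.name;
}

const rolloutConfig: ClientConfig = JSON.parse('{"features": {"newCheckout": true}}');
const legacyConfig: ClientConfig = JSON.parse("{}");
console.log(isEnabled(rolloutConfig, "newCheckout"), isEnabled(legacyConfig, "newCheckout"));
console.log(readDisplayName({ name: "Ada" }), readDisplayName({ displayName: "Ada L.", name: "Ada" }));
```

Defaulting to "off" means a rollback is just a config change: removing the flag returns every client to the old behavior without a deployment.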
Further reading and primary sources
- MongoDB: Data Modeling Introduction — MongoDB official guide on embedding vs referencing, 1-to-1, 1-to-many, and many-to-many document design patterns
- RFC 7396: JSON Merge Patch — IETF standard defining JSON Merge Patch semantics — null means delete, missing key means leave unchanged
- RFC 6902: JSON Patch — IETF standard for JSON Patch with explicit operations — add, remove, replace, move, copy, test
- JSON Schema: oneOf and discriminator — JSON Schema documentation on oneOf, anyOf, and composing schemas for polymorphic type validation
- TypeScript Handbook: Narrowing and Discriminated Unions — TypeScript official docs on discriminated unions and type narrowing via switch statements