JSON Data Modeling: Nested vs References, Polymorphism & Schema Design

Q: Should I embed related data in JSON or use references (IDs)?

The decision between embedding and referencing comes down to three factors: access pattern, update frequency, and data size. Embed when the related data is accessed together with the parent document 80% or more of the time; a single read then returns both pieces of data, with no extra round trips. Embed when the nested data is updated infrequently, because every update to embedded data rewrites the entire parent document. Embed when the nested data set is small and bounded: a user's address, a product's dimension specs, or a list of tags are good candidates. Reference by ID when the related entity is shared across many parent documents (e.g., a category referenced by thousands of products; embedding would duplicate the category in every product and create an update anomaly). Reference when the nested collection can grow without bound: a user's order history could eventually contain thousands of orders, and referencing by ID keeps the user document small and stable. Reference when the child entity has an independent lifecycle and is frequently updated on its own. A practical heuristic, popularized by MongoDB's schema design guidance: embed "1-to-few" relationships (roughly 10 items or fewer), treat 10 to 100 items as a judgment call, reference "1-to-many" relationships (100+ items), and always reference "1-to-squillions" (unbounded counts like log entries). The key trap to avoid is premature denormalization: embedding everything for performance without measuring access patterns leads to large documents with expensive partial updates, and to update anomalies when the same data is duplicated across many parents.

Q: How do I model a one-to-many relationship in JSON?

There are three main approaches to modeling a one-to-many relationship in JSON, each with distinct trade-offs. The first approach is embedding as an array: store the child objects inline as an array within the parent document, for example {"order": {"id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}}. This eliminates joins, makes the document self-contained, and is ideal when the collection is small (fewer than 100 items), frequently read together with the parent, and rarely updated independently. The second approach is referencing by ID: store only the child IDs in the parent and fetch children separately, for example {"user": {"id": 42, "order_ids": [1001, 1002, 1003]}}. This keeps the parent document small, allows children to have their own lifecycle and update independently, and scales to unlimited children. The trade-off is an extra lookup for each access. The third approach is child-side reference: store the parent ID on each child document instead, for example {"order": {"id": 1001, "user_id": 42}}. This mirrors the relational foreign key pattern, is efficient for fetching all children of a parent, and allows adding children without modifying the parent document at all. Which to choose: if you always load children with the parent and the count is bounded, embed. If children are independently managed or the count is large, use child-side reference (fetch by parent_id) or parent-side ID array depending on whether you need to enumerate the parent's children without querying all children.

Q: How do I model polymorphic types in JSON (objects that can be different shapes)?

Polymorphism in JSON means a field or array element can hold objects of different shapes, for example a shapes array that contains circles, rectangles, and triangles. The standard technique is the type discriminator pattern: add a "type" field to every object that identifies which variant it is. For example: {"type": "circle", "radius": 5} and {"type": "rect", "width": 10, "height": 5}. The consuming code reads the "type" field first and branches accordingly. This maps directly to TypeScript discriminated unions: type Shape = {type: "circle"; radius: number} | {type: "rect"; width: number; height: number}. TypeScript narrows the type automatically in a switch(shape.type) block, giving full type safety. In JSON Schema, model polymorphic types with oneOf and an array of subschemas, each fixing the discriminator value with const; OpenAPI adds a discriminator keyword on top of oneOf that names the field determining the variant. JSON-LD uses the "@type" field for the same purpose in semantic contexts: {"@type": "Product", "name": "Widget"}. The discriminator field is conventionally named "type", "kind", "@type", or "objectType". Whatever name you choose, document it and keep it consistent across all variants. The key design rule: the discriminator field must be present and non-nullable on every object in the polymorphic collection. A missing discriminator makes the consumer's branching logic brittle and creates ambiguity about which shape the object is.

Q: What is the difference between null and a missing field in JSON?

In JSON, null and a missing field have different semantic meanings that matter enormously for API design. A null value means the field is explicitly present but has no value — it is a deliberate signal: "I know about this field and I am telling you its value is empty." A missing field means the sender chose not to include it — it could mean "I do not know the value," "this field does not apply," or "use the default." TypeScript reinforces this distinction: undefined in TypeScript represents a missing field, while null represents an explicit null value. JSON.stringify strips undefined fields from objects entirely — a field set to undefined will not appear in the serialized JSON, making it effectively missing. This asymmetry becomes critical for PATCH API design. JSON Merge Patch (RFC 7396) codifies specific semantics: a field present with a null value means delete this field from the target document; a field that is missing from the patch means leave the target field unchanged. This is the reason you cannot use JSON Merge Patch to set a field to null — null means delete, not assign-null. For APIs that need to distinguish between setting-to-null and leaving-unchanged, use JSON Patch (RFC 6902) instead, which uses explicit operations. Practical guideline: decide at design time whether null and missing are equivalent or distinct for each field in your schema. Document the decision. Inconsistency — where some endpoints treat null and missing as equivalent and others do not — creates subtle bugs in clients that are hard to diagnose.
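The three states a PATCH handler has to tell apart (key absent, key present with null, key present with a value) can be sketched as below. This is a minimal illustration; `FieldState` and `fieldState` are hypothetical names introduced here, and the request body is made-up sample data.

```typescript
// Hypothetical result type: one variant per state a field can be in.
type FieldState =
  | { state: "missing" }
  | { state: "null" }
  | { state: "value"; value: unknown };

// Distinguish "key absent" from "key present with null" on a parsed JSON object.
function fieldState(body: Record<string, unknown>, key: string): FieldState {
  // An own-property check is needed: body[key] is undefined for absent keys,
  // but null for keys that were sent explicitly as null.
  if (!Object.prototype.hasOwnProperty.call(body, key)) return { state: "missing" };
  const value = body[key];
  return value === null ? { state: "null" } : { state: "value", value };
}

// Sample request body (assumed for illustration).
const requestBody = JSON.parse('{"dueDate": null, "title": "Ship it"}') as Record<string, unknown>;

const dueDateState = fieldState(requestBody, "dueDate");   // key sent as null
const assigneeState = fieldState(requestBody, "assignee"); // key not sent at all
const titleState = fieldState(requestBody, "title");       // key sent with a value
```

Under Merge Patch semantics the handler would treat the "null" state as a delete and the "missing" state as a no-op; an API that treats the two as equivalent can collapse them in one place.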

Q: How do I design JSON for reorderable list items?

A JSON array preserves insertion order, but the array index is fragile as a position signal: inserting an item at index 2 shifts the index of every item after it, which creates conflicts in collaborative or concurrent editing. There are three main approaches for reorderable lists. The first is a position field: each item carries an explicit numeric position, for example [{"id": "a", "position": 1000}, {"id": "b", "position": 2000}, {"id": "c", "position": 3000}]. To insert between a and b, assign position 1500; no other items need updating. This approach allows sparse numbering (increment by 1000 to leave room) and stable identities (items are referenced by id, not index). The trade-off is that positions must be renormalized periodically, once the gaps between items become too small after many insertions. The second approach is a linked-list reference: each item points to its predecessor, for example {"id": "b", "prev_id": "a"}. Reordering rewrites a few prev_id pointers instead of renumbering positions. This is more complex to query (reading the list in order means walking the chain) but eliminates the need for renormalization. The third, lightweight approach is fractional indexing with string keys (e.g., "a0", "a1", "b0"), where a new key can always be generated between two existing ones. For most use cases, the position-field approach is the best balance of simplicity and performance. Return items as an array sorted by position for reads; update only the single item whose position changes; use floating-point positions or large integer gaps (1000, 2000, 3000) to minimize renormalization frequency.
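The position-field approach can be sketched as follows. This is a minimal illustration, assuming numeric positions with an initial gap of 1000; `ListItem`, `positionBetween`, and `moveItem` are hypothetical names, not part of any library.

```typescript
// Hypothetical item shape: a stable id plus a sparse numeric position.
type ListItem = { id: string; position: number };

const GAP = 1000; // assumed spacing between freshly numbered items

// Position for an item placed between two neighbours.
// Pass undefined for a missing neighbour (start or end of the list).
function positionBetween(before?: ListItem, after?: ListItem): number {
  if (before && after) return (before.position + after.position) / 2;
  if (before) return before.position + GAP;
  if (after) return after.position - GAP;
  return GAP; // first item in an empty list
}

// Move one item so it sits at targetIndex among the other items.
// Only the moved item receives a new position.
function moveItem(items: ListItem[], id: string, targetIndex: number): ListItem[] {
  if (!items.some((item) => item.id === id)) throw new Error(`Unknown item: ${id}`);
  const rest = items
    .filter((item) => item.id !== id)
    .sort((a, b) => a.position - b.position);
  const index = Math.max(0, Math.min(targetIndex, rest.length));
  const moved: ListItem = { id, position: positionBetween(rest[index - 1], rest[index]) };
  return [...rest, moved].sort((a, b) => a.position - b.position);
}

const todo: ListItem[] = [
  { id: "a", position: 1000 },
  { id: "b", position: 2000 },
  { id: "c", position: 3000 },
];

// Place c between a and b: c gets position 1500, a and b are untouched.
const reordered = moveItem(todo, "c", 1);
```

Averaging halves the gap on every insert into the same spot, so a real implementation still needs the occasional renormalization pass that reassigns evenly spaced positions.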

Q: How does JSON data modeling differ between MongoDB documents and relational database JSON columns?

MongoDB and relational databases (PostgreSQL jsonb, MySQL JSON) use JSON storage for fundamentally different reasons, and this shapes modeling decisions. In MongoDB, the document is the primary storage unit — the entire object graph for an entity lives in one document, and the query engine is designed to navigate and index nested fields natively. MongoDB recommends embedding sub-documents aggressively for data that is accessed together (the working set fits in one document), with a hard limit of 16MB per document. Indexes in MongoDB can be on nested fields ({"address.city": 1}) and on array elements. Referencing in MongoDB uses DBRef or manual ID fields and requires separate queries — there is no server-side join in the traditional sense (though $lookup in aggregation pipelines provides it). In relational databases, a JSON column stores semi-structured or variable-schema data alongside fully-typed relational columns. The design philosophy is different: use JSON for the fields that vary per row (e.g., a product's custom attributes differ by category), and use regular columns for the fields that are consistent, indexed, and joined. PostgreSQL's jsonb GIN index makes containment queries efficient, but the primary data model is still relational — joins happen between tables, not within a document. The key difference: MongoDB encourages denormalization into documents; relational databases encourage normalization into tables with JSON reserved for genuinely schemaless or semi-structured data. Choose based on your query patterns and the nature of your data's variability.

Q: How do I design JSON models to be extensible without breaking existing clients?

JSON extensibility follows the robustness principle: be conservative in what you send and liberal in what you accept. On the producer side, never remove fields from a response while clients still depend on them. Deprecate a field by documenting the deprecation and adding a replacement field alongside it; remove the old field only after all clients have migrated. Add new fields freely, since well-written clients ignore fields they do not recognize (the open-world assumption of JSON). For versioning, a version field in the document is the simplest signal: {"version": 2, "data": {...}}. Consumers can switch behavior based on version. For API versioning, URL-based versioning (/v1/, /v2/) or a header (Accept: application/vnd.myapi.v2+json) is cleaner than field-level versioning. On the consumer side, always deserialize JSON permissively: use additionalProperties: true in JSON Schema (or omit additionalProperties entirely) so unknown fields pass validation. In TypeScript, prefer object types with an index signature, or use z.object({...}).passthrough() in Zod, to allow extra fields. A concrete extensibility pattern is a dedicated metadata or extensions object in the document structure, {"data": {...}, "extensions": {"featureX": {...}}}, so that extension points are clearly separated from core fields and consumers know where to look for optional extra data. Use feature flags in JSON config to enable new behavior progressively: {"features": {"newCheckout": true, "betaDashboard": false}}. Clients read the flag and enable the feature, allowing gradual rollout without new API versions.
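A consumer that tolerates unknown fields and switches on a version field might look like the sketch below. It uses no validation library; the `User` shape, the `readUser` function, the field names, and the rule that a missing version means version 1 are all assumptions made for illustration.

```typescript
// The only data this hypothetical client needs from a user document.
type User = { displayName: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Tolerant reader: validate the fields we rely on, ignore everything else.
function readUser(json: string): User {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error("Expected a JSON object");

  // Assumption for this sketch: documents without a version field are version 1.
  const version = raw["version"] ?? 1;
  const name = raw["name"];
  const firstName = raw["firstName"];
  const lastName = raw["lastName"];

  if (version === 1 && typeof name === "string") {
    return { displayName: name };
  }
  if (version === 2 && typeof firstName === "string" && typeof lastName === "string") {
    return { displayName: `${firstName} ${lastName}` };
  }
  throw new Error(`Unsupported or malformed user document (version ${String(version)})`);
}

// Unknown fields ("extensions", "newField") are simply never read.
const v2User = readUser('{"version":2,"firstName":"Ada","lastName":"Lovelace","extensions":{"featureX":{}}}');
const v1User = readUser('{"name":"Ada","newField":true}');
```

Because the reader only touches the keys it knows, a producer can add fields without breaking it; an unknown version still fails loudly instead of being misread.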

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 20, 2026

JSON data modeling decisions — whether to embed or reference related data, how to handle objects of different shapes, whether to use flat or deeply nested structures, and what null vs a missing field means — have cascading effects on query performance, update complexity, and API compatibility. Unlike relational database design, which has decades of normalization theory, JSON modeling decisions are context-dependent and often misunderstood. This guide covers the core JSON-specific modeling decisions: embedding vs referencing (with a decision matrix), relationship modeling patterns, polymorphic types and discriminated unions, flat vs hierarchical trade-offs, null vs missing field semantics (including JSON Merge Patch implications), array ordering and reorderability, and extensibility strategies. Every pattern includes concrete examples and TypeScript types.

Nested vs Reference-Based JSON Design

The most consequential JSON modeling decision is whether to embed related data directly in the parent document or to store only a reference (an ID) and retrieve the related data separately. Embedding eliminates extra round trips: a single read returns everything. Referencing keeps documents small and avoids duplication but requires additional lookups. Neither is universally better; the right choice depends on three factors: how often the data is accessed together, how frequently the nested data is updated, and the expected size of the nested collection.

// ── Embedding: nested data lives inside the parent document ──────────
// Good when: accessed together 80%+ of the time, updated infrequently,
//            collection is small and bounded
const orderWithEmbeddedAddress = {
  id: "ord-1001",
  status: "paid",
  amount: 149.99,
  // Address is embedded — read together with order in every display, receipt, etc.
  // Address changes rarely; updating it rewrites the whole order document, which is fine
  shippingAddress: {
    street: "123 Main St",
    city: "Austin",
    state: "TX",
    zip: "78701",
  },
  // Line items: small, bounded, always displayed with the order — ideal for embedding
  items: [
    { sku: "WDG-01", name: "Widget", qty: 2, unitPrice: 49.99 },
    { sku: "GDG-05", name: "Gadget", qty: 1, unitPrice: 49.99 },
  ],
};

// ── Referencing: store only IDs, fetch related data separately ────────
// Good when: related entity is shared, updated independently, or large
const orderWithReferences = {
  id: "ord-1001",
  status: "paid",
  amount: 149.99,
  // Customer is referenced — same customer object is shared across thousands of orders.
  // Embedding would duplicate customer data and create an update anomaly:
  // changing the customer's email would require rewriting every order document.
  customerId: "cust-42",
  // Product catalog is referenced — products update independently (price, description, image)
  // Embedding would duplicate product data across all orders containing that product.
  items: [
    { productId: "WDG-01", qty: 2, unitPrice: 49.99 },
    { productId: "GDG-05", qty: 1, unitPrice: 49.99 },
  ],
};

// ── Decision matrix ───────────────────────────────────────────────────
// Factor                  Embed           Reference
// ─────────────────────────────────────────────────────────────────────
// Access together         80%+ of time    Rarely together
// Update frequency        Low (parent     High (updated
//                         rewrite is ok)  independently)
// Collection size         Small, bounded  Large or unbounded (100+)
// Data sharing            Not shared      Shared across many parents
// Consistency need        Eventual ok     Must be consistent globally
// Read latency            Single read     Extra lookup per access
// Write cost              Full doc        Update in place
//
// Rule of thumb (MongoDB):
//   1-to-few   (~1–10)   → embed
//   1-to-many  (~10–100) → embed with caution or reference
//   1-to-many  (100+)    → reference
//   1-to-squillions      → always reference (child-side parent_id)

// ── Hybrid: embed a snapshot, reference the source ────────────────────
// Common pattern: embed the data you need at the time of the event,
// reference the live entity for current state.
const orderSnapshot = {
  id: "ord-1001",
  // Reference for current customer data
  customerId: "cust-42",
  // Snapshot of product price at time of purchase — embedded intentionally.
  // The live product price may change; the order must record what was charged.
  items: [
    { productId: "WDG-01", priceAtPurchase: 49.99, qty: 2 },
  ],
};

A common mistake is embedding aggressively for performance without measuring access patterns first. Embedding a large nested array means every update to any field in the parent document rewrites the entire array — a single-field update becomes proportionally more expensive as the embedded collection grows. If you find yourself updating the nested data frequently but reading it rarely, that is a strong signal to switch to referencing. See our JSON MongoDB guide for document design patterns specific to MongoDB.
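The read-side cost of referencing can be made concrete with a small sketch. Here an in-memory `Map` stands in for a customers collection, and `findCustomer`, `hydrateOrder`, and the `lookups` counter are hypothetical names introduced for illustration only.

```typescript
type Customer = { id: string; email: string };
type OrderItem = { productId: string; priceAtPurchase: number; qty: number };
type Order = { id: string; customerId: string; items: OrderItem[] };

// In-memory stand-in for a customers collection (an assumption for this sketch).
const customers = new Map<string, Customer>([
  ["cust-42", { id: "cust-42", email: "alice@example.com" }],
]);

let lookups = 0; // counts the extra reads that referencing costs

function findCustomer(id: string): Customer | undefined {
  lookups += 1;
  return customers.get(id);
}

// Hydrate an order for display: the embedded price snapshot is already present,
// while the referenced customer needs one more read.
function hydrateOrder(order: Order): { order: Order; customer: Customer; total: number } {
  const customer = findCustomer(order.customerId);
  if (!customer) throw new Error(`Dangling reference: ${order.customerId}`);
  const total = order.items.reduce((sum, item) => sum + item.priceAtPurchase * item.qty, 0);
  return { order, customer, total };
}

const hydrated = hydrateOrder({
  id: "ord-1001",
  customerId: "cust-42",
  items: [{ productId: "WDG-01", priceAtPurchase: 49.99, qty: 2 }],
});
```

The order total is computed from embedded data with no lookup at all, while the customer costs one extra read and can dangle if the referenced document is deleted. That trade is what the decision matrix above is weighing.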

Modeling Relationships in JSON

JSON supports one-to-one, one-to-many, and many-to-many relationships, each with distinct modeling approaches. Unlike relational databases where foreign keys are a single universal mechanism, JSON gives you a choice of embedding or referencing for every relationship, and the right choice differs by relationship type and direction.

// ── One-to-one: embedding vs reference ───────────────────────────────
// Embed when the related object has no independent identity:
const user = {
  id: "usr-42",
  email: "alice@example.com",
  // Profile has no life outside the user — embed it
  profile: {
    displayName: "Alice",
    bio: "Engineer",
    avatarUrl: "https://cdn.example.com/avatars/42.jpg",
  },
};

// Reference when both entities are first-class and accessed independently:
const employee = {
  id: "emp-10",
  name: "Alice",
  departmentId: "dept-3",  // Department is a first-class entity with its own lifecycle
};

// ── One-to-many: three approaches ─────────────────────────────────────

// Approach 1: Embedded array (parent-contains-children)
// Best for small, bounded collections accessed with the parent
const blogPost = {
  id: "post-55",
  title: "JSON Modeling Guide",
  // Tags: small set, always read with the post, rarely updated independently
  tags: ["json", "modeling", "api"],
  // Comments: can grow unboundedly — better to reference (see approach 3)
};

// Approach 2: Parent-side ID array (parent-lists-child-IDs)
// Best when you need to enumerate children from the parent,
// and children are independently managed
const playlist = {
  id: "pl-99",
  name: "Morning Mix",
  // Track IDs: ordered list of references. Tracks are independent entities.
  // The array preserves order (important for playlists).
  trackIds: ["trk-1", "trk-5", "trk-2"],
};

// Approach 3: Child-side reference (children point to parent)
// Best for large collections and when children are added frequently
// without needing to update the parent document
const comment = {
  id: "cmt-201",
  postId: "post-55",   // Foreign key on the child
  authorId: "usr-42",
  body: "Great article!",
  createdAt: "2026-02-09T10:00:00Z",
};
// Query: GET /posts/post-55/comments → filter comments by postId
// Adding a comment does NOT modify the post document — avoids write contention

// ── Many-to-many: junction arrays ─────────────────────────────────────
// Pattern 1: Symmetric — both sides store the other's IDs
const student = {
  id: "stu-1",
  name: "Bob",
  enrolledCourseIds: ["crs-101", "crs-202"],
};
const course = {
  id: "crs-101",
  title: "Data Structures",
  enrolledStudentIds: ["stu-1", "stu-7", "stu-12"],
};
// Downside: must update both sides atomically — update anomaly risk

// Pattern 2: Junction document (preferred for large many-to-many)
const enrollment = {
  studentId: "stu-1",
  courseId: "crs-101",
  enrolledAt: "2026-01-15T09:00:00Z",
  grade: null,       // Relationship-specific metadata lives on the junction
};
// Query students in a course: filter enrollments by courseId
// Query courses for a student: filter enrollments by studentId
// No update anomaly — a single document represents each relationship

The junction document pattern for many-to-many is underused in JSON API design. It cleanly separates the relationship from the entities, allows relationship-specific metadata (enrollment date, grade, role), and avoids the dual-write update anomaly of the symmetric ID-array pattern. For more on JSON API design conventions, see our JSON API design guide.
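Both directions of a many-to-many query, and the single-document write, can be sketched over an array of junction documents. The helper names (`courseIdsForStudent`, `studentIdsForCourse`, `unenroll`) are hypothetical, the array stands in for a collection, and the rows beyond the first are made-up sample data.

```typescript
type Enrollment = {
  studentId: string;
  courseId: string;
  enrolledAt: string;   // ISO 8601 timestamp
  grade: string | null; // null until a grade is assigned
};

// Sample junction documents (illustrative data only).
const enrollments: Enrollment[] = [
  { studentId: "stu-1", courseId: "crs-101", enrolledAt: "2026-01-15T09:00:00Z", grade: null },
  { studentId: "stu-1", courseId: "crs-202", enrolledAt: "2026-01-16T09:00:00Z", grade: null },
  { studentId: "stu-7", courseId: "crs-101", enrolledAt: "2026-01-17T09:00:00Z", grade: null },
];

// Courses for a student: filter the junction by studentId.
function courseIdsForStudent(all: Enrollment[], studentId: string): string[] {
  return all.filter((e) => e.studentId === studentId).map((e) => e.courseId);
}

// Students in a course: filter the same junction by courseId.
function studentIdsForCourse(all: Enrollment[], courseId: string): string[] {
  return all.filter((e) => e.courseId === courseId).map((e) => e.studentId);
}

// Dropping a course removes exactly one junction document;
// neither the student nor the course document changes.
function unenroll(all: Enrollment[], studentId: string, courseId: string): Enrollment[] {
  return all.filter((e) => !(e.studentId === studentId && e.courseId === courseId));
}

const stu1Courses = courseIdsForStudent(enrollments, "stu-1");
const crs101Students = studentIdsForCourse(enrollments, "crs-101");
const afterDrop = unenroll(enrollments, "stu-1", "crs-202");
```

In a real store both filters would be backed by indexes on `studentId` and `courseId`; the point of the sketch is that each relationship lives in one place, so there is no second side to keep in sync.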

Polymorphic JSON Types

Polymorphism in JSON occurs when a field or array element can hold objects of different shapes. The canonical example is a shapes array that contains circles, rectangles, and triangles — all shapes, but each with different fields. Without a clear modeling strategy, polymorphic collections require ad-hoc field inspection and produce fragile consumer code. The standard solution is the type discriminator pattern.

// ── Type discriminator pattern ────────────────────────────────────────
// Each variant carries a "type" field that identifies its shape.
// The consumer reads "type" first and branches accordingly.

const shapes = [
  { type: "circle",  radius: 5 },
  { type: "rect",    width: 10, height: 5 },
  { type: "polygon", vertices: [[0,0],[5,0],[2.5,4.3]] },
];

// ── TypeScript discriminated union ────────────────────────────────────
// TypeScript narrows the type automatically in switch/if blocks.
type Circle  = { type: "circle";  radius: number };
type Rect    = { type: "rect";    width: number; height: number };
type Polygon = { type: "polygon"; vertices: [number, number][] };
type Shape   = Circle | Rect | Polygon;

function area(shape: Shape): number {
  switch (shape.type) {
    case "circle":  return Math.PI * shape.radius ** 2;
    case "rect":    return shape.width * shape.height;
    case "polygon": return 0; // Shoelace formula omitted for brevity
  }
}
// With strict null checks and the declared number return type, the switch
// must be exhaustive: adding a new variant without handling it is a compile error.

// ── JSON Schema: oneOf with a const-tagged discriminator field ─────────
const shapeSchema = {
  oneOf: [
    {
      type: "object",
      required: ["type", "radius"],
      properties: {
        type:   { type: "string", const: "circle" },
        radius: { type: "number", minimum: 0 },
      },
      additionalProperties: false,
    },
    {
      type: "object",
      required: ["type", "width", "height"],
      properties: {
        type:   { type: "string", const: "rect" },
        width:  { type: "number", minimum: 0 },
        height: { type: "number", minimum: 0 },
      },
      additionalProperties: false,
    },
  ],
};

// OpenAPI 3.1 discriminator: tells tooling which subschema applies
// based on the value of a specific field, so it need not try every oneOf branch
const openApiShapeSchema = {
  oneOf: [
    { '$ref': '#/components/schemas/Circle' },
    { '$ref': '#/components/schemas/Rect' },
  ],
  discriminator: {
    propertyName: "type",
    mapping: {
      circle: '#/components/schemas/Circle',
      rect:   '#/components/schemas/Rect',
    },
  },
};

// ── JSON-LD: @type for semantic polymorphism ──────────────────────────
// JSON-LD uses "@type" as the discriminator, linking to a schema vocabulary.
const product = { "@type": "Product", "name": "Widget", "price": 49.99 };
const event   = { "@type": "Event",   "name": "Conference", "startDate": "2026-03-01" };

// ── Envelope pattern: wrap variants in a typed container ──────────────
// Alternative to inline discriminator — useful for heterogeneous event streams
type EventEnvelope =
  | { event: "user.created";  payload: { userId: string; email: string } }
  | { event: "order.placed";  payload: { orderId: string; amount: number } }
  | { event: "item.shipped";  payload: { orderId: string; trackingId: string } };

const webhookEvent: EventEnvelope = {
  event: "order.placed",
  payload: { orderId: "ord-1001", amount: 149.99 },
};
// Consumer: switch (webhookEvent.event) { case "order.placed": ... }

// ── Anti-pattern: implicit type detection (avoid) ─────────────────────
// Do NOT detect type by checking which fields are present:
function badArea(shape: Record<string, unknown>): number {
  if ("radius" in shape) return Math.PI * (shape.radius as number) ** 2;
  if ("width" in shape)  return (shape.width as number) * (shape.height as number);
  throw new Error("Unknown shape");
}
// Problems: breaks if a new variant has overlapping fields;
// no compile-time safety; hard to document in JSON Schema

The most important rule of polymorphic JSON design concerns the discriminator field: it must be present on every object in the polymorphic collection, it must be non-nullable, and its values must be stable identifiers (not display strings that might be localized or reformatted). Choose a consistent name (type, kind, or objectType) and use it everywhere in your schema. For JSON Schema oneOf vs anyOf trade-offs, see our oneOf vs anyOf guide.
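TypeScript's narrowing only helps once the data has a static type, so untrusted JSON still needs a runtime check of the discriminator. The sketch below shows one way to do that by hand, without a validation library, for a two-variant subset of the shapes; `parseShape` and `isRecord` are hypothetical helpers, and the `never` assignment in `area` is an optional extra guard for exhaustiveness.

```typescript
type Circle = { type: "circle"; radius: number };
type Rect = { type: "rect"; width: number; height: number };
type Shape = Circle | Rect;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate untrusted JSON by reading the discriminator first,
// then checking only the fields that variant requires.
function parseShape(value: unknown): Shape {
  if (!isRecord(value)) throw new Error("Shape must be a JSON object");
  switch (value["type"]) {
    case "circle": {
      const radius = value["radius"];
      if (typeof radius !== "number") throw new Error("circle requires a numeric radius");
      return { type: "circle", radius };
    }
    case "rect": {
      const width = value["width"];
      const height = value["height"];
      if (typeof width !== "number" || typeof height !== "number") {
        throw new Error("rect requires numeric width and height");
      }
      return { type: "rect", width, height };
    }
    default:
      // A missing or unrecognized discriminator is rejected, never guessed.
      throw new Error(`Unknown or missing shape type: ${String(value["type"])}`);
  }
}

function area(shape: Shape): number {
  switch (shape.type) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "rect":
      return shape.width * shape.height;
    default: {
      // Fails to compile if a new variant is added to Shape but not handled above.
      const unhandled: never = shape;
      throw new Error(`Unhandled shape: ${JSON.stringify(unhandled)}`);
    }
  }
}

const parsedRect = parseShape(JSON.parse('{"type":"rect","width":10,"height":5}'));
const parsedArea = area(parsedRect);
```

An object such as `{"radius": 5}` with no `type` is rejected here, whereas the implicit-detection anti-pattern above would have accepted it as a circle.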

Flat vs Hierarchical JSON Structures

Deeply nested JSON feels natural to model — an order has a customer, who has an address, which has a city — but it creates practical problems for querying, updating, and diffing. Flat structures are easier to index, patch, and validate; deeply nested structures are more self-documenting but harder to maintain. Understanding the trade-offs helps you choose the right depth for each part of your schema.

// ── Deeply nested: expressive but hard to update ─────────────────────
const deeplyNested = {
  order: {
    customer: {
      address: {
        shipping: {
          street: "123 Main St",   // Path: order.customer.address.shipping.street
          city: "Austin",
        },
      },
    },
  },
};
// Problems:
// 1. Updating a single field (city) requires constructing the full path
// 2. JSON Pointer for RFC 6902 patch: "/order/customer/address/shipping/city"
//    — verbose and fragile if hierarchy changes
// 3. jsonb GIN index in PostgreSQL must index every nested key
// 4. TypeScript access: order.customer?.address?.shipping?.city
//    — three optional chain operators for one field
// 5. Hard to diff: tools must recurse 5 levels to compare two documents

// ── Flat: easy to update, index, and validate ─────────────────────────
const flat = {
  orderId: "ord-1001",
  customerId: "cust-42",
  shippingStreet: "123 Main St",
  shippingCity: "Austin",
  shippingState: "TX",
  shippingZip: "78701",
};
// Pros: every field is a direct key, easy to index (B-tree on shippingCity),
//       easy to patch (JSON Merge Patch: {"shippingCity": "Dallas"}),
//       TypeScript access: order.shippingCity — no optional chaining
// Cons: field names become long; logical grouping is lost; harder to reuse
//       address structure across shipping/billing

// ── Grouped: collect logically related fields into sub-objects ─────────
// Sweet spot for most APIs: one level of grouping, then flat within groups
const grouped = {
  id: "ord-1001",
  customerId: "cust-42",
  shipping: {                // One level of nesting for the address group
    street: "123 Main St",   // Within the group, fields are flat
    city: "Austin",
    state: "TX",
    zip: "78701",
  },
  billing: {
    street: "456 Oak Ave",
    city: "Austin",
    state: "TX",
    zip: "78702",
  },
  items: [
    { sku: "WDG-01", qty: 2, unitPrice: 49.99 },
  ],
};
// Access: order.shipping.city — single level, readable
// Patch:  {"shipping": {"city": "Dallas"}} — targets the sub-object
// TypeScript: address object can be reused: type Address = { street: string; city: string; ... }

// ── MongoDB document design: depth recommendations ─────────────────────
// MongoDB can query nested fields natively: db.orders.find({"shipping.city": "Austin"})
// Index on nested field:   db.orders.createIndex({"shipping.city": 1})
// But: queries on very deeply nested paths (4+ levels) become harder to index
// and maintain. MongoDB enforces a 16MB maximum document size, and its docs
// advise against unbounded arrays of embedded documents.

// ── Normalization for JSON API responses ──────────────────────────────
// When the same entity (e.g., User) appears in multiple response fields,
// normalize to avoid inconsistency:

// Denormalized (duplicated — update anomaly risk):
const postDenormalized = {
  id: "post-55",
  author: { id: "usr-42", name: "Alice", avatar: "alice.jpg" },
  comments: [
    { id: "cmt-1", author: { id: "usr-42", name: "Alice", avatar: "alice.jpg" }, body: "..." },
    { id: "cmt-2", author: { id: "usr-7",  name: "Bob",   avatar: "bob.jpg"  }, body: "..." },
  ],
};
// Problem: Alice's name appears 2+ times — must update all copies consistently

// Normalized (entities separated, referenced by ID; loosely modeled on Redux-style
// normalized state and JSON:API's "included" member, not the exact JSON:API format):
const postNormalized = {
  data: {
    id: "post-55",
    authorId: "usr-42",
    commentIds: ["cmt-1", "cmt-2"],
  },
  included: {
    users:    { "usr-42": { name: "Alice", avatar: "alice.jpg" },
                "usr-7":  { name: "Bob",   avatar: "bob.jpg"   } },
    comments: { "cmt-1": { authorId: "usr-42", body: "..." },
                "cmt-2": { authorId: "usr-7",  body: "..." } },
  },
};
// Each user appears once — no duplication, no update anomaly

A practical guideline: limit nesting depth to two or three levels for most schemas. Use a sub-object when the fields form a logical group that might be reused (an address type shared between shipping and billing), but keep the fields within each sub-object flat. Avoid nesting for the sake of mirroring a class hierarchy — JSON consumers rarely benefit from that structure. For flattening deeply nested JSON programmatically, see our JSON flatten guide.
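Flattening a nested document into dot-separated keys can be sketched in a few lines. This is a minimal illustration, not a full-featured utility: `flatten` is a hypothetical helper, arrays are kept as leaf values, and keys that themselves contain a dot would become ambiguous.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Flatten nested objects into a single level with dot-separated keys.
// Arrays and primitives are kept as leaf values in this sketch.
function flatten(value: JsonObject, prefix = ""): JsonObject {
  const out: JsonObject = {};
  for (const [key, child] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (child !== null && typeof child === "object" && !Array.isArray(child)) {
      Object.assign(out, flatten(child, path)); // recurse into nested objects
    } else {
      out[path] = child; // primitive, null, or array: store as-is
    }
  }
  return out;
}

const flatOrder = flatten({
  order: {
    customer: {
      address: {
        shipping: { street: "123 Main St", city: "Austin" },
      },
    },
  },
});
// flatOrder has two keys:
//   "order.customer.address.shipping.street" and "order.customer.address.shipping.city"
```

Note that an empty nested object contributes no keys at all in this version, so a round trip back to the nested form would lose it; whether that matters depends on how the flattened form is used.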

Optional and Nullable Fields Design

The decision of whether an absent value should be represented as null or as a missing field is not a stylistic preference — it has concrete effects on API behavior, JSON Merge Patch semantics, and TypeScript type inference. Getting this wrong leads to subtle bugs in PATCH endpoints, validation logic, and client-side default handling.

// ── null vs missing field: semantic difference ────────────────────────
// null  = "I know about this field and its value is explicitly empty"
// missing = "I chose not to include this field in this message"

// Example: user profile completion
const completeProfile = {
  userId: "usr-42",
  displayName: "Alice",
  bio: "Engineer at Acme",
  phoneNumber: null,         // Field is present — user explicitly cleared the phone number
};

const incompleteProfile = {
  userId: "usr-42",
  displayName: "Alice",
  // bio is missing — user hasn't filled it in yet (unknown, not explicitly empty)
  // phoneNumber is missing — same reason
};

// ── TypeScript: null vs undefined ─────────────────────────────────────
type UserProfile = {
  userId: string;
  displayName: string;
  bio?: string;              // Optional (? = can be missing from the object)
  phoneNumber: string | null; // Nullable (present but may be null)
};

// bio?: string        → value type is string | undefined
// phoneNumber: string | null → value is always present; null = explicit empty

// ── JSON.stringify strips undefined ───────────────────────────────────
const profile = { displayName: "Alice", bio: undefined, phoneNumber: null };
JSON.stringify(profile);
// '{"displayName":"Alice","phoneNumber":null}'
// bio is gone — JSON has no undefined; undefined fields are silently dropped
// phoneNumber: null is preserved — null is a valid JSON value

// This asymmetry is critical: never use undefined to represent "clear this field"
// in a JSON payload — the field simply disappears.

// ── JSON Merge Patch (RFC 7396) semantics ─────────────────────────────
// PATCH endpoint that uses JSON Merge Patch:
// null value  → delete the field from the target document
// missing key → leave the target field unchanged

// Target document:
const target = { name: "Alice", bio: "Engineer", phone: "555-1234" };

// Merge patch:
const patch = { bio: "Senior Engineer", phone: null };
// Result: { name: "Alice", bio: "Senior Engineer" }
//         name:  unchanged (missing from patch)
//         bio:   updated to "Senior Engineer"
//         phone: DELETED (null means delete in Merge Patch)

// CRITICAL: You cannot use JSON Merge Patch to SET a field to null.
// null always means delete. Use JSON Patch (RFC 6902) for set-to-null:
const jsonPatch = [
  { op: "replace", path: "/phone", value: null }  // Sets phone to null (not delete)
];

// ── Best practices for nullable field design ───────────────────────────
// 1. Decide per field: is null a valid domain value? If yes, make it nullable.
//    If no, omit the field when there is no value.

// 2. Document the distinction: in your API docs, explicitly state whether
//    null and missing are equivalent for each field.

// 3. For PATCH endpoints, choose one semantics and document it clearly:
//    - JSON Merge Patch (RFC 7396): null = delete, missing = unchanged
//    - Custom PATCH: define your own mapping, document explicitly
//    - JSON Patch (RFC 6902): explicit operations, no ambiguity

// 4. Avoid mixing: do not have some fields where null === missing
//    and others where they differ — it creates confusion for API consumers.

// ── JSON Schema: expressing nullability ───────────────────────────────
const fieldSchemas = {
  // Non-nullable string; list "name" in the parent schema's "required" array to make it mandatory
  name: { type: "string" },

  // Optional: leave "bio" out of the parent schema's "required" array
  // so the key can be absent
  bio: { type: "string" },

  // Nullable: can be present as either string or null
  phoneNumber: { type: ["string", "null"] },

  // Optional AND nullable (can be missing, or present as null or string)
  // Achieved by not including in required AND using type: ["string", "null"]
  middleName: { type: ["string", "null"] },
};

The JSON Merge Patch trap catches many developers: when designing a PATCH API, if you use Merge Patch semantics, you lose the ability to set any field to null — null always means delete. If your domain requires null as a valid field value (e.g., a task with no due date — dueDate: null — vs a task where due date has not been considered — dueDate missing), use JSON Patch (RFC 6902) with explicit operations. See our JSON Merge Patch guide for full PATCH API design patterns.
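
To see why null can never be stored through a Merge Patch, it may help to look at a minimal sketch of the merge algorithm. The version below follows the MergePatch pseudocode in RFC 7396 but returns a new object rather than mutating the target; the names Json, JsonObject, isJsonObject, mergePatch, and profileDoc are illustrative and not part of any library.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isJsonObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Sketch of RFC 7396 MergePatch that copies instead of mutating the target.
function mergePatch(target: Json | undefined, patch: Json): Json {
  // A non-object patch (scalar, array, or null) replaces the target wholesale.
  if (!isJsonObject(patch)) return patch;
  const result: JsonObject = isJsonObject(target) ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key]; // null in the patch means "remove this key"
    } else {
      result[key] = mergePatch(result[key], value); // recurse into nested objects
    }
  }
  return result;
}

const profileDoc = { name: "Alice", bio: "Engineer", phone: "555-1234" };
const patchedDoc = mergePatch(profileDoc, { bio: "Senior Engineer", phone: null });
console.log(JSON.stringify(patchedDoc));
// {"name":"Alice","bio":"Senior Engineer"}
```

Because the null branch deletes the key before any assignment can happen, there is no patch document that leaves phone present with a null value.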

JSON Arrays: Ordered vs Unordered Collections

JSON arrays are ordered by definition: RFC 8259 describes an array as an ordered sequence of values, and JSON.parse preserves element order. But that guarantee only covers the wire format. When you use a JSON array in your data model, you must decide whether the order is semantically significant (a playlist where the sequence matters) or incidental (a set of tags where order is arbitrary). This distinction shapes how you model, update, and diff the collection.

`// ── Unordered collections: sets ───────────────────────────────────────
// When order does not matter, treat the array as a set.
// Operations: add item, remove item, check membership — no position.
const article = {
  id: "art-55",
  title: "JSON Modeling",
  // Tags are a set — {"json", "api", "design"} in any order is equivalent
  tags: ["json", "api", "design"],
};
// Patching: add a tag → append to array (order does not matter)
// Comparing: sort both arrays before comparing to avoid false diffs
// JSON Schema: use "uniqueItems": true to enforce set semantics
const tagSchema = { type: "array", items: { type: "string" }, uniqueItems: true };

// ── Ordered collections: sequences ────────────────────────────────────
// When order matters, array index carries meaning.
const playlist = {
  id: "pl-99",
  name: "Morning Mix",
  // Tracks are ordered — index 0 plays first. Order is part of the data.
  trackIds: ["trk-3", "trk-1", "trk-5"],
};
// Problem: inserting "trk-7" at position 1 requires a full array replacement.
// Concurrent edits: two clients insert at position 1 simultaneously — conflict.

// ── Reorderable items: position field ────────────────────────────────
// Avoid using array index as the position signal.
// Instead, use an explicit position (rank) field with sparse numbering.
const taskList = {
  id: "tl-10",
  name: "Sprint Tasks",
  tasks: [
    { id: "task-A", title: "Design API",   position: 1000 },
    { id: "task-B", title: "Write tests",  position: 2000 },
    { id: "task-C", title: "Deploy",       position: 3000 },
  ],
};
// Insert between A and B: assign position = 1500
// No other tasks need updating — single document change.
// Renormalize periodically when gaps become too small:
//   find min gap → if < threshold, reassign positions 1000, 2000, 3000, ...

// Read sorted: always sort by position on read
const sorted = taskList.tasks.slice().sort((a, b) => a.position - b.position);

// ── Fractional indexing: string-based positions ───────────────────────
// Alternative for collaborative editors (e.g., Linear, Figma).
// Positions are lexicographically sortable strings:
// "a0" < "a1" < "b0" < "b1" — can always be bisected.
const listItems = [
  { id: "item-1", rank: "a0" },
  { id: "item-2", rank: "a2" },   // To insert item-3 between item-1 and item-2: rank = "a1"
  { id: "item-4", rank: "b0" },
];
// Advantage: no renormalization needed; infinite bisection.
// Disadvantage: strings grow longer with many insertions between the same pair.

// ── When to use which approach ────────────────────────────────────────
// Set (unordered):       tags, permissions, categories, feature flags
//                        → sort for canonical comparison; uniqueItems in schema
// Sequence (ordered):    steps in a workflow, messages in a thread, fields in a form
//                        → array index is authoritative; replace whole array to reorder
// Reorderable (position): kanban cards, playlist tracks, dashboard widgets
//                        → explicit position field; single-item updates for reordering
// Priority (ranked):     search results, recommendations
//                        → sort descending by score field; never store score as index

// ── Pagination considerations ──────────────────────────────────────────
// Do not use array index as a cursor for paginated ordered collections.
// If an item is inserted before the cursor position, the cursor shifts.
// Use a stable cursor: the last item's ID or position value.
const page = {
  items: [
    { id: "task-A", position: 1000 },
    { id: "task-B", position: 2000 },
  ],
  nextCursor: "2000",  // Cursor is the last position value, not the array index
};

The position-field pattern is the most practical solution for reorderable items in production systems. Use gaps of 1000 between initial positions to allow many insertions before renormalization is needed. When the minimum gap falls below a threshold (e.g., below 1), run a background job to redistribute positions evenly. Never use array index as a cursor in paginated APIs — index-based cursors break whenever items are inserted before the cursor position. For JSON array manipulation methods, see our JSON array methods guide.
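
The insert and renormalize steps described above can be sketched as follows. This is one possible shape under stated assumptions, not a reference implementation: the Ranked interface, the GAP constant, and the helper names positionBetween, minGap, and renormalize are all hypothetical, and the behavior at the list edges (subtracting or adding one full gap) is a choice made for the example.

```typescript
interface Ranked {
  id: string;
  position: number;
}

const GAP = 1000; // initial spacing between positions

// New position for an item dropped between two neighbours (null = list edge).
function positionBetween(before: number | null, after: number | null): number {
  if (before !== null && after !== null) return (before + after) / 2;
  if (before !== null) return before + GAP; // appended at the end
  if (after !== null) return after - GAP;   // inserted at the front
  return GAP;                               // first item in an empty list
}

// Smallest distance between adjacent positions; Infinity for fewer than two items.
function minGap(items: Ranked[]): number {
  const positions = items.map((item) => item.position).sort((a, b) => a - b);
  let gap = Infinity;
  let previous: number | undefined;
  for (const current of positions) {
    if (previous !== undefined) gap = Math.min(gap, current - previous);
    previous = current;
  }
  return gap;
}

// Reassigns positions GAP, 2*GAP, 3*GAP, ... while keeping the current order.
function renormalize(items: Ranked[]): Ranked[] {
  return [...items]
    .sort((a, b) => a.position - b.position)
    .map((item, index) => ({ ...item, position: (index + 1) * GAP }));
}

const sprintTasks: Ranked[] = [
  { id: "task-A", position: 1000 },
  { id: "task-B", position: 2000 },
  { id: "task-C", position: 3000 },
];
const inserted: Ranked = { id: "task-D", position: positionBetween(1000, 2000) };
const renumbered = renormalize([...sprintTasks, inserted]).map((t) => `${t.id}:${t.position}`);
console.log(inserted.position, renumbered);
// 1500 [ 'task-A:1000', 'task-D:2000', 'task-B:3000', 'task-C:4000' ]
```

A background job could call minGap and run renormalize only when the result drops below the chosen threshold, so ordinary reorders stay single-item writes.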

Versioning and Extensibility in JSON Models

A JSON model that cannot evolve without breaking existing clients is a liability. Good extensibility design means clients written against version 1 of your schema continue to work correctly when you add new fields in version 2, and deprecation gives clients time to migrate before removal. The key principles: additive changes are non-breaking; removing or renaming fields is breaking; changing the type of a field is breaking.

`// ── Version field: explicit schema version in the document ───────────
// Useful for stored documents, event streams, and file formats
// where the reader must know how to parse the document.
const configV1 = {
  version: 1,
  databaseUrl: "postgres://localhost/mydb",
  maxConnections: 10,
};

const configV2 = {
  version: 2,
  database: {            // Restructured — grouped fields; version signals the change
    url: "postgres://localhost/mydb",
    maxConnections: 10,
    readReplicas: [],    // New field in v2
  },
};

// Consumer handles both versions:
function parseConfig(config: Record<string, unknown>) {
  if (config.version === 2) {
    return config.database;  // New structure
  }
  // Default: v1 structure (backwards compatible read)
  return { url: config.databaseUrl, maxConnections: config.maxConnections };
}

// ── Non-breaking additive changes ────────────────────────────────────
// Adding a new optional field is always safe — existing clients ignore it
const userV1 = { id: "usr-42", email: "alice@example.com" };
const userV2 = { id: "usr-42", email: "alice@example.com", displayName: "Alice" };
// v1 clients: read id and email, ignore displayName — still works
// v2 clients: read all three fields

// ── Breaking changes (require version bump) ───────────────────────────
// 1. Removing a field — clients that read it receive undefined
// 2. Renaming a field — clients that read the old name receive undefined
// 3. Changing a field's type — clients cast to wrong type, runtime error
// 4. Making an optional field required — clients not sending it fail validation
// 5. Narrowing a field's allowed values — clients sending previously-valid values fail

// ── Deprecation pattern: parallel fields ─────────────────────────────
// Add the new field, keep the old field, document the deprecation.
const productV2 = {
  id: "prod-1",
  // Old field — deprecated, will be removed in v3
  // Clients: migrate to use priceInCents
  price: 49.99,
  // New field — higher precision, no floating-point rounding issues
  priceInCents: 4999,
};
// After all clients migrate to priceInCents, remove price in v3.

// ── additionalProperties: open vs closed schema ────────────────────────
// Open schema (additionalProperties: true or omitted) — clients tolerate new fields
const openSchema = {
  type: "object",
  required: ["id", "email"],
  properties: {
    id:    { type: "string" },
    email: { type: "string" },
  },
  // additionalProperties omitted = true by default
  // New fields pass validation without schema changes
};

// Closed schema (additionalProperties: false) — strict; any unknown field fails
// Use only for security-sensitive contexts where you must reject unexpected fields
const closedSchema = {
  type: "object",
  required: ["id", "email"],
  properties: {
    id:    { type: "string" },
    email: { type: "string" },
  },
  additionalProperties: false,   // Any unknown field is a validation error
};

// ── Extension point: reserved namespace ──────────────────────────────
// Reserve a field (e.g., "extensions" or "x-*") for future additions.
// Consumers ignore it; future versions populate it.
const event = {
  type: "order.placed",
  orderId: "ord-1001",
  amount: 149.99,
  extensions: {            // Extension point — consumers ignore unknown extensions
    loyaltyPoints: 150,    // Added in a later version without schema change
    fraudScore: 0.02,
  },
};

// ── Feature flags in JSON config ─────────────────────────────────────
// Use a features object to enable new behavior progressively
// without shipping a new API version.
const appConfig = {
  version: 3,
  features: {
    newCheckoutFlow: true,     // Enabled for all users
    betaDashboard:   false,    // Disabled (shadow deploy — code is there, feature is off)
    aiRecommendations: true,
  },
  // Feature-specific config lives under the feature key
  aiRecommendations: {
    modelVersion: "v2",
    maxResults: 5,
  },
};

// Consumer reads feature flag before activating new code path:
// if (config.features.newCheckoutFlow) { renderNewCheckout() }
// else { renderLegacyCheckout() }

// ── Zod: permissive parsing for extensibility ─────────────────────────
import { z } from "zod";

const UserSchema = z.object({
  id: z.string(),
  email: z.string().email(),
}).passthrough();   // Allow unknown fields to pass through instead of stripping/rejecting
// Without .passthrough(), Zod strips unknown fields by default
// With .passthrough(), extra fields are preserved — forward-compatible parsing

The robustness principle — be conservative in what you send, liberal in what you accept — is the foundation of extensible JSON design. Always parse with additionalProperties: true (or .passthrough() in Zod) to tolerate new fields without validation errors. Use the version field for major structural changes; additive fields alone do not require a version bump. The feature flag pattern allows shipping code for a new behavior without activating it, enabling controlled rollouts and instant rollbacks without API deployments. For JSON Schema versioning in detail, see our JSON Schema versioning guide.
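
As a dependency-free illustration of liberal reading, the sketch below normalizes both config versions from this section into one shape and silently ignores fields it does not know. The DatabaseConfig interface, the asRecord and readDatabaseConfig helpers, and the poolTimeoutMs field are assumptions made for the example.

```typescript
interface DatabaseConfig {
  url: string;
  maxConnections: number;
  readReplicas: string[];
}

function asRecord(value: unknown): Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("expected a JSON object");
  }
  return value as Record<string, unknown>;
}

// Reads a v1 or v2 config document and returns the v2-style database settings.
function readDatabaseConfig(raw: unknown): DatabaseConfig {
  const doc = asRecord(raw);
  // v2 groups the settings under "database"; anything else is read as the flat v1 layout.
  const db: Record<string, unknown> =
    doc["version"] === 2
      ? asRecord(doc["database"])
      : { url: doc["databaseUrl"], maxConnections: doc["maxConnections"] };
  const url = db["url"];
  const maxConnections = db["maxConnections"];
  const replicas = db["readReplicas"];
  if (typeof url !== "string" || typeof maxConnections !== "number") {
    throw new Error("missing or mistyped database settings");
  }
  return {
    url,
    maxConnections,
    // v1 documents have no replicas; unknown extra keys are simply never read.
    readReplicas: Array.isArray(replicas)
      ? replicas.filter((r): r is string => typeof r === "string")
      : [],
  };
}

const fromV1 = readDatabaseConfig({
  version: 1,
  databaseUrl: "postgres://localhost/mydb",
  maxConnections: 10,
});
const fromV2 = readDatabaseConfig({
  version: 2,
  database: { url: "postgres://localhost/mydb", maxConnections: 10, readReplicas: [], poolTimeoutMs: 500 },
});
console.log(JSON.stringify(fromV1) === JSON.stringify(fromV2)); // true
```

The reader validates only the fields it needs, so a producer can add optional fields such as the hypothetical poolTimeoutMs without breaking it.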

Key Terms

Embedding: A JSON document design pattern where related data is stored directly inside the parent document as a nested object or array, rather than storing a reference and retrieving the related data separately. Embedding eliminates extra read operations — a single document fetch returns all related data in one round trip. It is optimal when the embedded data is accessed together with the parent document 80% or more of the time and is updated infrequently (since updating embedded data requires rewriting the entire parent document). Embedding is appropriate for 1-to-few relationships where the nested collection is small and bounded. The main risk is update anomalies: if the same data is embedded in multiple parent documents, every instance must be updated when the data changes. MongoDB recommends embedding for 1-to-few relationships; PostgreSQL jsonb columns often use embedding for document fields that vary per row while using relational columns for consistent, indexed fields.
Reference (by ID): A JSON document design pattern where a related entity is represented only by its identifier (an ID field) rather than embedding the full entity inline. The consumer must perform a separate lookup to retrieve the related entity. Referencing is appropriate when the related entity is shared across many parent documents (embedding would duplicate it and create update anomalies), when the related entity has its own independent lifecycle and is frequently updated on its own, or when the related collection is large or potentially unbounded. There are two reference directions: parent-side (the parent stores an array of child IDs — useful for enumeration) and child-side (the child stores the parent's ID — a foreign key pattern that scales to large collections without modifying the parent document). Many JSON API designs use a combination: parent-side for small, bounded collections and child-side for large, growing collections.
Polymorphism: A property of a JSON field or collection where the value can be one of several different object shapes, each with its own set of fields. For example, a notifications array might contain email notification objects (with a toAddress field), SMS notification objects (with a phoneNumber field), and push notification objects (with a deviceToken field) — all are notifications, but each has a different structure. Polymorphism in JSON requires a strategy for the consumer to determine which shape each object is, so it can process it correctly. Without a clear polymorphism strategy, consumers resort to ad-hoc field inspection (checking which fields are present) which is fragile and does not compose well with type systems. The standard solution is the type discriminator pattern. JSON Schema models polymorphism with oneOf, anyOf, and the OpenAPI discriminator keyword.
Type Discriminator: A field added to every object in a polymorphic collection whose value identifies which variant (shape) the object is. The discriminator allows the consumer to read one field and immediately know how to parse the rest of the object, without inspecting which fields are present. Common discriminator field names include type, kind, objectType, and @type (in JSON-LD). The discriminator value is typically a string constant that maps to a specific schema: {"type": "circle"} means the object has a radius field; {"type": "rect"} means it has width and height. In TypeScript, type discriminators enable discriminated unions where the compiler narrows the type automatically in switch statements based on the discriminator field's value. The discriminator must be present and non-nullable on every object in the polymorphic collection — a missing discriminator breaks the consumer's branching logic and creates ambiguity.
JSON Merge Patch: A standard (RFC 7396) for describing partial updates to a JSON document. A Merge Patch is a JSON document where each key-value pair describes a change to the target: if a key is present with a non-null value, the corresponding field in the target is set to that value; if a key is present with a null value, the corresponding field is deleted from the target; if a key is absent from the patch, the corresponding field in the target is left unchanged. The critical implication: null always means "delete this field" in a Merge Patch — you cannot use Merge Patch to set a field's value to null. For APIs where null is a meaningful domain value and must be settable, use JSON Patch (RFC 6902) instead, which uses explicit operation objects ({"op": "replace", "path": "/field", "value": null}) and has no ambiguity about null semantics.
Normalized vs Denormalized: In database theory, normalization means organizing data to minimize redundancy — each piece of information appears exactly once. In JSON document design, normalization means storing related entities in separate documents and referencing them by ID, analogous to relational foreign keys. Denormalization means embedding related data directly in the parent document, accepting redundancy in exchange for read performance (fewer round trips). A denormalized JSON document might embed a user's name and avatar inside every comment object, even though the same user data is stored elsewhere. This is faster to read but creates an update anomaly: if the user changes their display name, every embedded copy must be updated consistently. Most JSON API designs are strategically denormalized — they embed data that changes rarely and is always read together, while referencing data that changes frequently or is shared widely. The term is also used in the context of JSON:API and Redux state normalization, where response data is transformed into a flat structure with entities indexed by ID to eliminate duplication in client-side state.
Nullable Field: A field in a JSON document whose value may be null, as opposed to a field that is simply optional (may be missing entirely). A nullable field is always present in the document but can have the value null to represent an explicitly empty state. This is semantically distinct from an absent field, which represents "not provided" or "unknown." In JSON Schema, a nullable field is defined with {"type": ["string", "null"]} (allowing both string and null values). In TypeScript, field: string | null is nullable (must be present, can be null) while field?: string is optional (can be missing). The choice between nullable and optional for a given field should be driven by domain semantics: if the field's "empty" state is a meaningful signal that must be distinguished from "not provided," make it nullable. If the absence of the field is sufficient to communicate the empty state, make it optional. Mixing the two conventions within a schema without documentation causes confusion for API consumers.
Discriminated Union: A TypeScript (and functional programming) type construct where a union type uses a common literal field — the discriminant — to distinguish between its members. TypeScript's type narrowing automatically reduces the type to a specific member inside a switch or if block that checks the discriminant. For example: type Shape = {type: "circle"; radius: number} | {type: "rect"; width: number; height: number}. Inside case "circle":, TypeScript knows shape.radius is a valid number; inside case "rect":, TypeScript knows shape.width and shape.height are valid. Accessing shape.radius outside the narrowed branch is a type error. Discriminated unions map directly to the JSON type discriminator pattern — the discriminant field in TypeScript corresponds to the type field in the JSON document. When using Zod for runtime validation, discriminated unions are modeled with z.discriminatedUnion("type", [...]), which validates the discriminant first for better performance and error messages.
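
To make the Normalized vs Denormalized entry above more concrete, here is a minimal sketch of client-side normalization: comments that each embed their author are flattened into two lookup tables keyed by ID. The interfaces and the normalizeComments function are hypothetical names chosen for the example and do not come from Redux or JSON:API.

```typescript
interface EmbeddedComment {
  id: string;
  body: string;
  author: { id: string; displayName: string }; // denormalized copy of the user
}

interface NormalizedState {
  users: Record<string, { id: string; displayName: string }>;
  comments: Record<string, { id: string; body: string; authorId: string }>;
}

// Splits embedded authors out of the comments so each user is stored once.
function normalizeComments(input: EmbeddedComment[]): NormalizedState {
  const state: NormalizedState = { users: {}, comments: {} };
  for (const comment of input) {
    state.users[comment.author.id] = comment.author; // last copy seen wins
    state.comments[comment.id] = {
      id: comment.id,
      body: comment.body,
      authorId: comment.author.id, // reference by ID instead of embedding
    };
  }
  return state;
}

const normalized = normalizeComments([
  { id: "c1", body: "Nice post", author: { id: "usr-42", displayName: "Alice" } },
  { id: "c2", body: "Agreed", author: { id: "usr-42", displayName: "Alice" } },
]);
console.log(Object.keys(normalized.users).length, Object.keys(normalized.comments).length);
// 1 2
```

After normalization a display-name change touches a single entry in users, which is the update anomaly the denormalized form cannot avoid.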

FAQ

Should I embed related data in JSON or use references (IDs)?

Embed when the related data is accessed together with the parent document 80% or more of the time, updated infrequently, and the nested collection is small and bounded (fewer than ~10–100 items). Examples of good embedding candidates: a shipping address inside an order document, a product's dimension specs, a user's profile sub-object. Reference by ID when the related entity is shared across many parent documents (embedding would duplicate it and create update anomalies when the shared data changes), when the related entity has an independent lifecycle and is frequently updated, or when the nested collection is large or potentially unbounded. A practical rule from MongoDB's documentation: embed 1-to-few relationships (up to ~10 items), reference 1-to-many (100+ items), and always reference 1-to-squillions (unbounded collections like log entries or activity streams). The hybrid snapshot pattern is also common: embed a snapshot of the data at the time of the event (e.g., the price at time of purchase in an order line item) while maintaining a reference to the live entity (the product document) for current data. This avoids update anomalies while preserving the historical record.
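
The hybrid snapshot pattern mentioned above might look like the following sketch. The Product and OrderLine interfaces, their field names, and the toOrderLine helper are assumptions made for illustration.

```typescript
interface Product {
  id: string;
  name: string;
  priceInCents: number;
}

interface OrderLine {
  productId: string;        // reference to the live product document
  nameAtPurchase: string;   // snapshot, frozen when the order is placed
  unitPriceInCents: number; // snapshot, frozen when the order is placed
  qty: number;
}

// Copies the fields that must stay historically accurate and keeps an ID for the rest.
function toOrderLine(product: Product, qty: number): OrderLine {
  return {
    productId: product.id,
    nameAtPurchase: product.name,
    unitPriceInCents: product.priceInCents,
    qty,
  };
}

const widget: Product = { id: "prod-1", name: "Widget", priceInCents: 4999 };
const orderLine = toOrderLine(widget, 2);

widget.priceInCents = 5499; // a later price change on the live product
console.log(orderLine.unitPriceInCents * orderLine.qty); // 9998
```

The order total still reflects the price at purchase time, while productId lets a client fetch the current product when it needs live data.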

How do I model a one-to-many relationship in JSON?

There are three main approaches. First, embedded array: store the child objects inline in the parent, e.g., {"order": {"items": [...]}}. Best for small, bounded collections that are always read with the parent and rarely updated independently. Second, parent-side ID array: store only the child IDs in the parent, e.g., {"playlist": {"trackIds": ["trk-1", "trk-5"]}}. Useful when you need to enumerate children from the parent and children are independently managed entities. Third, child-side reference: put the parent ID on each child document, e.g., {"comment": {"postId": "post-55", "body": "..."}}. This mirrors the relational foreign key pattern, allows unlimited children without ever modifying the parent document, and avoids write contention on the parent when many children are created concurrently. Use child-side reference for large collections (orders for a user, comments on a post, events in a log) and embedded arrays for small, bounded collections. For ordered one-to-many relationships where the sequence matters, a parent-side ID array preserves order; for unordered large collections, child-side reference scales better.

How do I model polymorphic types in JSON (objects that can be different shapes)?

Use the type discriminator pattern: add a type field to every object in the polymorphic collection whose value identifies which variant the object is. For example: {"type": "circle", "radius": 5} and {"type": "rect", "width": 10, "height": 5}. The consumer reads the type field first and branches on it, instead of inspecting which other fields happen to be present, which is fragile. For JSON Schema, model the variants with oneOf (or anyOf), giving each branch a constant type value; OpenAPI adds a discriminator keyword that names the discriminating property. In TypeScript, the same shapes become a discriminated union, and the compiler narrows the type inside a switch on the type field. For runtime validation with Zod, use z.discriminatedUnion("type", [...]). The discriminator must be present and non-nullable on every object in the collection: a missing discriminator breaks the consumer's branching logic and creates ambiguity.

What is a discriminated union?

A TypeScript (and functional programming) type construct where a union type uses a common literal field — the discriminant — to distinguish between its members. TypeScript's type narrowing automatically reduces the type to a specific member inside a switch or if block that checks the discriminant. For example: type Shape = {type: "circle"; radius: number} | {type: "rect"; width: number; height: number}. Inside case "circle":, TypeScript knows shape.radius is a valid number; inside case "rect":, TypeScript knows shape.width and shape.height are valid. Accessing shape.radius outside the narrowed branch is a type error. Discriminated unions map directly to the JSON type discriminator pattern — the discriminant field in TypeScript corresponds to the type field in the JSON document. When using Zod for runtime validation, discriminated unions are modeled with z.discriminatedUnion("type", [...]), which validates the discriminant first for better performance and error messages.
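
A small sketch of the Shape example may make the narrowing behavior easier to see. The area and parseShape functions are illustrative names; parseShape is a hand-written runtime check standing in for a validation library.

```typescript
type Shape =
  | { type: "circle"; radius: number }
  | { type: "rect"; width: number; height: number };

function area(shape: Shape): number {
  switch (shape.type) {
    case "circle":
      return Math.PI * shape.radius ** 2; // narrowed: only radius exists here
    case "rect":
      return shape.width * shape.height;  // narrowed: width and height exist here
    default: {
      // Compile-time exhaustiveness check: adding a new variant makes this line fail.
      const unreachable: never = shape;
      throw new Error(`unknown shape: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Validates untrusted JSON by checking the discriminator first, then the variant's fields.
function parseShape(json: string): Shape {
  const value: unknown = JSON.parse(json);
  if (typeof value === "object" && value !== null) {
    const { type, radius, width, height } = value as Record<string, unknown>;
    if (type === "circle" && typeof radius === "number") {
      return { type: "circle", radius };
    }
    if (type === "rect" && typeof width === "number" && typeof height === "number") {
      return { type: "rect", width, height };
    }
  }
  throw new Error("not a valid Shape");
}

console.log(area(parseShape('{"type": "rect", "width": 10, "height": 5}'))); // 50
```

The static type only protects code that already holds a Shape; data arriving from the wire still needs a runtime check like parseShape before the union can be trusted.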

What is the difference between null and a missing field in JSON?

null means the field is explicitly present with an empty value — the sender is deliberately communicating "this field has no value." A missing field means the sender chose not to include it — it could mean unknown, not applicable, or use the default. In TypeScript: null maps to the null type (explicitly empty); missing maps to undefined (absent). JSON has no undefined — JSON.stringify silently drops fields set to undefined. This creates a trap: setting a field to undefined in JavaScript and serializing it makes the field disappear from the JSON output, as if it was never included. The distinction is most critical for PATCH API design using JSON Merge Patch (RFC 7396): in a Merge Patch document, a field present with a null value means delete this field from the target; a missing field means leave the target field unchanged. This means you cannot use JSON Merge Patch to set a field to null — null always triggers deletion. Use JSON Patch (RFC 6902) with explicit replace operations if your domain requires setting fields to null.

How do I design JSON for reorderable list items?

Do not use array index as the position signal. Array index is fragile because inserting an item at any position requires renumbering every item after it, and concurrent insertions conflict. Instead, use an explicit position (or rank) field on each item with sparse numeric values: [{"id": "a", "position": 1000}, {"id": "b", "position": 2000}, {"id": "c", "position": 3000}]. When an item is inserted or moved, compute its new position by averaging the positions of its neighbors: placing an item between a and b assigns position 1500, so only one item changes and the O(n) renumbering cost of array indices is avoided. Use gaps of 1000 between initial positions to leave room for many insertions. When gaps become too small (minimum gap falls below 1), run a background renormalization job that redistributes positions evenly. Always sort by position on read: items.slice().sort((a, b) => a.position - b.position). For a UI reorder operation, optimistically update the client-side array, compute the new position, and send a single PATCH request with the updated position field. For collaborative editors with high insertion frequency, use fractional indexing with lexicographically sortable strings, which can always be bisected without renormalization. Never use array index as a pagination cursor in ordered list APIs, because index-based cursors break when items are inserted before the cursor.
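
The cursor advice can be illustrated with a small in-memory pager that uses the last position value as the cursor. This is a sketch under two assumptions: positions are unique (ties would need a secondary key such as the ID), and the names PagedTask, TaskPage, and getPage are hypothetical.

```typescript
interface PagedTask {
  id: string;
  position: number;
}

interface TaskPage {
  items: PagedTask[];
  nextCursor: string | null; // last position on the page, or null when done
}

// Returns up to `limit` items whose position is strictly greater than the cursor.
function getPage(all: PagedTask[], cursor: string | null, limit: number): TaskPage {
  const after = cursor === null ? -Infinity : Number(cursor);
  const sorted = all.slice().sort((a, b) => a.position - b.position);
  const items = sorted.filter((task) => task.position > after).slice(0, limit);
  const last = items[items.length - 1];
  if (last === undefined) return { items, nextCursor: null };
  const hasMore = sorted.some((task) => task.position > last.position);
  return { items, nextCursor: hasMore ? String(last.position) : null };
}

const backlog: PagedTask[] = [
  { id: "task-A", position: 1000 },
  { id: "task-B", position: 2000 },
  { id: "task-C", position: 3000 },
];

const firstPage = getPage(backlog, null, 2);
backlog.push({ id: "task-D", position: 1500 }); // inserted before the cursor
const secondPage = getPage(backlog, firstPage.nextCursor, 2);

console.log(firstPage.nextCursor, secondPage.items.map((task) => task.id));
// 2000 [ 'task-C' ]
```

With an index-based cursor of 2, the second page would have started at task-B again after the insertion; the position cursor continues from where the first page ended.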

How does JSON data modeling differ between MongoDB documents and relational database JSON columns?

In MongoDB, the document is the primary storage unit — the entire object graph for an entity is designed to live in one document, and MongoDB's query engine (including the aggregation pipeline and native indexes on nested fields) is built around this assumption. MongoDB encourages embedding aggressively for data accessed together (the working set), with a hard limit of 16MB per document. Referencing in MongoDB uses manual ID fields and requires separate queries or $lookup in aggregation pipelines (which is less performant than a relational join). The modeling philosophy: denormalize into documents for read performance; accept duplication for the entities that are accessed together. In relational databases (PostgreSQL jsonb, MySQL JSON), a JSON column stores semi-structured or variable-schema data as a complement to fully-typed relational columns. The design philosophy is different: use JSON for the fields that vary per row (e.g., a product's custom attributes differ by category — electronics have voltage specs, clothing has size charts), and use regular typed columns for the fields that are consistent, indexed, and joined. Relationships between entities are handled by relational joins, not by embedding. The key difference: MongoDB encourages using documents as the primary modeling unit with embedding; relational databases use JSON for the genuinely variable parts of a row while keeping consistent fields in proper typed columns.

How do I design JSON models to be extensible without breaking existing clients?

Follow the robustness principle: be conservative in what you send and liberal in what you accept. On the producer side: never remove or rename existing fields (a breaking change, because clients reading the old field name get undefined); never change a field's type (breaking, because clients cast to the wrong type); never make an optional field required (breaking, because old clients not sending it fail validation). Adding new optional fields is always safe, since well-written clients ignore fields they do not recognize. When you must make a breaking change, use a version field or API version URL to signal the schema change and support both versions during a migration window. On the consumer side: parse permissively. Use additionalProperties: true in JSON Schema (or .passthrough() in Zod) so that unknown fields from future versions do not cause validation failures. Reserve an extensions object in your document structure as an explicit extension point for future optional data, separated from core fields. Use feature flags in JSON config ({"features": {"newCheckout": true}}) to enable new behavior progressively without shipping a new API version; this allows gradual rollouts and instant rollbacks. Document the deprecation timeline for fields you plan to remove: add the replacement field alongside the deprecated field, announce the deprecation, wait for clients to migrate, then remove it.


Further reading and primary sources