JSON Schema Migration: Backward Compatibility, Versioning & Runtime Upcasting
JSON schema migration is not about database migrations — it is about evolving the shape of your JSON documents over time without breaking existing producers or consumers. The core question is: which changes are safe, and which require a versioning strategy? Adding an optional field with a default is safe — every existing consumer ignores the new field; every existing producer omits it and the default applies. Removing a field, renaming it, or changing its type is breaking — even if the field was marked optional in the schema, downstream code may depend on its presence. This guide covers the exact rules for breaking vs non-breaking changes, four schema versioning strategies (payload version field, Accept-Version header, URL versioning, schema registry), how to write runtime upcaster functions in TypeScript that transform v1 documents into v2 before your application logic sees them, batch-safe PostgreSQL jsonb migrations using jsonb_set with cursor pagination, schema registry compatibility modes (BACKWARD / FORWARD / FULL), and property-based testing with fast-check to verify migrations against 10,000 generated payloads. Every section includes runnable TypeScript and SQL code.
Breaking vs Non-Breaking JSON Schema Changes
The most important skill in JSON schema evolution is classifying every proposed change before shipping it. Non-breaking changes allow old consumers to keep working with new payloads, and new consumers to keep working with old payloads — simultaneously. Breaking changes require coordinated deployment, versioning, or upcasting to avoid failures.
// ── NON-BREAKING (additive) changes ─────────────────────────────────
// 1. Add an optional field with a default value
// Old schema:
{ "type": "object", "required": ["id", "status"],
"properties": { "id": { "type": "integer" }, "status": { "type": "string" } } }
// New schema — adding optional "priority" field:
{ "type": "object", "required": ["id", "status"],
"properties": {
"id": { "type": "integer" },
"status": { "type": "string" },
"priority": { "type": "string", "default": "normal" } // safe: optional
}
}
// Old producers omit "priority" — consumers apply the default.
// Old consumers receive "priority" — and ignore it (no additionalProperties: false).
// 2. Add a new enum value (when consumers treat enum as open set)
// Old: "status": { "enum": ["pending", "paid"] }
// New: "status": { "enum": ["pending", "paid", "refunded"] } // safe if consumer uses default case
// 3. Relax a constraint
// Old: "amount": { "type": "number", "minimum": 1 }
// New: "amount": { "type": "number", "minimum": 0 } // widened — existing data still valid
// 4. Widen a type
// Old: "count": { "type": "integer" }
// New: "count": { "type": ["integer", "null"] } // safe — superset of old type
// ── BREAKING changes ──────────────────────────────────────────────
// 1. Remove a field consumers depend on
// Old: { "id": 1, "name": "Alice", "legacyCode": "L-001" }
// New schema drops "legacyCode" — any consumer reading it gets undefined.
// 2. Rename a field (= remove old + add new)
// Old: { "userId": 42 }
// New: { "user_id": 42 } // breaking: consumers reading "userId" get undefined
// 3. Change a field type
// Old: { "amount": "149.99" } // string
// New: { "amount": 149.99 } // number: JSON.parse reads it correctly, but
// code typed against a string amount breaks at runtime
// 4. Add a new required field
// Old schema: { "required": ["id", "status"] }
// New schema: { "required": ["id", "status", "region"] }
// Breaking for producers: existing producers do not send "region".
// 5. Tighten a constraint
// Old: "name": { "type": "string" }
// New: "name": { "type": "string", "maxLength": 50 } // existing data may exceed limit
// 6. Set additionalProperties: false on a response schema
// Old response accepted any extra fields — new strict schema rejects
// any new field the server adds in a future release.
// ── Classification checklist ──────────────────────────────────────
// Change type | Breaking?
// Add optional field with default | No
// Add required field | Yes (producers)
// Remove any field | Yes (consumers)
// Rename field | Yes (both)
// Change type (narrow) | Yes
// Change type (widen, e.g. int→number) | No
// Add enum value (open set) | No
// Remove enum value | Yes
// Relax constraint (min, maxLength) | No
// Tighten constraint | Yes
// Add additionalProperties: false | Yes (response schemas)
// Remove additionalProperties: false | No
The simplest heuristic: if the change is purely additive (the new schema accepts a strict superset of documents the old schema accepts), it is non-breaking. If the new schema rejects any document the old schema accepted, even one edge case, it is breaking for consumers. If the new schema requires a document shape that old producers cannot produce, it is breaking for producers. For JSON Schema validation tooling to check these rules at CI time, see the linked guide.
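The checklist above can be approximated mechanically. What follows is a minimal sketch, assuming a deliberately simplified schema shape: `FlatSchema`, `ChangeReport`, and `classifyChange` are hypothetical names for this illustration and not part of any JSON Schema library. It only looks at flat properties, `required`, and a single `type` per field, so enums, constraints, and nested schemas are out of scope.

```typescript
// Hypothetical simplified schema: a flat object with one type per property.
interface FlatSchema {
  required: string[];
  properties: Record<string, { type: string }>;
}

interface ChangeReport {
  breaking: string[];    // reasons the change is breaking
  nonBreaking: string[]; // additive or widening changes
}

// Type changes treated as widening (old type -> allowed new types).
const WIDENINGS: Record<string, string[] | undefined> = { integer: ["number"] };

function classifyChange(oldSchema: FlatSchema, newSchema: FlatSchema): ChangeReport {
  const report: ChangeReport = { breaking: [], nonBreaking: [] };
  const hadField = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(oldSchema.properties, name);

  // Removed fields and type changes on existing fields.
  for (const [name, before] of Object.entries(oldSchema.properties)) {
    const after = newSchema.properties[name];
    if (after === undefined) {
      report.breaking.push(`removed "${name}" (consumers)`);
    } else if (after.type !== before.type) {
      const widened = (WIDENINGS[before.type] ?? []).includes(after.type);
      const bucket = widened ? report.nonBreaking : report.breaking;
      bucket.push(`"${name}" changed type ${before.type} -> ${after.type}`);
    }
  }

  // Newly added fields: optional is additive, required breaks producers.
  for (const name of Object.keys(newSchema.properties)) {
    if (hadField(name)) continue;
    if (newSchema.required.includes(name)) {
      report.breaking.push(`added required "${name}" (producers)`);
    } else {
      report.nonBreaking.push(`added optional "${name}"`);
    }
  }

  // Existing optional fields that became required.
  for (const name of newSchema.required) {
    if (hadField(name) && !oldSchema.required.includes(name)) {
      report.breaking.push(`"${name}" became required (producers)`);
    }
  }
  return report;
}

const base: FlatSchema = {
  required: ["id", "status"],
  properties: { id: { type: "integer" }, status: { type: "string" } },
};
const additive: FlatSchema = {
  required: ["id", "status"],
  properties: { ...base.properties, priority: { type: "string" } },
};
const renamed: FlatSchema = {
  required: ["id", "state"],
  properties: { id: { type: "integer" }, state: { type: "string" } },
};

const additiveReport = classifyChange(base, additive);
const renamedReport = classifyChange(base, renamed);
console.log(additiveReport, renamedReport);
```

Note that the rename shows up as two separate findings (a removal plus a new required field), which matches the "rename = remove old + add new" rule in the checklist.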
Schema Versioning Strategies
When a breaking change is unavoidable, you need a versioning strategy that allows old and new clients to coexist during the transition. Four strategies are widely used, each with different operational tradeoffs.
// ── Strategy 1: Version field in the payload ─────────────────────────
// Every document carries its own schema version.
// Pros: self-describing, works for event streams and queues.
// Cons: consumers must branch on version; version field pollutes domain model.
interface OrderV1 {
schemaVersion: 1;
userId: number; // old field name
amount: string; // old type: string
}
interface OrderV2 {
schemaVersion: 2;
user_id: number; // renamed
amount: number; // new type: number
region: string; // new required field
}
type Order = OrderV1 | OrderV2; // TypeScript discriminated union
function processOrder(order: Order) {
const normalized = upcast(order); // always work with V2 shape internally
// ...
}
// ── Strategy 2: Accept-Version request header ─────────────────────────
// Client declares which schema version it speaks; server responds accordingly.
// Pros: no payload pollution; content negotiation is explicit.
// Cons: only works for HTTP APIs; requires server to maintain multiple response serializers.
// Client sends:
// GET /orders/123
// Accept-Version: 1
// Server response (v1 shape):
// { "userId": 42, "amount": "149.99" }
// Client sends:
// Accept-Version: 2
// Server response (v2 shape):
// { "schemaVersion": 2, "user_id": 42, "amount": 149.99, "region": "us-east-1" }
// Express middleware example:
app.get('/orders/:id', (req, res) => {
const version = parseInt(req.headers['accept-version'] as string, 10) || 2; // missing or invalid header falls back to v2
const order = getOrder(req.params.id);
res.json(version === 1 ? serializeV1(order) : serializeV2(order));
});
// ── Strategy 3: URL versioning (/v1/, /v2/) ───────────────────────────
// Pros: immediately visible in logs/curl; easy to deprecate via routing rules.
// Cons: code duplication if v1 and v2 diverge significantly; clients must
// update base URLs to migrate.
// /v1/orders → old schema handler
// /v2/orders → new schema handler
// Next.js App Router:
// app/api/v1/orders/route.ts — serves OrderV1 shape
// app/api/v2/orders/route.ts — serves OrderV2 shape
// ── Strategy 4: Schema Registry ──────────────────────────────────────
// A centralized service stores schema versions and enforces compatibility.
// See Section 6 for full coverage of Confluent and AWS Glue registries.
// Producer registers a new schema version — registry checks compatibility:
// curl -X POST http://registry:8081/subjects/orders-value/versions \
//   -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
//   -d '{"schemaType": "JSON", "schema": "{\"type\":\"object\",\"properties\":{...}}"}'
// Registry response (if BACKWARD compatible):
// {"id": 42} <- schema ID that producers embed in each message
// Consumer fetches schema by ID to deserialize:
// GET http://registry:8081/schemas/ids/42
// ── Deprecation timeline best practice ───────────────────────────────
// Week 0: Publish v2 schema; v1 still fully supported.
// Week 4: Add Deprecation header to v1 responses.
// Week 12: Log warnings for v1 consumers (track via API key or header).
// Week 24: Sunset v1; return 410 Gone with migration docs URL.
For REST APIs with external consumers, URL versioning (/v1/, /v2/) is the most discoverable: versions appear in logs, browser history, and curl commands. For internal microservices and event-driven systems, the payload version field combined with upcasters is simpler to operate and avoids HTTP-specific constructs. For Kafka or SQS-based architectures, a schema registry is the standard approach because it prevents incompatible schemas from reaching the broker at all. For JSON API versioning HTTP patterns in detail, see the linked guide.
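The Accept-Version strategy and the deprecation timeline can be combined in one small negotiation step. The following is a sketch under stated assumptions: `VersionPolicy`, `Negotiation`, and `negotiateVersion` are hypothetical names, and the version numbers are illustrative. Unlike a bare `parseInt(...) || 2`, it separates "no header" (serve the current version) from "malformed header" (reject), and it reports sunset versions as 410.

```typescript
// Hypothetical policy object; the version numbers are illustrative.
interface VersionPolicy {
  current: number;      // served when the client sends no header
  deprecated: number[]; // still served, but flagged for deprecation
  sunset: number[];     // no longer served (410 Gone)
}

type Negotiation =
  | { ok: true; version: number; deprecated: boolean }
  | { ok: false; status: 400 | 410; message: string };

function negotiateVersion(header: string | undefined, policy: VersionPolicy): Negotiation {
  const raw = (header ?? "").trim();
  if (raw === "") {
    // No header: default to the current version.
    return { ok: true, version: policy.current, deprecated: false };
  }
  if (!/^\d+$/.test(raw)) {
    // Malformed header: reject instead of silently falling back.
    return { ok: false, status: 400, message: `Invalid Accept-Version: "${raw}"` };
  }
  const version = Number(raw);
  if (policy.sunset.includes(version)) {
    return { ok: false, status: 410, message: `Version ${version} has been sunset` };
  }
  if (version !== policy.current && !policy.deprecated.includes(version)) {
    return { ok: false, status: 400, message: `Unsupported version ${version}` };
  }
  return { ok: true, version, deprecated: policy.deprecated.includes(version) };
}

const policy: VersionPolicy = { current: 2, deprecated: [1], sunset: [0] };

console.log(negotiateVersion(undefined, policy)); // current version, not deprecated
console.log(negotiateVersion("1", policy));       // served, flagged as deprecated
console.log(negotiateVersion("abc", policy));     // rejected with status 400
```

A route handler could use the `deprecated` flag to decide when to attach a deprecation notice to the response, and the `status` field to short-circuit with an error.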
Writing Backward-Compatible JSON Schemas
Choosing the right JSON Schema keywords makes the difference between a schema that can evolve gracefully and one that becomes a breaking-change minefield. The most impactful decisions are how you use additionalProperties, how you model optional new fields, and how you structure references.
// ── DANGER: additionalProperties: false on a response schema ─────────
// Any field the server adds in the future will break strict clients.
const orderSchemaStrict = {
type: 'object',
required: ['id', 'status'],
properties: {
id: { type: 'integer' },
status: { type: 'string' },
},
additionalProperties: false, // ← NEVER use this on response/event schemas
};
// ── SAFE: omit additionalProperties (defaults to true) ────────────────
const orderSchemaSafe = {
type: 'object',
required: ['id', 'status'],
properties: {
id: { type: 'integer' },
status: { type: 'string' },
// Future fields added here are non-breaking for existing consumers
},
// No additionalProperties — defaults to true (unknown fields allowed)
};
// ── Adding optional new fields with defaults ──────────────────────────
const orderSchemaV2 = {
type: 'object',
required: ['id', 'status'], // "priority" is NOT required
properties: {
id: { type: 'integer' },
status: { type: 'string', enum: ['pending', 'paid', 'shipped'] },
priority: { type: 'string', enum: ['low', 'normal', 'high'], default: 'normal' },
region: { type: 'string', default: 'us-east-1' },
},
};
// ── Using $ref for shared definitions ────────────────────────────────
// $ref allows you to evolve shared types in one place.
const schemas = {
Address: {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' },
zip: { type: 'string' },
},
},
Order: {
type: 'object',
required: ['id'],
properties: {
id: { type: 'integer' },
shippingAddress: { $ref: '#/components/schemas/Address' },
},
},
};
// ── oneOf for optional new object shapes ──────────────────────────────
// Use oneOf to allow either the old shape or the new shape during transition.
const paymentSchema = {
oneOf: [
// V1 shape: flat card fields
{
type: 'object',
required: ['cardLast4', 'cardBrand'],
properties: {
cardLast4: { type: 'string' },
cardBrand: { type: 'string' },
},
},
// V2 shape: nested payment object
{
type: 'object',
required: ['payment'],
properties: {
payment: {
type: 'object',
required: ['method', 'last4'],
properties: {
method: { type: 'string' },
last4: { type: 'string' },
},
},
},
},
],
};
// ── Null safety for deprecated fields ────────────────────────────────
// When deprecating a field (but not yet removing it), allow null:
const orderWithDeprecated = {
type: 'object',
properties: {
id: { type: 'integer' },
legacyCode: { type: ['string', 'null'], deprecated: true }, // nullable, being removed
status: { type: 'string' },
},
};
// ── Using default values in Ajv ───────────────────────────────────────
import Ajv from 'ajv';
const ajv = new Ajv({ useDefaults: true }); // applies schema "default" values
const validate = ajv.compile(orderSchemaV2);
const doc = { id: 1, status: 'paid' }; // missing "priority" and "region"
validate(doc);
// After validation: doc = { id: 1, status: 'paid', priority: 'normal', region: 'us-east-1' }
// Ajv fills in defaults by mutating doc in place, so no upcaster is needed for simple default additions.
The additionalProperties: false rule is a common source of accidental breaking changes in API schemas. Audit every schema you own and check whether it is applied to a response or event shape; if so, remove it. Where a closed schema is genuinely required alongside allOf or $ref composition, unevaluatedProperties: false from draft 2019-09 is more composition-aware, but it still rejects unknown fields, so it carries the same evolution cost on response and event schemas. For additionalProperties in depth, including the difference from unevaluatedProperties, see the linked guide.
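The audit described above is easy to script. This is a minimal sketch, not an Ajv feature: `JsonValue` and `findClosedObjects` are hypothetical names, and the walker visits every key of the schema object, so it could in principle also report matches inside `default`, `const`, or `enum` values that merely look like schemas.

```typescript
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

// Returns JSON Pointer-style paths of every subschema that sets
// additionalProperties: false. Hypothetical helper for illustration.
function findClosedObjects(schema: JsonValue, path: string = "#"): string[] {
  if (schema === null || typeof schema !== "object") return [];
  const hits: string[] = [];
  if (Array.isArray(schema)) {
    // e.g. the branches of oneOf / anyOf / allOf
    schema.forEach((item, i) => hits.push(...findClosedObjects(item, `${path}/${i}`)));
    return hits;
  }
  if (schema.additionalProperties === false) hits.push(path);
  for (const [key, value] of Object.entries(schema)) {
    hits.push(...findClosedObjects(value, `${path}/${key}`));
  }
  return hits;
}

const responseSchema: JsonValue = {
  type: "object",
  properties: {
    id: { type: "integer" },
    address: {
      type: "object",
      additionalProperties: false, // closed: flagged by the audit
      properties: { city: { type: "string" } },
    },
  },
};

const closedPaths = findClosedObjects(responseSchema);
console.log(closedPaths); // ["#/properties/address"]
```

Running a check like this over response and event schemas in CI, and failing when the returned list is non-empty, is one way to keep closed objects from slipping back in.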
Runtime Schema Upcasting Functions
An upcaster is a pure function that transforms a document in an old schema version into the current schema version. Deploying upcasters in middleware means the rest of your application always works with the current schema — no branching on version fields in business logic, no duplicated processing code for old and new shapes.
// ── TypeScript discriminated union by schema version ──────────────────
interface OrderV1 {
schemaVersion: 1;
userId: number;
totalAmount: string; // was a string in v1
}
interface OrderV2 {
schemaVersion: 2;
user_id: number; // renamed from userId
amount: number; // renamed and type changed to number
region: string; // new required field
}
interface OrderV3 {
schemaVersion: 3;
user_id: number;
amount: number;
region: string;
priority: 'low' | 'normal' | 'high'; // new optional field with default
}
type AnyOrder = OrderV1 | OrderV2 | OrderV3;
type CurrentOrder = OrderV3;
// ── Individual upcasters ──────────────────────────────────────────────
function upcastV1toV2(v1: OrderV1): OrderV2 {
return {
schemaVersion: 2,
user_id: v1.userId,
amount: parseFloat(v1.totalAmount), // type coercion
region: 'us-east-1', // default for missing required field
};
}
function upcastV2toV3(v2: OrderV2): OrderV3 {
return {
schemaVersion: 3,
user_id: v2.user_id,
amount: v2.amount,
region: v2.region,
priority: 'normal', // default for new optional field
};
}
// ── Upcaster registry pattern ─────────────────────────────────────────
type UpcasterFn = (doc: AnyOrder) => AnyOrder;
const upcasters: Record<number, UpcasterFn> = {
1: upcastV1toV2 as UpcasterFn,
2: upcastV2toV3 as UpcasterFn,
};
const CURRENT_VERSION = 3;
function upcast(doc: AnyOrder): CurrentOrder {
let current: AnyOrder = doc;
while (current.schemaVersion < CURRENT_VERSION) {
const fn = upcasters[current.schemaVersion];
if (!fn) throw new Error(`No upcaster for version ${current.schemaVersion}`);
current = fn(current);
}
return current as CurrentOrder;
}
// ── Usage: always receive CurrentOrder after upcast ───────────────────
function processOrder(raw: AnyOrder): void {
const order: CurrentOrder = upcast(raw);
// From here, all code works only with OrderV3 — no version branching needed.
console.log(order.user_id, order.amount, order.region, order.priority);
}
// ── Middleware: Next.js API route ─────────────────────────────────────
// app/api/orders/route.ts
import { NextRequest, NextResponse } from 'next/server';
export async function POST(req: NextRequest) {
const body = await req.json() as AnyOrder;
const order = upcast(body); // normalize before any business logic
await saveOrder(order);
return NextResponse.json({ ok: true });
}
// ── Middleware: Express ───────────────────────────────────────────────
import { Request, Response, NextFunction } from 'express';
function upcasterMiddleware(req: Request, _res: Response, next: NextFunction) {
if (req.body && typeof req.body.schemaVersion === 'number') {
req.body = upcast(req.body as AnyOrder);
}
next();
}
app.use(express.json(), upcasterMiddleware);
// ── Kafka consumer upcaster ───────────────────────────────────────────
consumer.run({
eachMessage: async ({ message }) => {
const raw: AnyOrder = JSON.parse(message.value!.toString());
const order = upcast(raw); // normalize to current schema
await handleOrder(order);
},
});
Upcasters must never be deleted: each one is a permanent bridge between two schema versions. As long as any data or message in the wild carries an old version number, the upcaster chain must remain intact. Store upcasters as first-class functions in a dedicated module (lib/upcasters/orders.ts) and test each one in isolation with unit tests before adding it to the registry. For JSON Schema versioning strategies in OpenAPI and JSON Schema draft context, see the linked guide.
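The upcasters above trust their input: `parseFloat` quietly turns an empty or malformed amount into `NaN`, and a missing `userId` becomes `undefined`. One possible hardening, sketched here under the assumption that a result type is preferable to throwing at the boundary, validates the V1 shape first. `UpcastResult` and `safeUpcastV1toV2` are hypothetical names, and `Number()` is a deliberate swap for `parseFloat` because it rejects trailing garbage such as "12abc".

```typescript
interface OrderV1 {
  schemaVersion: 1;
  userId: number;
  totalAmount: string;
}

interface OrderV2 {
  schemaVersion: 2;
  user_id: number;
  amount: number;
  region: string;
}

type UpcastResult<T> = { ok: true; value: T } | { ok: false; error: string };

function safeUpcastV1toV2(input: unknown): UpcastResult<OrderV2> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "document is not an object" };
  }
  const doc = input as Partial<Record<keyof OrderV1, unknown>>;
  if (doc.schemaVersion !== 1) {
    return { ok: false, error: "expected schemaVersion 1" };
  }
  if (typeof doc.userId !== "number" || !Number.isInteger(doc.userId)) {
    return { ok: false, error: "userId must be an integer" };
  }
  if (typeof doc.totalAmount !== "string" || doc.totalAmount.trim() === "") {
    return { ok: false, error: "totalAmount must be a non-empty string" };
  }
  // Number() yields NaN for "12abc", where parseFloat would return 12.
  const amount = Number(doc.totalAmount);
  if (!Number.isFinite(amount)) {
    return { ok: false, error: `totalAmount is not numeric: "${doc.totalAmount}"` };
  }
  return {
    ok: true,
    value: {
      schemaVersion: 2,
      user_id: doc.userId,
      amount,
      region: "us-east-1", // same default as upcastV1toV2 above
    },
  };
}

const good = safeUpcastV1toV2({ schemaVersion: 1, userId: 42, totalAmount: "149.99" });
const bad = safeUpcastV1toV2({ schemaVersion: 1, userId: 42, totalAmount: "" });
console.log(good, bad);
```

Callers can then route failed documents to a dead-letter queue or a 400 response instead of persisting a `NaN` amount.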
Migrating JSON Data in Databases
When JSON documents are stored in a PostgreSQL jsonb column, schema changes require data migration in addition to code changes. The key constraint is that large-table migrations must never run as a single transaction — they must be batched to avoid long-held locks and write-ahead log bloat.
-- ── Scenario: add a new field with a default value ──────────────────
-- Table: orders (id bigserial, data jsonb)
-- Change: add "region" field, default 'us-east-1', to all existing rows
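-- WRONG: a single-statement UPDATE rewrites every row in one long transaction
UPDATE orders SET data = jsonb_set(data, '{region}', '"us-east-1"');
-- On a 10M-row table this holds row locks on every row for minutes, blocks
-- concurrent writers to those rows, and produces a large burst of WAL and dead tuples.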
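-- CORRECT: batch migration with cursor pagination ─────────────────────
-- Run this in a migration script or a one-off job. COMMIT inside a DO block
-- requires PostgreSQL 11+ and fails if the block runs inside an outer transaction: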
DO $$
DECLARE
batch_size CONSTANT int := 5000;
last_id bigint := 0;
max_id bigint;
BEGIN
SELECT max(id) INTO max_id FROM orders;
WHILE last_id < max_id LOOP
UPDATE orders
SET data = jsonb_set(data, '{region}', '"us-east-1"', true)
WHERE id > last_id
AND id <= last_id + batch_size
AND data->>'region' IS NULL; -- idempotent: skip already-migrated rows
last_id := last_id + batch_size;
COMMIT; -- release lock between batches
PERFORM pg_sleep(0.01); -- optional: yield to other transactions
END LOOP;
END $$;
-- ── Complex transformation: rename a field ────────────────────────────
-- Change: rename "userId" to "user_id" in all documents
DO $$
DECLARE
batch_size CONSTANT int := 5000;
last_id bigint := 0;
max_id bigint;
BEGIN
SELECT max(id) INTO max_id FROM orders;
WHILE last_id < max_id LOOP
UPDATE orders
SET data = (data - 'userId') || jsonb_build_object('user_id', data->'userId')
WHERE id > last_id
AND id <= last_id + batch_size
AND data ? 'userId'; -- only rows that have the old field
last_id := last_id + batch_size;
COMMIT;
END LOOP;
END $$;
-- ── Change a field type: string amount to numeric ──────────────────────
-- Old: { "amount": "149.99" }
-- New: { "amount": 149.99 }
DO $$
DECLARE
batch_size CONSTANT int := 2000; -- smaller batches for type coercion
last_id bigint := 0;
max_id bigint;
BEGIN
SELECT max(id) INTO max_id FROM orders;
WHILE last_id < max_id LOOP
UPDATE orders
SET data = jsonb_set(
data,
'{amount}',
to_jsonb((data->>'amount')::numeric)
)
WHERE id > last_id
AND id <= last_id + batch_size
AND jsonb_typeof(data->'amount') = 'string'; -- only string amounts
last_id := last_id + batch_size;
COMMIT;
END LOOP;
END $$;
-- ── Rollback strategy: dual-write with shadow field ───────────────────
-- Instead of mutating in place, keep the old field alongside the new one
-- during the migration window. Both old and new consumers can read.
UPDATE orders
SET data = jsonb_set(
jsonb_set(data, '{user_id}', data->'userId'), -- add new field
'{_userId_deprecated}', data->'userId' -- keep a backup copy
)
WHERE data ? 'userId' AND NOT data ? 'user_id';
-- After all consumers have migrated to user_id:
-- Remove the old and backup fields:
UPDATE orders
SET data = data - 'userId' - '_userId_deprecated'
WHERE data ? 'userId';
-- ── Monitor migration progress ────────────────────────────────────────
SELECT
count(*) FILTER (WHERE data ? 'user_id') AS migrated,
count(*) FILTER (WHERE data ? 'userId') AS pending,
count(*) AS total
FROM orders;
Always make data migrations idempotent: add a WHERE clause that skips rows already migrated (AND data->'newField' IS NULL or AND NOT data ? 'newField'). This allows the migration to be re-run safely if it is interrupted mid-way. For PostgreSQL jsonb operators and functions used in these migrations, see the linked guide.
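The idempotency property can be checked without a database. The sketch below is an in-memory model of the rename migration, not a replacement for the SQL: `Row`, `renameUserId`, `idWindows`, and `migrate` are hypothetical names, and the array stands in for the orders table. It mirrors the `id > last_id AND id <= last_id + batch_size` windows and the `data ? 'userId'` guard, then runs the migration twice.

```typescript
type Doc = Record<string, unknown>;

interface Row {
  id: number;
  data: Doc;
}

// Mirrors (data - 'userId') || jsonb_build_object('user_id', data->'userId').
function renameUserId(doc: Doc): Doc {
  if (!("userId" in doc)) return doc; // already migrated: no-op
  const { userId, ...rest } = doc;
  return { ...rest, user_id: userId };
}

// Id windows (lo, hi], matching `id > lo AND id <= hi` in the SQL loop.
function idWindows(maxId: number, batchSize: number): Array<[number, number]> {
  const windows: Array<[number, number]> = [];
  for (let lo = 0; lo < maxId; lo += batchSize) {
    windows.push([lo, lo + batchSize]);
  }
  return windows;
}

// Returns the number of rows changed, like the row count of each UPDATE summed up.
function migrate(rows: Row[], batchSize: number): number {
  const maxId = rows.reduce((max, row) => Math.max(max, row.id), 0);
  let changed = 0;
  for (const [lo, hi] of idWindows(maxId, batchSize)) {
    for (const row of rows) {
      const inWindow = row.id > lo && row.id <= hi;
      if (inWindow && "userId" in row.data) {
        row.data = renameUserId(row.data);
        changed += 1;
      }
    }
  }
  return changed;
}

const table: Row[] = [
  { id: 1, data: { userId: 42 } },
  { id: 7, data: { user_id: 7 } }, // already migrated
  { id: 12, data: { userId: 9, status: "paid" } },
];

const firstRun = migrate(table, 5);  // touches the two unmigrated rows
const secondRun = migrate(table, 5); // touches nothing
console.log(firstRun, secondRun);
```

A second run that changes zero rows is the behavior the WHERE guard is meant to guarantee; the same check against a staging copy of the real table is a cheap pre-flight test.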
Schema Registry for JSON Schemas
A schema registry centralizes schema storage, assigns unique IDs to each registered schema version, and enforces compatibility rules before a new schema can be published. This prevents incompatible schemas from reaching message brokers or API gateways.
# ── Confluent Schema Registry: register a JSON Schema ────────────────
# Subject naming: {topic}-value for message values, {topic}-key for keys.
# Register the first schema version (subject "orders-value"):
curl -X POST http://localhost:8081/subjects/orders-value/versions \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"id\",\"status\"],\"properties\":{\"id\":{\"type\":\"integer\"},\"status\":{\"type\":\"string\"}}}"
  }'
# Response: {"id":1}
# ── Set compatibility mode for a subject ──────────────────────────────
# BACKWARD: new schema can read data written with old schema (safest for consumers)
# FORWARD: old schema can read data written with new schema (safest for producers)
# FULL: both BACKWARD and FORWARD (most restrictive, recommended for events)
# BACKWARD_TRANSITIVE: BACKWARD against ALL previous versions (not just latest)
# NONE: no compatibility check (use only in development)
curl -X PUT http://localhost:8081/config/orders-value \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  -d '{"compatibility": "BACKWARD"}'
# ── Register a new version (registry enforces compatibility before accepting) ─
# Adding optional "region" field — BACKWARD compatible:
curl -X POST http://localhost:8081/subjects/orders-value/versions \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"id\",\"status\"],\"properties\":{\"id\":{\"type\":\"integer\"},\"status\":{\"type\":\"string\"},\"region\":{\"type\":\"string\",\"default\":\"us-east-1\"}}}"
  }'
# Response: {"id":2} — accepted because BACKWARD compatible
# ── Attempt a breaking change — registry rejects it ───────────────────
# Adding "region" as REQUIRED — NOT BACKWARD compatible (old data lacks it):
curl -X POST http://localhost:8081/subjects/orders-value/versions \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"id\",\"status\",\"region\"],...}"
  }'
# Response: 409 Conflict
# {"error_code":409,"message":"Schema being registered is incompatible with an earlier schema"}
# ── List schema versions and retrieve by ID ───────────────────────────
curl http://localhost:8081/subjects/orders-value/versions # [1, 2]
curl http://localhost:8081/subjects/orders-value/versions/1 # schema version 1
curl http://localhost:8081/schemas/ids/2 # schema by ID
// ── AWS Glue Schema Registry (TypeScript SDK) ─────────────────────────
import { GlueClient, CreateSchemaCommand, RegisterSchemaVersionCommand } from '@aws-sdk/client-glue';
const glue = new GlueClient({ region: 'us-east-1' });
// Create the schema inside an existing registry (the named registry must already exist):
await glue.send(new CreateSchemaCommand({
RegistryId: { RegistryName: 'orders-registry' },
SchemaName: 'OrderEvent',
DataFormat: 'JSON',
Compatibility: 'BACKWARD',
SchemaDefinition: JSON.stringify(orderSchemaV1),
}));
// Register a new version:
await glue.send(new RegisterSchemaVersionCommand({
SchemaId: { SchemaName: 'OrderEvent', RegistryName: 'orders-registry' },
SchemaDefinition: JSON.stringify(orderSchemaV2),
}));
// ── Compatibility mode summary ─────────────────────────────────────────
// Mode | Can new schema read old data? | Can old schema read new data?
// BACKWARD | Yes | Not required
// FORWARD | Not required | Yes
// FULL | Yes | Yes
// NONE | No check | No check
//
// Rule of thumb:
// - Consumer-driven APIs (REST): use BACKWARD — new server code reads old client data
// - Producer-driven events (Kafka): use FORWARD — old consumers read new event shape
// - Shared schemas in regulated systems: use FULL
BACKWARD compatibility is the most common choice for REST APIs: the new server schema accepts documents that old clients send. FORWARD is appropriate for event-driven systems where you deploy producers before consumers: old consumer code must be able to read events written by the new producer. FULL is the safest but most restrictive; use it for schemas in regulated industries or shared event buses where multiple teams depend on the same topic. For JSON Schema versioning in OpenAPI specification context, see the linked guide.
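To make the modes concrete, the sketch below checks them against sample documents: BACKWARD asks whether the new schema accepts old data, FORWARD asks whether the old schema accepts new data. This is an illustration only and not how any registry implements its check (registries compare the schemas themselves, not samples). `MiniSchema`, `accepts`, and `isCompatible` are hypothetical names, and the validator understands only `required`, three primitive types, and `additionalProperties`.

```typescript
type FieldType = "string" | "integer" | "number";

interface MiniSchema {
  required: string[];
  properties: Record<string, FieldType | undefined>;
  additionalProperties?: boolean; // defaults to true, as in JSON Schema
}

type Doc = Record<string, unknown>;
type Mode = "BACKWARD" | "FORWARD" | "FULL" | "NONE";

// Tiny validator covering only required, primitive types, additionalProperties.
function accepts(schema: MiniSchema, doc: Doc): boolean {
  if (schema.required.some((key) => !(key in doc))) return false;
  for (const [key, value] of Object.entries(doc)) {
    const type = schema.properties[key];
    if (type === undefined) {
      if (schema.additionalProperties === false) return false;
    } else if (type === "integer") {
      if (!Number.isInteger(value)) return false;
    } else if (typeof value !== type) {
      return false;
    }
  }
  return true;
}

function isCompatible(
  mode: Mode,
  oldSchema: MiniSchema,
  newSchema: MiniSchema,
  oldDocs: Doc[],
  newDocs: Doc[]
): boolean {
  const backward = oldDocs.every((doc) => accepts(newSchema, doc)); // new reads old
  const forward = newDocs.every((doc) => accepts(oldSchema, doc));  // old reads new
  switch (mode) {
    case "BACKWARD": return backward;
    case "FORWARD": return forward;
    case "FULL": return backward && forward;
    case "NONE": return true;
  }
}

const ordersV1: MiniSchema = {
  required: ["id", "status"],
  properties: { id: "integer", status: "string" },
};
const optionalRegion: MiniSchema = {
  required: ["id", "status"],
  properties: { id: "integer", status: "string", region: "string" },
};
const requiredRegion: MiniSchema = {
  required: ["id", "status", "region"],
  properties: { id: "integer", status: "string", region: "string" },
};

const oldDocs: Doc[] = [{ id: 1, status: "paid" }];
const newDocs: Doc[] = [{ id: 2, status: "paid", region: "eu-west-1" }];

console.log(isCompatible("BACKWARD", ordersV1, optionalRegion, oldDocs, newDocs)); // true
console.log(isCompatible("BACKWARD", ordersV1, requiredRegion, oldDocs, newDocs)); // false
console.log(isCompatible("FORWARD", ordersV1, requiredRegion, oldDocs, newDocs));  // true
```

The last two lines show why the required-region change is rejected under BACKWARD yet would pass under FORWARD: old data lacks the field, while old readers simply ignore it.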
Testing JSON Schema Migrations
Example-based unit tests are insufficient for schema migrations — they only cover the cases you thought of. Property-based testing, fuzz testing, and contract testing each catch a different class of migration bugs.
// ── Property-based testing with fast-check ───────────────────────────
import * as fc from 'fast-check';
import Ajv from 'ajv';
import { upcast } from './upcasters/orders';
import { orderSchemaV1, orderSchemaV3 } from './schemas/orders';
const ajvV1 = new Ajv();
const ajvV3 = new Ajv({ useDefaults: true });
const validateV1 = ajvV1.compile(orderSchemaV1);
const validateV3 = ajvV3.compile(orderSchemaV3);
// Arbitrary that generates valid OrderV1 documents:
const orderV1Arbitrary = fc.record({
schemaVersion: fc.constant(1 as const),
userId: fc.integer({ min: 1, max: 999999 }),
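// fc.double with noNaN: in fast-check v3+, fc.float only accepts 32-bit float bounds,
// and a generated NaN would turn into the string "NaN" after toFixed.
totalAmount: fc.double({ min: 0.01, max: 99999.99, noNaN: true }).map(n => n.toFixed(2)),
status: fc.oneof(fc.constant('pending'), fc.constant('paid')),
// optional field: fc.option yields null by default, so nil: undefined models an absent value
tags: fc.option(fc.array(fc.string({ maxLength: 20 }), { maxLength: 5 }), { nil: undefined }),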
});
test('upcast: every valid V1 document produces a valid V3 document', () => {
fc.assert(
fc.property(orderV1Arbitrary, (v1Doc) => {
// Pre-condition: generated doc is valid V1
expect(validateV1(v1Doc)).toBe(true);
// Run the upcaster chain V1 -> V2 -> V3
const v3Doc = upcast(v1Doc);
// Post-condition: output satisfies V3 schema
const valid = validateV3(v3Doc);
if (!valid) {
console.error('Upcaster output failed V3 validation:', validateV3.errors);
console.error('Input was:', v1Doc);
}
return valid;
}),
{ numRuns: 10000 } // 10,000 random payloads — runs in <1 second
);
});
test('upcast: idempotent — upcasting a V3 document returns it unchanged', () => {
const orderV3Arbitrary = fc.record({
schemaVersion: fc.constant(3 as const),
user_id: fc.integer({ min: 1, max: 999999 }),
amount: fc.double({ min: 0.01, max: 99999.99, noNaN: true }), // fc.float rejects non-32-bit bounds in fast-check v3+
region: fc.constantFrom('us-east-1', 'eu-west-1', 'ap-southeast-1'),
priority: fc.constantFrom('low', 'normal', 'high'),
});
fc.assert(
fc.property(orderV3Arbitrary, (v3Doc) => {
const result = upcast(v3Doc);
expect(result).toEqual(v3Doc);
}),
{ numRuns: 1000 }
);
});
// ── Fuzz testing: send old-schema payloads to new validator ──────────
test('fuzz: old payloads with missing/null fields do not crash upcaster', () => {
const corruptedV1Arbitrary = fc.record({
schemaVersion: fc.constant(1 as const),
userId: fc.oneof(fc.integer(), fc.constant(null), fc.constant(undefined)),
totalAmount: fc.oneof(fc.string(), fc.constant(null), fc.constant('')),
}, { withDeletedKeys: true }); // randomly omit keys
fc.assert(
fc.property(corruptedV1Arbitrary, (corruptDoc) => {
// Upcaster must not throw — it should either succeed or return a typed error
expect(() => upcast(corruptDoc as any)).not.toThrow();
}),
{ numRuns: 5000 }
);
});
// ── Contract testing with schema compatibility check ──────────────────
// ci/check-schema-compatibility.ts
import Ajv from 'ajv';
import addFormats from 'ajv-formats';
import { orderSchemaV2, orderSchemaV3 } from '../schemas/orders';
// Verify V3 is BACKWARD compatible with V2:
// Every valid V2 document must also be valid under V3.
test('schema: V3 is backward compatible with V2', () => {
const ajvV3 = new Ajv({ useDefaults: false });
addFormats(ajvV3);
const validateV3 = ajvV3.compile(orderSchemaV3);
// Example V2 documents (or generate with fast-check)
const v2Examples = [
{ schemaVersion: 2, user_id: 1, amount: 49.99, region: 'us-east-1' },
{ schemaVersion: 2, user_id: 2, amount: 0, region: 'eu-west-1' },
];
for (const doc of v2Examples) {
expect(validateV3(doc)).toBe(true);
}
});
// ── CI integration: block PRs with breaking schema changes ────────────
// package.json scripts:
// "test:schema": "jest --testPathPattern='schema.test.ts'",
// "test:compat": "ts-node ci/check-schema-compatibility.ts"
// GitHub Actions:
// - run: npm run test:schema && npm run test:compat
Property-based testing with fast-check is a high-leverage investment for schema migration confidence: one test covers an entire equivalence class of inputs rather than a handful of hand-picked examples. The withDeletedKeys option in fast-check's fc.record() randomly omits keys (newer fast-check releases express the same thing with the requiredKeys option), catching upcasters that crash on missing optional fields, a common category of migration bug. Combine this with a CI schema compatibility check that blocks any PR introducing a breaking schema change. For JSON Schema testing in detail with Ajv, Zod, and TypeBox, see the linked guide.
Key Terms
- Breaking Change
- A schema modification that causes existing producers or consumers to fail without code changes on their end. For consumers, a breaking change is one where valid old payloads no longer satisfy the new schema: for example, removing a field consumers read, renaming a field, changing a field type, or adding additionalProperties: false to a response schema. For producers, a breaking change is one where their existing output no longer satisfies the new schema: for example, adding a new required field they do not supply. Breaking changes require either a coordinated deployment of all affected services, a versioning strategy (URL versioning, payload version field), or an upcaster layer that bridges the old and new shapes transparently.
- Non-Breaking Change
- A schema modification where all existing producers can still produce valid payloads and all existing consumers can still consume valid payloads, without any changes to their code. The defining property: the new schema accepts a strict superset of the documents the old schema accepted. Examples of non-breaking changes: adding an optional field with a default value, adding a new enum value (when consumers handle unknown enum values gracefully via a default case), relaxing a constraint (increasing maxLength, removing a minimum), widening a type from integer to number, and removing additionalProperties: false from a schema. Non-breaking changes can be deployed to producers and consumers independently in any order.
- Schema Upcasting
- The process of transforming a document written against an older schema version into one that satisfies the current (newer) schema version. An upcaster is a pure, deterministic function: it takes an old-version document and returns a new-version document, applying the minimum set of transformations needed (filling in new required fields with defaults, renaming fields, coercing types). Upcasters are chained in a registry: if the current version is 3, a v1 document passes through the v1-to-v2 upcaster and then the v2-to-v3 upcaster before reaching application logic. The key invariant: application code always works with the current schema version; upcasters absorb all backward-compatibility complexity at the boundary. Upcasters are permanent — they must never be deleted as long as any data or message in the wild carries the old version number.
- Schema Registry
- A centralized service that stores versioned schemas, assigns a unique integer ID to each registered schema version, and enforces compatibility rules before accepting a new version. Confluent Schema Registry (open source, part of the Confluent Platform) and AWS Glue Schema Registry are the two most widely deployed implementations. Producers embed the schema ID in each message (typically in a 5-byte header: magic byte + 4-byte ID); consumers fetch the schema by ID to deserialize. Compatibility modes (BACKWARD, FORWARD, FULL, NONE, and their transitive variants) are configured per subject. A schema registry prevents incompatible schemas from reaching the message broker and provides a queryable audit trail of all schema versions and when they were registered.
- Backward Compatibility
- A relationship between two schema versions where data written under the old schema can be read (validated and parsed) by code written against the new schema. In schema registry terminology, BACKWARD means: "the new schema can read old data." This is the most common compatibility requirement for REST APIs: when the server is upgraded with a new schema, it must still be able to process requests from clients running old code. Backward compatibility is achieved by: only adding optional fields (never required ones), never removing fields, never narrowing types, and never adding additionalProperties: false. In practice, backward compatibility is maintained by the server's upcaster layer: the server upgrades its internal schema to v2 and uses an upcaster to normalize v1 client requests before processing.
- Forward Compatibility
- A relationship between two schema versions where data written under the new schema can be read by code written against the old schema. In schema registry terminology, FORWARD means: "old schema can read new data." This is the most critical requirement for event-driven systems where consumers may lag behind producers: a consumer running the v1 schema must be able to parse messages produced by a producer using the v2 schema. Forward compatibility is maintained when new fields in the producer are optional and old consumers ignore unknown fields (which requires that the consumer schema does NOT use additionalProperties: false). Forward compatibility makes it safe to deploy producers before consumers.
- additionalProperties
- A JSON Schema keyword that controls whether properties not listed in the properties object (and not matched by patternProperties) are allowed in a document. When omitted or set to true, any additional properties are allowed, so documents with unknown fields pass validation. When set to false, the schema rejects any document containing a property not explicitly listed. When set to a schema object, additional properties must be valid against that sub-schema. The critical migration implication: additionalProperties: false on a response or event schema makes every additive change to the producer into a breaking change for strict consumers. Use additionalProperties: false only on request body schemas at API gateway boundaries (for security) or in tight internal contracts, never on response schemas or event payloads you intend to evolve.
- Property-Based Testing
- A testing technique where the test framework generates hundreds or thousands of random inputs from declared generators (called arbitraries in fast-check), runs the system under test against each, and reports any input that falsifies a stated invariant. For JSON schema migration, the invariant is: "every valid v1 document, when passed through the upcaster, produces a valid v2 document." Libraries: fast-check (TypeScript/JavaScript), Hypothesis (Python), QuickCheck (Haskell, and ports in many languages). Property-based tests catch edge cases that hand-written examples miss: empty arrays, null optional fields, boundary numeric values, strings at maximum length limits. With fast-check, running 10,000 random v1 payloads through a simple upcaster and validating the outputs against the v2 schema usually takes on the order of a second for small documents (the exact time depends on document size and the validator), which makes it practical to run in CI on every commit.
FAQ
What JSON schema changes are backward-compatible vs breaking?
Non-breaking (backward-compatible) changes are those where the new schema accepts a strict superset of documents the old schema accepted, and where old producers can still generate valid payloads without modification. Safe changes include: adding an optional field with a default value (old consumers ignore it; old producers omit it and the default applies), adding a new enum value when consumers use a default case for unknown values, relaxing a constraint (increasing maxLength, removing a minimum), widening a type from integer to number, and removing additionalProperties: false from a schema. Breaking changes cause existing producers or consumers to fail: removing a field consumers depend on, renaming a field (removes the old name, adds a new one), changing a field type (string to number causes parse failures for consumers expecting a string), adding a new required field (existing producers cannot satisfy it), tightening a constraint (reducing maxLength below existing data lengths), and adding additionalProperties: false to a response schema (any new server-side field causes strict consumers to reject the document). The simplest rule: additive changes are safe; any removal or restriction is breaking.
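The rules above can be turned into a rough automated check. The following is a minimal sketch, not a full compatibility checker: it assumes flat object schemas and only looks at top-level properties, required, and additionalProperties. The names FlatSchema, SchemaChange, and classifyChanges are hypothetical and do not come from any library.

```typescript
// Sketch only: flat object schemas, top-level keywords. Nested schemas,
// enums, and constraints such as maxLength are deliberately ignored.
type JsonType = "string" | "number" | "integer" | "boolean" | "object" | "array" | "null";

interface FlatSchema {
  properties: Record<string, { type: JsonType }>;
  required?: string[];
  additionalProperties?: boolean;
}

interface SchemaChange {
  field: string;
  kind: "breaking" | "non-breaking";
  reason: string;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function classifyChanges(oldSchema: FlatSchema, newSchema: FlatSchema): SchemaChange[] {
  const changes: SchemaChange[] = [];
  const oldRequired = oldSchema.required ?? [];
  const newRequired = newSchema.required ?? [];

  // Removals and type changes on fields that already existed.
  for (const [field, oldProp] of Object.entries(oldSchema.properties)) {
    if (!hasOwn(newSchema.properties, field)) {
      changes.push({ field, kind: "breaking", reason: "field removed" });
      continue;
    }
    const newProp = newSchema.properties[field];
    if (newProp.type !== oldProp.type) {
      // Following the rule in the text: integer -> number counts as a widening.
      const widened = oldProp.type === "integer" && newProp.type === "number";
      changes.push({
        field,
        kind: widened ? "non-breaking" : "breaking",
        reason: `type changed from ${oldProp.type} to ${newProp.type}`,
      });
    }
  }

  // New fields, and fields that became required.
  for (const field of Object.keys(newSchema.properties)) {
    const isNew = !hasOwn(oldSchema.properties, field);
    if (newRequired.includes(field) && !oldRequired.includes(field)) {
      const reason = isNew ? "new required field" : "optional field made required";
      changes.push({ field, kind: "breaking", reason });
    } else if (isNew) {
      changes.push({ field, kind: "non-breaking", reason: "new optional field" });
    }
  }

  if (newSchema.additionalProperties === false && oldSchema.additionalProperties !== false) {
    changes.push({ field: "(schema)", kind: "breaking", reason: "additionalProperties: false added" });
  }
  return changes;
}

// Example schemas (hypothetical): v2 adds an optional field, v3 renames one.
const orderSchemaV1: FlatSchema = {
  properties: { id: { type: "integer" }, status: { type: "string" } },
  required: ["id", "status"],
};
const orderSchemaV2: FlatSchema = {
  properties: { id: { type: "integer" }, status: { type: "string" }, priority: { type: "string" } },
  required: ["id", "status"],
};
const orderSchemaV3: FlatSchema = {
  properties: { id: { type: "integer" }, state: { type: "string" }, priority: { type: "string" } },
  required: ["id"],
};

const additive = classifyChanges(orderSchemaV1, orderSchemaV2);
const rename = classifyChanges(orderSchemaV2, orderSchemaV3);
```

Here additive contains a single non-breaking entry for priority, while rename reports status as removed (breaking) next to state as a new optional field, which matches the point that a rename is a removal plus an addition. A CI step could fail the build whenever any entry has kind "breaking".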
How do I add a new required field to a JSON schema without breaking existing clients?
Adding a required field is inherently breaking for producers because existing code does not send it. Use one of three approaches. First (safest): add the field as optional with a default value, not required. Update all producers to start sending it. After 100% of producers have been updated and confirmed (monitor via logging), consider making it required in a next major version — only if the field is truly always needed. Second: use schema versioning with an upcaster. Keep the field absent in the v1 schema. In v2, make it required. Deploy an upcaster in middleware that transforms v1 documents (without the field) into v2 documents (with the field set to a computed or default value). Consumers see only v2 documents. Producers can be upgraded independently. Third: dual-phase migration. Phase 1 — add the field as optional and start sending it in all producers. Phase 2 (after all producers confirmed) — update the schema to mark it required. Monitoring is critical: log which producer version each request comes from so you can confirm all producers are updated before tightening the schema constraint. Never skip phase 1 — one old producer instance forgotten in a canary or background job can break production.
What is a schema registry and do I need one for JSON schema versioning?
A schema registry is a centralized service that stores versioned schemas, assigns unique IDs to each registered schema, and enforces compatibility rules before accepting new versions. Confluent Schema Registry (source-available under the Confluent Community License) and AWS Glue Schema Registry are the two primary options. The registry enforces compatibility modes per subject: BACKWARD (new schema can read old data), FORWARD (old schema can read new data), FULL (both directions), and NONE (no check). For event-driven architectures and message brokers (Kafka, SQS, Kinesis), a schema registry is strongly recommended because producers and consumers are decoupled and may run at different schema versions simultaneously; the registry prevents an incompatible schema from reaching the broker and causing downstream failures. For REST APIs where you control all producers and consumers and deploy them together, a schema registry is optional: a Git repository with versioned schema files and a CI compatibility check achieves most of the same benefits with less operational overhead. For small teams: start with Git-versioned schemas and a CI check; add a schema registry when you have multiple independent teams producing or consuming the same message format.
How do I migrate existing JSON data stored in PostgreSQL when the schema changes?
For JSON data in a PostgreSQL jsonb column, use jsonb_set() for field additions: UPDATE orders SET data = jsonb_set(data, '{newField}', '"default"') WHERE data->'newField' IS NULL. The create_missing argument (named create_if_missing in current PostgreSQL documentation) defaults to true, so jsonb_set adds the key when the last element of the path is absent; it does not create missing parent objects, and for a nested path whose parent does not exist it returns the document unchanged. Note that data->'newField' IS NULL matches rows where the key is absent, not rows where the key holds a JSON null. For large tables, avoid a single-statement UPDATE: it does not block reads, but it runs as one long transaction that holds row locks on every updated row until commit, produces a large burst of WAL, and leaves a dead tuple behind for every row it rewrites. Instead, batch by primary-key range: process rows in chunks of 2,000–5,000 using WHERE id > $cursor AND id <= $cursor + $batchSize, committing each batch separately. This keeps each transaction short, releases row locks between batches, and allows the migration to be safely paused and resumed. Make every migration idempotent by including a WHERE clause that skips already-migrated rows. For complex transformations (rename a field, change a type), write a SQL function that reads the old structure and produces the new structure, called per batch. Always test on a copy of production data first, and maintain a rollback strategy: keep the old field alongside the new one during the migration window using dual-write, or take a pg_dump snapshot before starting.
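As a rough illustration of the batching scheme, the sketch below only plans the batches: it returns one parameterized UPDATE per id range and leaves execution to whatever database client you use. The function name planBackfill, the BatchStatement shape, and the use of NOT (data ? 'newField') as the idempotency guard are assumptions made for this example; the table and field names are the ones from the statement above.

```typescript
// Sketch only: plans id-range batches for an idempotent jsonb backfill.
// Executing the statements (one transaction per batch) is left to the caller.
interface BatchStatement {
  text: string;             // parameterized SQL using $1 / $2 placeholders
  values: [number, number]; // [exclusive lower id, inclusive upper id]
}

function planBackfill(minId: number, maxId: number, batchSize: number): BatchStatement[] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const text = [
    "UPDATE orders",
    "SET data = jsonb_set(data, '{newField}', '\"default\"'::jsonb)",
    "WHERE id > $1 AND id <= $2",
    // Skip rows that already have the key, so re-running a batch is harmless.
    "  AND NOT (data ? 'newField')",
  ].join("\n");

  const batches: BatchStatement[] = [];
  // The cursor is the last id already covered; start just below minId.
  for (let cursor = minId - 1; cursor < maxId; cursor += batchSize) {
    batches.push({ text, values: [cursor, Math.min(cursor + batchSize, maxId)] });
  }
  return batches;
}

// Ids 1..10 in batches of 4 give the ranges (0, 4], (4, 8], (8, 10].
const plan = planBackfill(1, 10, 4);
```

With node-postgres, for example, each entry could be run as client.query(batch.text, batch.values) outside an explicit transaction so that every batch commits on its own. Persisting the last completed upper bound makes it possible to pause and resume, and gaps in the id sequence only make some batches smaller than batchSize.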
What is an upcaster and how do I implement one for JSON schema migration?
An upcaster is a pure function that transforms a document written against an older schema version into one that satisfies the current schema. The pattern comes from event sourcing, where historical events must be normalized to the current domain model before processing. Implementation steps: (1) Add a schemaVersion field to every document. (2) Define each version as a TypeScript interface or discriminated union type. (3) Write one upcaster function per version transition — upcastV1toV2(v1: OrderV1): OrderV2, upcastV2toV3(v2: OrderV2): OrderV3. (4) Build an upcaster registry — an object mapping version numbers to their upcaster functions. (5) Write a pipeline function that loops from the document's version to the current version, applying each upcaster in sequence. (6) Deploy the pipeline in middleware so all business logic always receives a current-version document. Key properties: upcasters must be pure (no side effects, no network calls, deterministic), must handle every edge case the old schema allowed (including absent optional fields — use nullish coalescing with defaults), and must never be deleted. Cover each upcaster with unit tests and property-based tests that generate random old-version documents and verify the output satisfies the new schema.
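The six steps can be sketched as follows. This is an illustrative outline under assumed types: the OrderV1, OrderV2, and OrderV3 shapes, the default priority of "normal", and the status-to-state rename are invented for the example; only the upcastV1toV2 and upcastV2toV3 signatures follow the names used above.

```typescript
// Steps 1-2: every document carries schemaVersion; each version is a type.
interface OrderV1 { schemaVersion: 1; id: number; status: string }
interface OrderV2 { schemaVersion: 2; id: number; status: string; priority: "low" | "normal" | "high" }
interface OrderV3 { schemaVersion: 3; id: number; state: string; priority: "low" | "normal" | "high" }
type AnyOrder = OrderV1 | OrderV2 | OrderV3;

const CURRENT_VERSION = 3;

// Step 3: one pure function per version transition.
function upcastV1toV2(v1: OrderV1): OrderV2 {
  // v2 added `priority`; old documents get an assumed default.
  return { ...v1, schemaVersion: 2, priority: "normal" };
}

function upcastV2toV3(v2: OrderV2): OrderV3 {
  // v3 renamed `status` to `state`; drop the old key and carry the value over.
  const { status, ...rest } = v2;
  return { ...rest, schemaVersion: 3, state: status };
}

// Step 4: registry keyed by the version each upcaster accepts.
type Upcaster = (doc: AnyOrder) => AnyOrder;
const upcasters: Partial<Record<number, Upcaster>> = {
  1: (doc) => upcastV1toV2(doc as OrderV1),
  2: (doc) => upcastV2toV3(doc as OrderV2),
};

// Step 5: apply upcasters in sequence until the document is current.
function upcastToCurrent(doc: AnyOrder): OrderV3 {
  let current: AnyOrder = doc;
  while (current.schemaVersion < CURRENT_VERSION) {
    const step = upcasters[current.schemaVersion];
    if (step === undefined) {
      throw new Error(`No upcaster registered for version ${current.schemaVersion}`);
    }
    current = step(current);
  }
  return current as OrderV3;
}

// Step 6 would call upcastToCurrent in middleware before business logic runs.
const legacyOrder: OrderV1 = { schemaVersion: 1, id: 7, status: "open" };
const currentOrder = upcastToCurrent(legacyOrder);
// currentOrder: { schemaVersion: 3, id: 7, priority: "normal", state: "open" }
```

Keeping each transition as its own function means a v4 only needs one new upcaster and one new registry entry; existing transitions stay untouched. The casts inside the registry are a shortcut for the sketch; a production version would validate the incoming document against its declared version first.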
How do I test that my JSON schema migration does not break existing clients?
Three testing strategies complement each other. First, property-based testing (fast-check in TypeScript, Hypothesis in Python): define an arbitrary that generates valid old-schema documents (including edge cases such as empty arrays, null optional fields, and boundary values), run the upcaster on each, and assert the output validates against the new schema. With fast-check, 10,000 runs against a simple upcaster usually complete in about a second, depending on document size and the validator. This catches edge cases that hand-written examples miss. Second, contract testing: use Pact or similar to record the actual message shapes each consumer expects, then verify your new schema satisfies all recorded contracts before merging. This catches cases where a field marked optional in the schema is actually required by every consumer in practice. Third, fuzz testing: use fast-check's fc.record with optional keys to randomly omit fields from old-schema documents and verify the upcaster handles them without throwing. Depending on your fast-check version this is written as fc.record({...}, { requiredKeys: [] }) or, in older releases, fc.record({...}, { withDeletedKeys: true }). Combine all three with a CI step: on every PR that modifies a schema file, run the compatibility check (compare old and new schemas programmatically), the property-based migration test, and the contract test suite. Block merge if any check fails.
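To make the migration invariant concrete without pulling in a dependency, the sketch below hand-rolls the property check: a small seeded random generator stands in for fast-check's arbitraries, and a plain loop stands in for its runner. It is a simplified stand-in, not fast-check itself, and it has no shrinking. All names (LegacyOrder, MigratedOrder, findCounterexample, and so on) are hypothetical.

```typescript
// Old and new shapes for the example. `note` may be absent or null in v1.
interface LegacyOrder { id: number; status: string; note?: string | null }
interface MigratedOrder { id: number; status: string; note: string; priority: "low" | "normal" | "high" }

// Upcaster under test: fills defaults for everything v1 allowed to be missing.
function migrateOrder(v1: LegacyOrder): MigratedOrder {
  return { id: v1.id, status: v1.status, note: v1.note ?? "", priority: "normal" };
}

// Hand-written check standing in for validation against the v2 schema.
function isValidMigratedOrder(doc: unknown): boolean {
  if (typeof doc !== "object" || doc === null) return false;
  const d = doc as Record<string, unknown>;
  return (
    Number.isInteger(d.id) &&
    typeof d.status === "string" &&
    typeof d.note === "string" &&
    (d.priority === "low" || d.priority === "normal" || d.priority === "high")
  );
}

// Small seeded PRNG so that a failing run can be reproduced from its seed.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generator for v1 documents, including the edge cases named in the text:
// empty strings, null optional fields, and absent optional fields.
function randomLegacyOrder(rand: () => number): LegacyOrder {
  const statuses = ["open", "paid", "shipped", ""];
  const doc: LegacyOrder = {
    id: Math.floor(rand() * 1_000_000),
    status: statuses[Math.floor(rand() * statuses.length)],
  };
  const roll = rand();
  if (roll < 0.33) doc.note = null;
  else if (roll < 0.66) doc.note = "x".repeat(Math.floor(rand() * 50));
  // Otherwise `note` stays absent.
  return doc;
}

// Returns the first generated input whose migration is invalid, if any.
function findCounterexample(runs: number, seed: number): LegacyOrder | undefined {
  const rand = seededRandom(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomLegacyOrder(rand);
    if (!isValidMigratedOrder(migrateOrder(input))) return input;
  }
  return undefined;
}

const counterexample = findCounterexample(10_000, 42);
```

With fast-check, the generator would become an arbitrary built from fc.record and the loop would become fc.assert(fc.property(...)), which also shrinks a failing input to a minimal case. The invariant being asserted stays the same: every generated old-version document must migrate to a valid new-version document.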
Should I use additionalProperties: false in my JSON schemas?
Use additionalProperties: false selectively — it causes more problems than it solves when applied to response or event schemas. The core issue: when a producer adds a new field (an additive, non-breaking change), any consumer validator with additionalProperties: false rejects the document. An additive change that should be invisible to old clients becomes a coordinated breaking change requiring simultaneous deployment of all consumers. Where it is appropriate: (1) request body schemas at API gateways or service boundaries, where rejecting unexpected fields prevents parameter pollution attacks and makes implicit contracts explicit; (2) tight internal contracts where you control all producers and consumers and want strict documentation enforcement; (3) configuration file schemas where unexpected keys are user errors. Where it is harmful: response schemas, event payload schemas, any schema that crosses a service boundary you do not fully control. If you currently have additionalProperties: false on response schemas, the migration path is: replace it with explicit property documentation only, update your JSON Schema validator configuration to treat unknown fields as ignorable (the default behavior in most validators when the keyword is absent), and run your test suite to confirm nothing breaks. For fine-grained control, consider unevaluatedProperties: false from JSON Schema draft 2019-09 instead — it works correctly with $ref and composition keywords like allOf.
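The core issue can be reproduced with a toy validator. The sketch below is not a JSON Schema implementation: it is a hypothetical validateObject function that understands only properties, required, and a boolean additionalProperties, which is enough to show how the same v2 payload fares against a strict and an open consumer schema.

```typescript
// Toy validator for illustration only; real projects would use a full
// JSON Schema validator rather than this hand-written subset.
interface ObjectSchema {
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required?: string[];
  additionalProperties?: boolean; // omitted means additional properties are allowed
}

const owns = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function validateObject(schema: ObjectSchema, doc: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!owns(doc, key)) errors.push(`missing required property "${key}"`);
  }
  for (const [key, value] of Object.entries(doc)) {
    if (!owns(schema.properties, key)) {
      // Unknown field: only an error when the schema is closed.
      if (schema.additionalProperties === false) {
        errors.push(`unexpected property "${key}"`);
      }
      continue;
    }
    const expected = schema.properties[key].type;
    if (typeof value !== expected) {
      errors.push(`property "${key}" should be ${expected}`);
    }
  }
  return errors;
}

// Two consumers pinned to the v1 shape; only the first is closed.
const strictConsumerSchema: ObjectSchema = {
  properties: { id: { type: "number" }, status: { type: "string" } },
  required: ["id", "status"],
  additionalProperties: false,
};
const openConsumerSchema: ObjectSchema = {
  properties: { id: { type: "number" }, status: { type: "string" } },
  required: ["id", "status"],
};

// The producer ships v2, which adds an optional `priority` field.
const v2Payload = { id: 1, status: "open", priority: "high" };

const strictErrors = validateObject(strictConsumerSchema, v2Payload);
const openErrors = validateObject(openConsumerSchema, v2Payload);
// strictErrors: ['unexpected property "priority"'], openErrors: []
```

The producer's change is purely additive, yet the closed schema turns it into a rejection, which is why closed schemas belong on inputs you own rather than on outputs you expect to evolve.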
Further reading and primary sources
- JSON Schema: A Vocabulary for Structural Validation — Official JSON Schema specification covering additionalProperties, unevaluatedProperties, defaults, and all validation keywords
- Confluent Schema Registry Documentation — Schema Registry API reference, compatibility modes (BACKWARD/FORWARD/FULL), and JSON Schema support
- fast-check: Property-Based Testing for TypeScript — fast-check library documentation — arbitraries, runners, and integration with Jest/Vitest for testing schema migrations
- PostgreSQL jsonb_set Documentation — Official PostgreSQL reference for jsonb_set, jsonb_insert, and all jsonb mutation functions used in data migrations
- AWS Glue Schema Registry Developer Guide — AWS Glue Schema Registry setup, compatibility modes, and integration with Kinesis and MSK