JSON in NoSQL Databases: MongoDB, DynamoDB, Firestore & CouchDB

Q: What is the difference between JSON and BSON in MongoDB?

JSON is a text-based format supporting 6 data types: string, number, boolean, null, array, and object. BSON (Binary JSON) is MongoDB's binary serialization format that extends JSON with 19 additional types: ObjectId (12-byte unique identifier), Date (64-bit UTC milliseconds), Decimal128 (high-precision decimal), Binary (raw bytes), Regular Expression, JavaScript code, Timestamp, Int32, Int64, and more. BSON is not human-readable but is faster to parse and encode than JSON, and supports richer types that map naturally to programming language primitives. The MongoDB Node.js driver automatically serializes BSON to JavaScript objects and back — ObjectId becomes a string, Date becomes a Date object, and Decimal128 becomes a string or Decimal128 instance. Documents are stored as BSON on disk but returned as JavaScript/JSON-compatible objects by the driver. The 16 MB BSON document size limit applies to the binary BSON representation, not the JSON string equivalent.

Q: How do I query nested JSON fields in MongoDB?

MongoDB uses dot notation to query nested JSON fields: db.users.find({ "address.city": "New York" }) queries the city field inside an address object. For array fields, $elemMatch queries documents where at least one array element matches all specified conditions: db.products.find({ tags: { $elemMatch: { name: "javascript", level: "advanced" } } }). Without $elemMatch, multiple conditions on the same array field may match across different elements. Use $in to match any of a list of values: { status: { $in: ["active", "pending"] } }. For full-text search across string fields, create a text index and use $text: { $search: "search terms" }. The $regex operator matches string fields against a regular expression. Use $expr to compare two fields within the same document: { $expr: { $gt: ["$spent", "$budget"] } }. Always run .explain("executionStats") to inspect the query plan JSON and confirm indexes are being used.

Q: How does DynamoDB store JSON data?

DynamoDB stores all data as AttributeValue objects — each attribute has a type descriptor key paired with its value. Type descriptors are: S (string), N (number, always stored as string), B (binary, base64-encoded), BOOL (boolean), NULL (null), L (list/array), and M (map/nested object). A plain JSON object like { "name": "Alice", "age": 30 } becomes { "name": { "S": "Alice" }, "age": { "N": "30" } } in the DynamoDB wire format. The AWS SDK's DocumentClient (v2) or the plain DynamoDB client with marshall/unmarshall utilities (v3) converts between plain JSON and AttributeValue format automatically. Numbers are stored as strings internally to avoid floating-point precision issues, so avoid using numbers as sort keys for lexicographic ordering bugs. Items have a 400 KB size limit including attribute names and values. DynamoDB Streams emit AttributeValue JSON for each changed item, enabling event-driven processing of JSON change events.

Q: When should I embed vs reference JSON documents in MongoDB?

Embed (store as a nested JSON object or array) when: the data is always accessed together with the parent document (e.g., address inside a user document), the relationship is one-to-few (a blog post with 0-10 comments), the embedded data is owned exclusively by the parent and not independently updated, and the combined document stays well under the 16 MB BSON limit. Reference (store the ID and look up separately) when: the relationship is one-to-many or many-to-many (a user with thousands of orders), the sub-document is large or grows unboundedly, the embedded data is independently accessed by other parts of the application, or the same data is referenced by multiple parent documents. The key trade-off: embedding eliminates joins (faster reads, atomic updates) but duplicates data and inflates document size; referencing normalizes data but requires multiple queries or $lookup aggregation pipeline stages. For most MongoDB use cases, err toward embedding — the 16 MB limit and query performance both favor fewer, larger documents over many small ones joined at query time.

Q: How do I index JSON fields in a NoSQL database?

In MongoDB, create indexes with db.collection.createIndex({ field: 1 }) for ascending or -1 for descending. For nested JSON fields, use dot notation: createIndex({ "address.city": 1 }). For array fields, a multikey index is automatically created — no special syntax needed. Compound indexes cover multiple fields: createIndex({ status: 1, createdAt: -1 }) supports queries filtering on status and sorting by createdAt. Partial indexes index only documents matching a filter expression: createIndex({ email: 1 }, { partialFilterExpression: { email: { $exists: true } } }) — useful for sparse optional fields. In DynamoDB, the primary key (partition key + optional sort key) is the only index created automatically; add Global Secondary Indexes (GSIs) to support additional query access patterns. Each GSI has its own partition key and optional sort key, and is billed separately for read/write capacity. In Firestore, single-field indexes are created automatically; composite indexes (multiple fields) must be created manually in the Firebase console or via the firebase.json config. Index too many fields in Firestore and you hit the 200 composite index limit per database.

Q: What is the Firestore document JSON structure?

A Firestore document is a set of key-value pairs where values have explicit types: string, number (64-bit float), boolean, null, timestamp (nanosecond precision UTC), geopoint (latitude/longitude), reference (pointer to another document), map (nested object), or array. Unlike MongoDB, Firestore arrays cannot contain maps at query time — you cannot filter by a specific field inside an array element. Documents are organized in collections (analogous to MongoDB collections), and collections can contain subcollections nested inside documents. The REST API represents a Firestore document as a JSON object with a fields key: { "fields": { "name": { "stringValue": "Alice" }, "age": { "integerValue": "30" } } }. The client SDKs abstract this format — you read and write plain JavaScript objects. A document reference path looks like projects/{project}/databases/{db}/documents/{collection}/{docId}. Firestore document IDs are strings (auto-generated as 20-character alphanumeric IDs or custom strings you provide).

Q: How does CouchDB handle JSON document versioning?

CouchDB uses MVCC (Multi-Version Concurrency Control) — every document has a _rev field that is a revision identifier in the format "generation-hash" (e.g., "3-abc123..."). Every update to a document creates a new revision; the _rev field must be included in the update request to prevent conflicts. If two clients read the same _rev and both try to update, the second write fails with a 409 Conflict error. CouchDB stores all historical revisions by default (compaction removes old revisions). This makes CouchDB ideal for offline-first applications: PouchDB (a browser-side JSON store) syncs with CouchDB using the same replication protocol — each side tracks which revisions it has seen and exchanges only the differences. Conflict resolution is application-level: CouchDB picks one revision as the winner deterministically (by revision ID lexicographic order), but stores losing revisions as conflicts accessible via the ?conflicts=true query parameter. Applications can implement custom merge logic by reading conflicting revisions and writing a merged document.

Q: How do I handle the 400 KB DynamoDB item size limit for large JSON?

The DynamoDB 400 KB item size limit includes all attribute names and values. Strategies to stay within the limit: (1) Use short attribute names — "n" instead of "name", "ts" instead of "timestamp" — attribute names count toward the limit and are repeated for every item. (2) Store large JSON blobs in Amazon S3 and save only the S3 key reference in the DynamoDB item — this is the most common pattern for documents like PDFs, images, or large nested JSON. (3) Split large items across multiple items using a composite sort key pattern (e.g., itemId#part#1, itemId#part#2) and reassemble in application code. (4) Compress JSON before storing as a Binary (B) attribute using gzip — JSON compresses 60-80%, allowing items up to ~1.2 MB of raw JSON to fit within the 400 KB binary limit. (5) Remove default/null fields before saving — DynamoDB stores null as NULL which still costs bytes; omitting the attribute entirely is cheaper. (6) Use DynamoDB's document compression via the AWS SDK's compression options (available in some SDK versions). For workloads where items regularly approach the limit, consider migrating large nested JSON attributes to Amazon S3 with a reference pattern.

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 20, 2026

Key Facts

MongoDB BSON supports 19 data types beyond JSON's 6: ObjectId, Date, Decimal128, Binary, RegExp, JavaScript code, and more — all transparently serialized/deserialized by the Mongoose/MongoDB driver
DynamoDB items have a 400 KB size limit per item; MongoDB documents have a 16 MB BSON limit; Firestore documents have a 1 MB limit per document
MongoDB $elemMatch queries nested arrays: { tags: { $elemMatch: { name: "javascript" } } } returns documents where at least one array element matches the condition
Firestore charges per document read — denormalizing JSON (embedding vs referencing) reduces reads by 80-90% for read-heavy workloads
CouchDB uses MVCC (Multi-Version Concurrency Control) with _rev field in JSON — every update creates a new revision, enabling offline-first sync via PouchDB

NoSQL document databases store JSON natively — MongoDB uses BSON (Binary JSON) extending JSON with ObjectId, Date, and Decimal128 types; DynamoDB stores JSON as AttributeValue maps; Firestore stores documents as key-value maps with typed scalars; and CouchDB stores plain JSON documents with _id and _rev fields. MongoDB's BSON ObjectId is a 12-byte identifier encoded as a 24-hex string — querying by _id with ObjectId("...") is the fastest single-document lookup at ~0.1 ms, while a full collection scan over 1 million documents without an index takes ~500 ms. This guide covers JSON document structure in MongoDB, DynamoDB, Firestore, and CouchDB; indexing strategies for nested JSON fields; querying patterns for arrays and embedded documents; schema design patterns (embedding vs referencing); and Node.js client integration.

MongoDB BSON vs JSON: Types and Document Structure

BSON extends JSON with 13 additional types that map directly to programming language primitives — making it a superset of JSON optimized for database storage. A MongoDB document always has an _id field (auto-generated as an ObjectId if omitted), can embed nested documents up to 100 levels deep, and has a 16 MB binary BSON size limit per document. The Node.js driver transparently converts BSON to JavaScript: ObjectId becomes a string (via .toString()), Date becomes a JavaScript Date object, and Decimal128 is returned as a Decimal128 instance you call .toString() on.

// MongoDB document — BSON types shown with JS driver representations
// Stored as BSON on disk; returned as JavaScript objects by the driver

// Insert a document with BSON-specific types
import { MongoClient, ObjectId, Decimal128 } from 'mongodb'

const client = new MongoClient(process.env.MONGODB_URI!)
const db = client.db('shop')
const products = db.collection('products')

await products.insertOne({
  // _id auto-generated as ObjectId if omitted
  _id: new ObjectId(),                    // BSON ObjectId — 24-hex string "507f1f77bcf86cd799439011"
  name: 'TypeScript Handbook',            // string
  price: new Decimal128('29.99'),         // BSON Decimal128 — high-precision decimal
  stock: 142,                             // int32
  available: true,                        // boolean
  tags: ['typescript', 'programming'],    // BSON array
  publishedAt: new Date('2026-01-15'),    // BSON Date — 64-bit UTC milliseconds
  metadata: {                             // embedded BSON document (nested object)
    isbn: '978-0-000-00000-0',
    pages: 512,
    publisher: { name: 'TechPress', country: 'US' },
  },
  coverImage: null,                       // null
})

// Document as returned by find() — BSON types converted to JS
// {
//   _id: ObjectId('507f1f77bcf86cd799439011'),   // call .toString() for string
//   name: 'TypeScript Handbook',
//   price: Decimal128('29.99'),                   // call .toString() for string
//   stock: 142,
//   available: true,
//   tags: ['typescript', 'programming'],
//   publishedAt: 2026-01-15T00:00:00.000Z,        // JS Date object
//   metadata: {
//     isbn: '978-0-000-00000-0',
//     pages: 512,
//     publisher: { name: 'TechPress', country: 'US' },
//   },
//   coverImage: null,
// }

// Querying by ObjectId _id — fastest lookup at ~0.1ms
const product = await products.findOne({ _id: new ObjectId('507f1f77bcf86cd799439011') })

// Common mistake: querying _id with a plain string (no ObjectId wrapper)
// This will NOT match and returns null — always wrap with new ObjectId()
const wrong = await products.findOne({ _id: '507f1f77bcf86cd799439011' }) // returns null

// BSON size limit: documents must be <= 16 MB
// Use db.collection.stats() to check average document sizes
const stats = await db.command({ collStats: 'products' })
console.log('avg doc size:', stats.avgObjSize, 'bytes')

The 16 MB BSON document limit applies to the binary BSON representation, not the equivalent JSON string. Use BSON.calculateObjectSize(doc) from the bsonnpm package to measure a document's binary size before inserting. Embedding large arrays (e.g., thousands of comment objects inside a post document) is the most common cause of approaching the 16 MB limit — switch to referencing when a document's embedded array grows beyond ~1,000 elements.

MongoDB JSON Querying: Dot Notation and Array Operators

MongoDB's query language operates directly on JSON document structure using dot notation for nested fields, array operators for embedded arrays, and comparison operators that mirror JSON value types. Every query is expressed as a JSON filter object — the same format used for both read (find, findOne) and write (updateOne, deleteMany) operations. Running .explain("executionStats") on any query returns a JSON plan showing whether an index was used, the number of documents scanned, and execution time in milliseconds.

import { MongoClient } from 'mongodb'

const db = (await new MongoClient(process.env.MONGODB_URI!).connect()).db('shop')
const users = db.collection('users')

// ── Dot notation for nested JSON fields ───────────────────────
// Query users in New York — dot notation traverses the JSON structure
await users.find({ 'address.city': 'New York' }).toArray()

// Multi-level nesting — up to any depth
await users.find({ 'profile.preferences.theme': 'dark' }).toArray()

// ── Array operators ────────────────────────────────────────────
// $elemMatch — at least one array element matches ALL conditions
// Without $elemMatch, conditions may match across different elements
await users.find({
  skills: { $elemMatch: { name: 'javascript', level: { $gte: 3 } } }
}).toArray()

// $in — field value is one of a list of values (works for arrays too)
await users.find({ status: { $in: ['active', 'trial'] } }).toArray()

// $all — array field contains ALL specified values
await users.find({ roles: { $all: ['editor', 'admin'] } }).toArray()

// $size — array field has exactly N elements
await users.find({ tags: { $size: 3 } }).toArray()

// ── Comparison and logical operators ──────────────────────────
await users.find({
  $and: [
    { age: { $gte: 18, $lte: 65 } },
    { 'address.country': 'US' },
    { score: { $gt: 50 } },
  ]
}).toArray()

// $regex — string field matches a regular expression
await users.find({ email: { $regex: /^alice/i } }).toArray()

// $exists — field is present (true) or absent (false) in the JSON document
await users.find({ phoneNumber: { $exists: true } }).toArray()

// $type — field is a specific BSON type (2 = string, 16 = int32, 10 = null)
await users.find({ createdAt: { $type: 'date' } }).toArray()

// $expr — compare two fields within the same document
// Find users where spent > budget (cross-field comparison)
await users.find({ $expr: { $gt: ['$spent', '$budget'] } }).toArray()

// ── explain() — query plan as JSON ────────────────────────────
const plan = await users.find({ 'address.city': 'New York' })
  .explain('executionStats')

console.log('index used:', plan.queryPlanner.winningPlan.inputStage.indexName)
console.log('docs examined:', plan.executionStats.totalDocsExamined)
console.log('docs returned:', plan.executionStats.nReturned)
console.log('time ms:', plan.executionStats.executionTimeMillis)
// If totalDocsExamined >> nReturned → add an index on 'address.city'

The $elemMatch vs bare array query distinction is one of the most common MongoDB query bugs: { tags: { name: "js", level: 3 } } matches a document where any array element has name: "js" AND any (possibly different) element has level: 3, while { tags: { $elemMatch: { name: "js", level: 3 } } } requires both conditions on the same element. Always use $elemMatch for multi-condition array queries.

MongoDB Indexing JSON Fields

MongoDB indexes are created on specific JSON field paths and are the primary performance lever for document queries. Without an index, MongoDB performs a collection scan — reading every document — which takes O(n) time. With a B-tree index on a field, single-document lookups take O(log n) and range queries take O(log n + k) where k is the number of results. Index creation is an O(n) background operation that does not block reads but may slow writes during build.

import { MongoClient } from 'mongodb'

const db = (await new MongoClient(process.env.MONGODB_URI!).connect()).db('shop')
const users = db.collection('users')
const products = db.collection('products')

// ── Single field index ─────────────────────────────────────────
// 1 = ascending, -1 = descending
await users.createIndex({ email: 1 }, { unique: true })
await users.createIndex({ createdAt: -1 })  // descending for "sort by recent"

// ── Compound index ─────────────────────────────────────────────
// Supports queries on (status), (status + createdAt)
// Does NOT support queries on (createdAt) alone without a separate index
await users.createIndex({ status: 1, createdAt: -1 })

// ── Nested JSON field index (dot notation) ─────────────────────
await users.createIndex({ 'address.city': 1 })
await users.createIndex({ 'profile.preferences.language': 1 })

// ── Multikey index — automatically created for array fields ────
// No special syntax: MongoDB detects the field is an array
await products.createIndex({ tags: 1 })  // multikey index
// Now db.products.find({ tags: 'javascript' }) uses the index

// Multikey compound index — only ONE field can be an array
await products.createIndex({ category: 1, tags: 1 })  // tags is the array field

// ── Text index — full-text search across string fields ─────────
// Only one text index per collection allowed
await products.createIndex(
  { name: 'text', description: 'text' },
  { weights: { name: 10, description: 1 } }  // name matches rank 10x higher
)
// Query with text index:
await products.find({ $text: { $search: 'typescript handbook' } }).toArray()

// ── Partial index — index only documents matching a filter ─────
// Use for sparse optional fields: saves space, avoids null index entries
await users.createIndex(
  { phoneNumber: 1 },
  { partialFilterExpression: { phoneNumber: { $exists: true, $type: 'string' } } }
)

// ── Sparse index — older alternative to partial index ──────────
// Indexes only documents where the field exists (any value, including null)
await users.createIndex({ deletedAt: 1 }, { sparse: true })

// ── TTL index — automatically delete documents after N seconds ─
// Used for session documents, log entries, cache documents
await db.collection('sessions').createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }  // delete when expiresAt <= current time
)

// ── List all indexes on a collection ──────────────────────────
const indexes = await users.listIndexes().toArray()
console.log(JSON.stringify(indexes, null, 2))

The ESR rule (Equality, Sort, Range) guides compound index field ordering: put equality filter fields first, sort fields second, and range filter fields last. For example, a query filtering on status = "active" (equality), sorting by createdAt (sort), and filtering on score > 50 (range) should use a compound index { status: 1, createdAt: -1, score: 1 } in that order. MongoDB supports a maximum of 64 indexes per collection — index only fields used in frequent queries.

DynamoDB JSON: AttributeValue Format and Single-Table Design

DynamoDB stores all data using AttributeValue objects — a typed envelope where each attribute is a JSON object with a single key representing its type (S for string, N for number, BOOL for boolean, L for list, M for map). Single-table design is the recommended DynamoDB pattern: all entity types live in one table, distinguished by partition key prefixes and sort key patterns, with Global Secondary Indexes (GSIs) enabling additional access patterns without table scans.

import { DynamoDBClient, PutItemCommand, GetItemCommand } from '@aws-sdk/client-dynamodb'
import { DynamoDBDocumentClient, PutCommand, GetCommand, QueryCommand } from '@aws-sdk/lib-dynamodb'

const client = new DynamoDBClient({ region: 'us-east-1' })

// ── Low-level API — AttributeValue format ─────────────────────
// Raw DynamoDB wire format: each attribute wrapped in a type descriptor object
const rawItem = {
  PK: { S: 'USER#alice@example.com' },   // string
  SK: { S: 'PROFILE' },                  // string sort key
  name: { S: 'Alice' },
  age: { N: '30' },                      // numbers stored as strings
  active: { BOOL: true },
  tags: { L: [{ S: 'admin' }, { S: 'editor' }] },   // list
  address: {                             // map (nested JSON object)
    M: {
      city: { S: 'New York' },
      zip: { S: '10001' },
    }
  },
  deletedAt: { NULL: true },             // null value
}

await client.send(new PutItemCommand({ TableName: 'AppTable', Item: rawItem }))
await client.send(new GetItemCommand({
  TableName: 'AppTable',
  Key: { PK: { S: 'USER#alice@example.com' }, SK: { S: 'PROFILE' } },
}))

// ── DocumentClient — plain JSON CRUD (recommended) ────────────
// marshall/unmarshall handled automatically — work with plain JS objects
const docClient = DynamoDBDocumentClient.from(client)

await docClient.send(new PutCommand({
  TableName: 'AppTable',
  Item: {
    PK: 'USER#alice@example.com',
    SK: 'PROFILE',
    name: 'Alice',
    age: 30,
    active: true,
    tags: ['admin', 'editor'],
    address: { city: 'New York', zip: '10001' },
  }
}))

// ── Single-table design — multiple entity types in one table ───
// Entity type is encoded in PK/SK prefixes
// GSI1PK / GSI1SK enable querying by different access patterns

// User entity:   PK = "USER#<email>",    SK = "PROFILE"
// Order entity:  PK = "USER#<email>",    SK = "ORDER#<orderId>"
// Product:       PK = "PRODUCT#<id>",    SK = "PRODUCT#<id>"

// Query all orders for a user (sort key begins_with "ORDER#")
const orders = await docClient.send(new QueryCommand({
  TableName: 'AppTable',
  KeyConditionExpression: 'PK = :pk AND begins_with(SK, :skPrefix)',
  ExpressionAttributeValues: {
    ':pk': 'USER#alice@example.com',
    ':skPrefix': 'ORDER#',
  },
}))

// ── GSI — Global Secondary Index for alternative access patterns ─
// GSI1: GSI1PK = "STATUS#active", GSI1SK = <createdAt ISO string>
// Enables: query all active users sorted by creation date
const activeUsers = await docClient.send(new QueryCommand({
  TableName: 'AppTable',
  IndexName: 'GSI1',
  KeyConditionExpression: 'GSI1PK = :status',
  ExpressionAttributeValues: { ':status': 'STATUS#active' },
}))

// ── DynamoDB Streams — JSON change events ──────────────────────
// Each stream record contains: eventName (INSERT/MODIFY/REMOVE),
// dynamodb.NewImage (AttributeValue of new item),
// dynamodb.OldImage (AttributeValue of previous item)
// Use AWS Lambda trigger to process stream records as JSON events

Single-table design requires identifying all access patterns before creating the table — DynamoDB does not support ad-hoc queries across the full dataset without a GSI or table scan. Map each access pattern to a PK/SK combination and create GSIs for patterns that cannot be served by the primary key. The trade-off vs multi-table design: single-table is more efficient (one network request for related data) but harder to reason about and debug.

Firestore JSON Document Model

Firestore organizes data as a tree of collections and documents — collections contain documents, documents contain fields (typed key-value pairs), and documents can contain subcollections. Unlike MongoDB, Firestore does not support arbitrary nested arrays of objects in queries — array fields can only be queried with array-contains (single value) or array-contains-any (up to 10 values), not field-level conditions on array elements. Firestore charges per document read and write, making schema design (embedding vs subcollections) a direct cost optimization decision.

import { initializeApp } from 'firebase/app'
import {
  getFirestore, doc, collection, getDoc, setDoc,
  addDoc, query, where, orderBy, limit,
  collectionGroup, Timestamp, GeoPoint,
} from 'firebase/firestore'

const app = initializeApp({ projectId: 'my-project' /* ... */ })
const db = getFirestore(app)

// ── Firestore document field types ────────────────────────────
// string, number (64-bit float), boolean, null,
// timestamp, geopoint, reference, map (object), array

await setDoc(doc(db, 'users', 'alice'), {
  name: 'Alice',                                    // string
  age: 30,                                          // number
  active: true,                                     // boolean
  deletedAt: null,                                  // null
  createdAt: Timestamp.fromDate(new Date()),         // timestamp
  location: new GeoPoint(40.7128, -74.0060),        // geopoint
  profileRef: doc(db, 'profiles', 'alice'),          // reference
  address: { city: 'New York', zip: '10001' },      // map (embedded object)
  roles: ['editor', 'admin'],                       // array
})

// ── Subcollection vs embedded map ─────────────────────────────
// Embedded map: address lives inside the user document
//   → Fetched in the same document read (free with the user read)
//   → 1 MB document limit applies to the whole document including maps
//   → Cannot independently paginate or query embedded map entries

// Subcollection: orders live in users/{userId}/orders/{orderId}
//   → Each order is a separate document read (billed per read)
//   → Supports independent pagination and querying
//   → No size limit (each document is separately bounded)

// Add a document to a subcollection
await addDoc(collection(db, 'users', 'alice', 'orders'), {
  total: 49.99,
  status: 'shipped',
  createdAt: Timestamp.now(),
  items: [{ sku: 'BOOK-001', qty: 1, price: 49.99 }],
})

// ── Querying Firestore JSON fields ────────────────────────────
// Simple equality + range + sort — requires composite index if >1 field
const q = query(
  collection(db, 'users'),
  where('active', '==', true),
  where('age', '>=', 18),
  orderBy('age'),
  limit(20)
)

// array-contains — field array contains this exact value
const editorsQ = query(
  collection(db, 'users'),
  where('roles', 'array-contains', 'editor')
)

// array-contains-any — field array contains any of these values
const staffQ = query(
  collection(db, 'users'),
  where('roles', 'array-contains-any', ['editor', 'admin'])
)

// ── Collection group query — query across all subcollections ───
// Query all "orders" subcollections across all users
const allOrdersQ = query(
  collectionGroup(db, 'orders'),
  where('status', '==', 'pending'),
  orderBy('createdAt', 'desc')
)

// ── Firestore Security Rules — JSON field access control ───────
// rules_version = '2';
// service cloud.firestore {
//   match /databases/{database}/documents {
//     match /users/{userId} {
//       allow read: if request.auth.uid == userId;
//       allow write: if request.auth.uid == userId
//         && request.resource.data.age is int
//         && request.resource.data.age >= 0;
//     }
//   }
// }

Firestore composite indexes must be created explicitly for queries combining multiple fields with ordering or inequality filters. The Firebase CLI command firebase deploy --only firestore:indexes deploys indexes from firestore.indexes.json. Firestore automatically creates single-field indexes for every field (ascending and descending) — disable auto-indexing for large string fields (like full document text) by adding an exemption in the Firebase console to reduce index storage costs.

CouchDB and PouchDB: Offline-First JSON

CouchDB is a document database built around HTTP: every operation (create, read, update, delete, replicate) is a plain HTTP request with JSON bodies, making it accessible from any HTTP client without a dedicated driver. Its MVCC revision system and native replication protocol make it uniquely suited for offline-first applications where clients sync intermittently. PouchDB implements the same replication protocol in the browser, storing JSON in IndexedDB and syncing bidirectionally with CouchDB when connectivity is available.

// ── CouchDB HTTP API — plain JSON over HTTP ───────────────────
// Every CouchDB operation is an HTTP request; no special driver required

// Create a database
// PUT http://localhost:5984/mydb

// Create a document — _id is optional (auto-generated UUID if omitted)
// POST http://localhost:5984/mydb
// Content-Type: application/json
// { "name": "Alice", "age": 30, "roles": ["editor"] }
//
// Response: { "ok": true, "id": "abc123", "rev": "1-xyz789" }

// Retrieve a document — response includes _id and _rev
// GET http://localhost:5984/mydb/abc123
// Response:
// { "_id": "abc123", "_rev": "1-xyz789", "name": "Alice", "age": 30, "roles": ["editor"] }

// Update — MUST include _rev to prevent conflict (409 if wrong _rev)
// PUT http://localhost:5984/mydb/abc123
// { "_id": "abc123", "_rev": "1-xyz789", "name": "Alice Updated", "age": 31, "roles": ["editor"] }
// Response: { "ok": true, "id": "abc123", "rev": "2-abc456" }  ← new revision

// ── MapReduce views for JSON aggregation ─────────────────────
// Design document with a view that counts documents by status
// PUT http://localhost:5984/mydb/_design/analytics
const designDoc = {
  _id: '_design/analytics',
  views: {
    byStatus: {
      map: 'function(doc) { if (doc.status) { emit(doc.status, 1); } }',
      reduce: '_count',  // built-in: _count, _sum, _stats
    },
  },
}

// Query the view: GET /mydb/_design/analytics/_view/byStatus?group=true
// Response:
// { "rows": [
//   { "key": "active",   "value": 142 },
//   { "key": "inactive", "value": 37 },
//   { "key": "pending",  "value": 8  }
// ]}

// ── PouchDB — browser-side JSON storage with CouchDB sync ─────
import PouchDB from 'pouchdb'

// Local database stored in IndexedDB (browser) or LevelDB (Node.js)
const localDB = new PouchDB('myapp')

// Put (upsert) a document — must include _id
await localDB.put({ _id: 'user:alice', name: 'Alice', age: 30, synced: false })

// Get a document
const doc = await localDB.get('user:alice')
// doc._rev is automatically set and must be included in subsequent puts

// Update a document — include _rev from the previous get
await localDB.put({ ...doc, age: 31 })

// ── PouchDB ↔ CouchDB bidirectional sync ─────────────────────
const remoteDB = new PouchDB('http://localhost:5984/myapp')

// One-time sync (replicate once in each direction)
await PouchDB.sync(localDB, remoteDB)

// Live sync — keeps local and remote in sync continuously
const syncHandler = PouchDB.sync(localDB, remoteDB, {
  live: true,
  retry: true,  // retry on network failure
}).on('change', (info) => {
  console.log('sync change:', JSON.stringify(info.change.docs.length), 'docs')
}).on('error', (err) => {
  console.error('sync error:', err)
})

// Stop sync when component unmounts
syncHandler.cancel()

// ── Conflict resolution — read and merge conflicting revisions ─
const conflicted = await localDB.get('user:alice', { conflicts: true })
if (conflicted._conflicts?.length) {
  // Fetch all conflicting revisions
  const revs = await Promise.all(
    conflicted._conflicts.map(rev => localDB.get('user:alice', { rev }))
  )
  // Merge logic: pick the most recent, or merge fields
  const merged = { ...conflicted, age: Math.max(conflicted.age, ...revs.map(r => r.age)) }
  // Delete losing revisions and save merged version
  await localDB.bulkDocs([
    merged,
    ...revs.map(r => ({ ...r, _deleted: true })),
  ])
}

CouchDB's _rev system means you can never lose data with a standard update — every revision is retained until compaction runs. The replication protocol is incremental: CouchDB tracks a sequence number (_changesfeed) and only sends documents changed since the last sync, making replication efficient even for large databases. PouchDB's offline-first model is particularly useful for mobile web apps where connectivity is intermittent — changes are stored locally in IndexedDB and synced to CouchDB when online.

Schema Design: Embedding vs Referencing JSON

The central schema design decision in document databases is whether to embed related data as a nested JSON object/array within the parent document, or to reference it by ID and retrieve it in a separate query. This decision directly impacts read performance, write complexity, and storage cost — getting it wrong is expensive to reverse because it requires migrating existing documents. The rule of thumb: embed when data is always accessed together; reference when data grows unboundedly or is accessed independently.

// ── Pattern 1: Embedding (one-to-few) ────────────────────────
// Use when: always accessed together, owned by parent, small and bounded
// Example: user with addresses (a user has 1-5 addresses)

// GOOD — embed addresses inside the user document
const userWithEmbeddedAddresses = {
  _id: 'user_123',
  name: 'Alice',
  email: 'alice@example.com',
  addresses: [                          // embedded array (1-5 items max)
    { type: 'home', city: 'New York', zip: '10001', default: true },
    { type: 'work', city: 'Brooklyn', zip: '11201', default: false },
  ],
}
// Pro: single document read, atomic update of user + addresses
// Con: 16 MB limit, all addresses loaded even if only one is needed

// ── Pattern 2: Referencing (one-to-many) ─────────────────────
// Use when: grows unboundedly, independently accessed, shared across parents
// Example: user with orders (a user may have thousands of orders)

// GOOD — store orders in a separate collection with userId reference
const order = {
  _id: 'order_456',
  userId: 'user_123',          // reference to parent user
  total: 49.99,
  status: 'shipped',
  items: [{ sku: 'BOOK-001', qty: 1, price: 49.99 }],
}
// Index orders by userId for efficient per-user order queries
// db.orders.createIndex({ userId: 1, createdAt: -1 })

// ── Pattern 3: Hybrid (one-to-squillions) ─────────────────────
// Store summary in parent, details in child collection
// Example: post with comment count (not all comments)
const postDocument = {
  _id: 'post_789',
  title: 'JSON Schema Patterns',
  body: '...',
  commentCount: 1247,           // denormalized count — fast to display
  recentComments: [             // last 3 comments embedded for preview
    { author: 'Bob', text: 'Great post!', postedAt: '2026-05-20' },
    { author: 'Carol', text: 'Very helpful.', postedAt: '2026-05-19' },
    { author: 'Dan', text: 'Bookmarked.', postedAt: '2026-05-18' },
  ],
  // Full comments list is in a separate comments collection
}

// ── Pattern 4: Firestore denormalization for read optimization ─
// Firestore charges per document read — denormalize to minimize reads
// Instead of reading user + profile + preferences (3 reads),
// duplicate key fields into a single "user summary" document

// user_summaries/{userId} — denormalized for the feed (1 read = all display data)
const userSummary = {
  uid: 'user_123',
  displayName: 'Alice',
  avatarUrl: 'https://cdn.example.com/alice.jpg',
  followerCount: 3421,
  verified: true,
}

// When the user updates their displayName, update both:
// 1. users/{userId}.displayName (source of truth)
// 2. user_summaries/{userId}.displayName (denormalized copy)
// Use a Firestore batch write to update both atomically

// ── N+1 query problem in document databases ───────────────────
// BAD: load 50 posts, then fetch each author separately (51 queries)
const posts = await db.collection('posts').find({}).limit(50).toArray()
const authors = await Promise.all(
  posts.map(post => db.collection('users').findOne({ _id: post.authorId }))
) // 51 total queries

// GOOD: embed author summary in post document (1 query)
const postsWithEmbeddedAuthor = {
  _id: 'post_789',
  title: 'JSON Schema Patterns',
  author: {                            // denormalized snapshot at write time
    _id: 'user_123',
    name: 'Alice',
    avatarUrl: 'https://cdn.example.com/alice.jpg',
  },
}

// GOOD alternative: $lookup (MongoDB aggregation join — 1 aggregation pipeline)
const result = await db.collection('posts').aggregate([
  { $limit: 50 },
  { $lookup: {
    from: 'users',
    localField: 'authorId',
    foreignField: '_id',
    as: 'author',
    pipeline: [{ $project: { name: 1, avatarUrl: 1 } }],
  }},
  { $unwind: '$author' },
]).toArray()

The N+1 query problem is as common in document databases as in relational databases. Embedding author summaries inside post documents (denormalization) trades write complexity (must update all posts when an author changes their name) for read simplicity (one query for the feed). For Firestore specifically, always calculate the read cost of your schema: a social feed showing 20 posts, each with the author's avatar and name, costs 1 read with embedding vs 21 reads with referencing — an 80-90% cost reduction that compounds with traffic.

Key Terms

BSON: Binary JSON — MongoDB's binary serialization format that extends JSON with 19 data types including ObjectId, Date, Decimal128, Binary, Int32, Int64, Timestamp, and JavaScript code. BSON is not human-readable but is faster to parse than JSON text and supports richer types that map directly to programming language primitives. The MongoDB Node.js driver automatically serializes BSON to JavaScript objects on read and serializes JavaScript objects to BSON on write. BSON documents have a 16 MB size limit. The bson npm package provides BSON.serialize(), BSON.deserialize(), and BSON.calculateObjectSize() utilities for working with BSON outside of MongoDB.
AttributeValue: DynamoDB's typed envelope format for storing attribute values. Every DynamoDB attribute is stored as a JSON object with a single key indicating its type: S (string), N (number — stored as a string to preserve precision), B (binary — base64 encoded), BOOL (boolean), NULL (null), L (list/array of AttributeValues), or M (map/object of string keys to AttributeValues). The AWS SDK v3's marshall and unmarshall functions from @aws-sdk/util-dynamodb convert between plain JavaScript objects and AttributeValue format. The DynamoDBDocumentClient applies these conversions automatically so application code works with plain JSON.
single-table design: A DynamoDB schema pattern where all entity types (users, orders, products, sessions) are stored in a single DynamoDB table, distinguished by partition key and sort key prefix conventions (e.g., USER#alice, ORDER#xyz). Single-table design enables fetching related entities in a single query — for example, a user profile plus their last 10 orders in one Query call using begins_with(SK, 'ORDER#'). Global Secondary Indexes (GSIs) are added to support access patterns that cannot be served by the primary key. Single-table design is essential for cost and performance at scale because DynamoDB charges per request and has no server-side join capability.
subcollection: A Firestore collection nested inside a document, enabling hierarchical data organization. A subcollection is accessed by its full path: users/{userId}/orders/{orderId}. Subcollection documents are billed separately per read — unlike embedded maps, which are included in the parent document read at no extra cost. Subcollections support independent querying and pagination, and have no size limit on the number of documents (unlike embedded arrays, which contribute to the 1 MB document limit). Collection group queries (collectionGroup('orders')) query across all subcollections with the same name across all parent documents simultaneously.
MVCC revision: Multi-Version Concurrency Control — CouchDB's mechanism for managing concurrent document updates without locks. Every CouchDB document has a _rev field in the format "generation-hash" (e.g., "3-abc123..."). Every successful write increments the generation number and generates a new hash. Updates must include the current _rev— if two clients read the same revision and both attempt to write, the second write returns a 409 Conflict error. CouchDB retains all historical revisions until compaction. The revision system enables PouchDB's offline-first sync: each side tracks which revisions it has seen and exchanges only the diff, making replication bandwidth-efficient and safe across intermittent connectivity.
denormalization: The practice of intentionally duplicating data across documents to reduce the number of reads required to serve a query. In relational databases, normalization eliminates redundancy; in document databases (especially Firestore, where reads are billed per document), denormalization is a performance and cost optimization. For example, embedding an author's name and avatar URL inside each post document means a feed query reads only N post documents instead of N posts plus N author documents. The trade-off: writes become more complex (updating a user's display name requires updating all posts authored by that user) and data may temporarily be inconsistent between copies. Use Firestore batch writes or MongoDB transactions to keep denormalized copies consistent.

FAQ

What is the difference between JSON and BSON in MongoDB?

JSON is a text-based format with 6 data types (string, number, boolean, null, array, object). BSON is MongoDB's binary serialization format extending JSON with 19 types including ObjectId, Date, Decimal128, Binary, Int32, Int64, and JavaScript code. BSON is faster to parse than JSON text because it is binary-encoded with length prefixes, avoiding the need to scan for delimiters. The MongoDB Node.js driver converts BSON to JavaScript transparently — ObjectId becomes a string, Date becomes a Date object. The 16 MB document limit applies to the binary BSON size. BSON is not human-readable; use JSON.stringify(doc) to inspect document contents during debugging, but never use JSON.parseto create MongoDB queries — always use the driver's query API.

How do I query nested JSON fields in MongoDB?

Use dot notation to traverse nested JSON: { "address.city": "New York" } queries the city field inside an address object. For arrays of objects, use $elemMatch to require multiple conditions on the same array element: { skills: { $elemMatch: { name: "js", level: { $gte: 3 } } } }. Without $elemMatch, { "skills.name": "js", "skills.level": { "$gte": 3 } } may match documents where different array elements satisfy each condition separately. Use $in for membership checks, $regex for string pattern matching, $exists to check field presence, and $expr for cross-field comparisons. Always run .explain("executionStats") to verify indexes are being used — a high totalDocsExamined relative to nReturned indicates a missing index.

How does DynamoDB store JSON data?

DynamoDB stores data as AttributeValue objects — each attribute is a JSON object with a type descriptor key: S for string, N for number (stored as a string), BOOL for boolean, NULL for null, L for list, M for map. A plain JSON object { "name": "Alice", "age": 30 } becomes { "name": { "S": "Alice" }, "age": { "N": "30" } } in the DynamoDB wire format. Use the AWS SDK v3's DynamoDBDocumentClient to work with plain JSON objects — it applies marshall and unmarshall automatically. Numbers are stored as strings to avoid floating-point precision loss; be careful using numbers as sort keys because lexicographic ordering differs from numeric ordering for numbers of different lengths (e.g., "9" sorts after "10" lexicographically). The 400 KB item size limit includes attribute names and values combined.

When should I embed vs reference JSON documents in MongoDB?

Embed when: data is always accessed together (addresses inside a user), the relationship is one-to-few (a post with up to 10 tags), the embedded data is owned exclusively by the parent, and the combined document size stays well under 16 MB. Reference when: the relationship is one-to-many or many-to-many (a user with thousands of orders), the sub-document grows unboundedly (comments on a viral post), the data is independently accessed or shared by multiple parents, or embedding would push the document near the 16 MB BSON limit. The embedding vs referencing decision is hard to reverse — migrating millions of documents is expensive. Model your data around your access patterns: if the application always reads X and Y together, embed Y inside X; if it sometimes reads Y without X, reference it.

How do I index JSON fields in a NoSQL database?

In MongoDB, use db.collection.createIndex({ field: 1 }) for a single-field index (ascending). For nested JSON fields, use dot notation: { "address.city": 1 }. Array fields automatically get multikey indexes — no special syntax. Compound indexes cover queries on multiple fields; follow the ESR rule (Equality fields first, Sort fields second, Range fields last). In DynamoDB, only the primary key (partition key + sort key) is indexed by default; add GSIs for additional query access patterns. Each GSI is billed separately for read/write capacity and adds write overhead. In Firestore, single-field indexes are created automatically; composite indexes (multiple fields, or inequality on one field plus sort on another) must be created manually in the Firebase console. Index creation fails silently if you run a Firestore query that requires a composite index — check the Firebase console for index suggestions.

What is the Firestore document JSON structure?

A Firestore document is a set of typed key-value pairs. Supported field types: string, number (64-bit float), boolean, null, timestamp (nanosecond UTC precision), geopoint (lat/lng), reference (document path), map (nested object), and array. Documents are identified by a path like users/alice (collection: users, document ID: alice). The REST API wraps each value in a type descriptor ({ "stringValue": "Alice" }), but client SDKs accept and return plain JavaScript objects. Firestore has a 1 MB per-document limit. Arrays cannot contain maps for array-containsfield-level queries. Subcollections — collections nested inside documents — allow storing related data with independent billing and query support. A document's subcollection does not count toward the 1 MB document limit.

How does CouchDB handle JSON document versioning?

Every CouchDB document has a _rev field in "generation-hash" format (e.g., "3-abc123..."). Every successful write increments the generation and generates a new hash from the document content. To update a document, include the current _rev in the request body — if the _rev does not match the stored revision (because another client updated it first), CouchDB returns a 409 Conflict error. CouchDB retains all historical revisions until compaction, enabling audit trails. Conflicts (from offline edits that both succeed on different nodes) are resolved by reading all conflicting revisions via ?conflicts=true, applying application-level merge logic, and saving the merged document while deleting the losers with _deleted: true. PouchDB replicates using the same _changes feed protocol, making browser-to-CouchDB sync incremental and safe.

How do I handle the 400 KB DynamoDB item size limit for large JSON?

The 400 KB limit includes all attribute names and values combined. Strategies: (1) Use short attribute names — "n" instead of "name", "ts" instead of "timestamp" — attribute names repeat for every item and add up quickly. (2) Store large JSON blobs in S3 and save only the S3 key in DynamoDB — the most common pattern for documents, images, and large nested JSON. (3) Compress JSON with gzip before storing as a Binary (B) attribute — JSON compresses 60-80%, allowing ~1-1.5 MB of raw JSON to fit within the 400 KB binary limit. (4) Split large items using a composite sort key pattern (itemId#chunk#0, itemId#chunk#1) and reassemble in application code. (5) Remove null or default-value fields before saving — omitting an attribute entirely costs zero bytes versus storing NULL. For workloads where items regularly approach the limit, S3 offloading with a reference field is the standard industry solution.