Search JSON Data in JavaScript: Recursive, JSONPath, fuse.js & Elasticsearch

Searching JSON data in JavaScript has four distinct approaches: recursive property traversal (pure JS, any structure), JSONPath expressions (declarative path queries), fuzzy search with fuse.js (typo-tolerant full-text), and Elasticsearch JSON Query DSL (distributed full-text at scale). JSONPath $.store.book[?(@.price < 10)] filters an array by expression in a single statement — no loops required. Fuse.js scores results by edit distance with a threshold parameter (0.0 = exact match, 1.0 = anything matches).

This guide covers recursive deep search (the JSON.stringify() hack versus proper traversal), JSONPath with the jsonpath npm package, fuse.js fuzzy search configuration, and Elasticsearch multi-match queries. Each section notes the time complexity of its approach.

Recursive Property Search in JSON Objects

Recursive traversal is the most flexible JSON search strategy — it works on any structure, requires no dependencies, and finds values or keys at any depth. The algorithm visits every node: if the node is a primitive, compare it to the target; if it is an array, recurse into each element; if it is a plain object, recurse into each value. Time complexity is O(n) where n is the total number of values in the tree — you cannot do better because any value could be the match.

// ── Strategy 1: JSON.stringify() hack — fastest for simple strings ─
// 2–5x faster than recursive traversal for simple string containment
// WARNING: matches values AND key names; use only for quick existence checks
function containsString(obj, term) {
  return JSON.stringify(obj).includes(term);
}

const data = { user: { name: 'Alice', city: 'Paris' }, score: 42 };
containsString(data, 'Paris');  // true  (O(n) — but highly optimized C++ string search)
containsString(data, 'user');   // true  (false positive: matches the key name "user")

// ── Strategy 2: Proper recursive deep search ─────────────────────
// Returns true if any VALUE (not key) equals target
function deepContains(obj, target) {
  if (obj === target) return true;
  if (Array.isArray(obj)) return obj.some(item => deepContains(item, target));
  if (obj !== null && typeof obj === 'object') {
    return Object.values(obj).some(v => deepContains(v, target));
  }
  return false;
}

deepContains(data, 'Paris');   // true
deepContains(data, 'user');    // false — 'user' is a key, not a value

// ── Collect all matches (not just true/false) ─────────────────────
function deepFindAll(obj, predicate, path = '', results = []) {
  if (predicate(obj)) {
    results.push({ value: obj, path });
  }
  if (Array.isArray(obj)) {
    obj.forEach((item, i) => deepFindAll(item, predicate, `${path}[${i}]`, results));
  } else if (obj !== null && typeof obj === 'object') {
    for (const [key, value] of Object.entries(obj)) {
      deepFindAll(value, predicate, path ? `${path}.${key}` : key, results);
    }
  }
  return results;
}

const catalog = {
  books: [
    { title: 'Moby Dick', price: 8.99, inStock: true },
    { title: 'War and Peace', price: 12.99, inStock: false },
    { title: 'The Iliad', price: 6.99, inStock: true },
  ],
  electronics: [
    { name: 'Keyboard', price: 79.99, inStock: true },
  ],
};

// Find all items priced under $10
deepFindAll(catalog, v => typeof v === 'number' && v < 10);
// [
//   { value: 8.99,  path: 'books[0].price' },
//   { value: 6.99,  path: 'books[2].price' },
// ]

// ── Search by key name at any depth ──────────────────────────────
function findByKey(obj, targetKey, results = []) {
  // Check arrays first: typeof [] is also 'object', so the branch order matters
  if (Array.isArray(obj)) {
    obj.forEach(item => findByKey(item, targetKey, results));
  } else if (obj !== null && typeof obj === 'object') {
    for (const [key, value] of Object.entries(obj)) {
      if (key === targetKey) results.push(value);
      findByKey(value, targetKey, results);  // primitives and null fall through harmlessly
    }
  }
  return results;
}

findByKey(catalog, 'price');
// [8.99, 12.99, 6.99, 79.99]  — all price values at any depth

// ── With max-depth guard (prevent stack overflow on hostile input) ─
function deepFindSafe(obj, predicate, maxDepth = 50, depth = 0) {
  if (depth > maxDepth) return false;
  if (predicate(obj)) return true;
  if (Array.isArray(obj)) return obj.some(i => deepFindSafe(i, predicate, maxDepth, depth + 1));
  if (obj !== null && typeof obj === 'object') {
    return Object.values(obj).some(v => deepFindSafe(v, predicate, maxDepth, depth + 1));
  }
  return false;
}

The JSON.stringify() hack is legitimate for one specific case: quickly checking whether a string value exists anywhere in a JSON object when you do not care about false positives from key names and do not need the path. V8's string search is optimized native code and significantly faster than JavaScript-level recursion. For everything else — structured matches, type-specific matches, collecting results with paths — use proper recursive traversal. Always add a maxDepth guard when searching JSON from external sources to prevent stack overflow from pathologically deep input.
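
A depth guard caps how far the search goes; an alternative is to drop recursion altogether. The following is a minimal sketch, not part of the original examples: the function name deepSomeIterative is hypothetical, and it assumes you only need a true/false answer. An explicit array serves as the stack, so nesting depth is bounded by heap memory instead of the call stack, and a WeakSet guards against circular references in in-memory objects (parsed JSON cannot contain cycles, but objects built in code can).

```javascript
// Iterative deep search: an explicit stack replaces the call stack,
// so depth is limited by heap memory rather than by recursion limits.
function deepSomeIterative(root, predicate) {
  const stack = [root];
  const seen = new WeakSet(); // guards against circular references

  while (stack.length > 0) {
    const node = stack.pop();
    if (predicate(node)) return true;

    if (node !== null && typeof node === 'object') {
      if (seen.has(node)) continue; // this object or array was already expanded
      seen.add(node);
      // Object.values works for both arrays and plain objects
      for (const child of Object.values(node)) stack.push(child);
    }
  }
  return false;
}

// Build a hypothetical 100,000-level-deep object without recursion
let deep = { value: 'needle' };
for (let i = 0; i < 100000; i++) deep = { child: deep };

const foundDeep = deepSomeIterative(deep, v => v === 'needle');       // true

const cyclic = { name: 'loop' };
cyclic.self = cyclic;
const foundCyclic = deepSomeIterative(cyclic, v => v === 'missing');  // false, and it terminates
```

The visiting order differs from the recursive versions (the last child is expanded first), which does not matter for an existence check but would change the order of collected results.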

JSONPath Expressions for Declarative JSON Search

JSONPath expresses traversal and filtering as a single string expression — no loops, no recursion code, no intermediate variables. The root is $. Child access uses dot notation ($.store.name) or bracket notation ($['store']['name']). The recursive descent operator .. matches at any depth. Filter expressions [?(@.price < 10)] select array elements by predicate. Install the jsonpath npm package (npm install jsonpath) for Node.js, or jsonpath-plus for additional features like regex filters.

// npm install jsonpath
const jp = require('jsonpath');

const store = {
  books: [
    { title: 'Moby Dick',    author: 'Herman Melville', price: 8.99,  inStock: true  },
    { title: 'War and Peace',author: 'Leo Tolstoy',     price: 12.99, inStock: false },
    { title: 'The Iliad',   author: 'Homer',            price: 6.99,  inStock: true  },
    { title: 'Don Quixote', author: 'Cervantes',        price: 9.99,  inStock: true  },
  ],
  electronics: [
    { name: 'Keyboard', price: 79.99, inStock: true },
  ],
};

// ── Basic child access ────────────────────────────────────────────
jp.query(store, '$.books[0].title');
// ['Moby Dick']

jp.query(store, '$.books[*].title');
// ['Moby Dick', 'War and Peace', 'The Iliad', 'Don Quixote']

// ── Recursive descent (..) — finds at ANY depth ───────────────────
// O(n) — scans the full tree, same as manual recursion
jp.query(store, '$..price');
// [8.99, 12.99, 6.99, 9.99, 79.99]  — books + electronics

jp.query(store, '$..title');
// ['Moby Dick', 'War and Peace', 'The Iliad', 'Don Quixote']

// ── Filter expressions — declarative predicate search ─────────────
// Books cheaper than $10
jp.query(store, '$.books[?(@.price < 10)]');
// [{ title: 'Moby Dick', price: 8.99, ... }, { title: 'The Iliad', price: 6.99, ... },
//  { title: 'Don Quixote', price: 9.99, ... }]

// Books in stock AND under $10
jp.query(store, '$.books[?(@.inStock == true && @.price < 10)]');
// [{ title: 'Moby Dick', ... }, { title: 'The Iliad', ... }, { title: 'Don Quixote', ... }]

// All in-stock items across books and electronics (recursive + filter)
jp.query(store, '$..[?(@.inStock == true)]');
// Moby Dick, The Iliad, Don Quixote, Keyboard

// ── Array slices and union ────────────────────────────────────────
jp.query(store, '$.books[0:2].title');   // slice [0, 2) → first two
// ['Moby Dick', 'War and Peace']

jp.query(store, '$.books[-1:].title');   // last element, via a slice starting at -1
// ['Don Quixote']

jp.query(store, '$.books[0,2].title');   // union: indices 0 and 2
// ['Moby Dick', 'The Iliad']

// ── Get paths, not values ─────────────────────────────────────────
jp.paths(store, '$.books[?(@.price < 10)]');
// [['$', 'books', 0], ['$', 'books', 2], ['$', 'books', 3]]  (one path array per match)

// jp.value() — first match only (like find())
jp.value(store, '$.books[?(@.price < 10)]');
// { title: 'Moby Dick', price: 8.99, ... }

// ── jsonpath-plus: extra features ────────────────────────────────
// npm install jsonpath-plus
import { JSONPath } from 'jsonpath-plus';

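// Regex filter: jsonpath-plus evaluates the filter as a JavaScript-style expression,
// so a regular expression can be applied with String.prototype.match()
JSONPath({ path: '$.books[?(@.author.match(/^H/i))]', json: store });
// Books by Herman Melville and Homer (author starts with H)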

// Parent axis (^) — get the parent of the match
JSONPath({ path: '$.books[?(@.price < 10)]^', json: store });
// Returns the books array itself (parent of matching elements)

JSONPath is the right tool when the traversal logic would otherwise require 10+ lines of recursive JavaScript — a single $..[?(predicate)] expression replaces a recursive function with filter logic. The time complexity is the same O(n) as manual recursion because the library must visit every node. The advantage is expressiveness and readability. Use jp.value() instead of jp.query()[0] to get the first match — it short-circuits internally. For JSONPath groupBy and sorting of results, see our guides on JSON groupBy and JSON array sorting.
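
To make the trade-off concrete, here is roughly what a query such as $..[?(predicate)] replaces when written by hand. This is a sketch under assumptions: descendantFilter and sampleStore are hypothetical names, the predicate is an ordinary JavaScript function instead of a JSONPath filter string, and the result order is not guaranteed to match what a JSONPath library returns.

```javascript
// Rough plain-JS equivalent of the JSONPath query $..[?(predicate)]:
// visit every container in the tree and keep each child that passes the predicate.
function descendantFilter(root, predicate) {
  const matches = [];
  (function walk(node) {
    if (node === null || typeof node !== 'object') return; // primitives have no children
    for (const child of Object.values(node)) {
      // Filter predicates usually read properties, so only object children are tested
      if (child !== null && typeof child === 'object' && predicate(child)) {
        matches.push(child);
      }
      walk(child);
    }
  })(root);
  return matches;
}

const sampleStore = {
  books: [
    { title: 'Moby Dick', price: 8.99, inStock: true },
    { title: 'War and Peace', price: 12.99, inStock: false },
  ],
  electronics: [{ name: 'Keyboard', price: 79.99, inStock: true }],
};

const inStockItems = descendantFilter(sampleStore, item => item.inStock === true);
const inStockLabels = inStockItems.map(item => item.title ?? item.name);
// ['Moby Dick', 'Keyboard']
```

The hand-written version is about fifteen lines and fixes one traversal shape; the JSONPath string expresses the same intent in one line and can be changed without touching code.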

Fuzzy Search with fuse.js

fuse.js implements the Bitap algorithm for approximate string matching — it finds results even when the query contains typos, transpositions, or missing characters. The threshold is the single most important configuration parameter: 0.0 requires an exact match, 1.0 accepts any string, and 0.3 is the recommended starting point for typo tolerance. fuse.js runs entirely in the browser or Node.js with no network requests — add 24 KB to your bundle or run server-side to avoid client bundle impact.

// npm install fuse.js
import Fuse from 'fuse.js';

const books = [
  { id: 1, title: 'The Old Man and the Sea',       author: { name: 'Ernest Hemingway' }, year: 1952 },
  { id: 2, title: 'One Hundred Years of Solitude', author: { name: 'Gabriel García Márquez' }, year: 1967 },
  { id: 3, title: 'To Kill a Mockingbird',         author: { name: 'Harper Lee' }, year: 1960 },
  { id: 4, title: 'The Great Gatsby',              author: { name: 'F. Scott Fitzgerald' }, year: 1925 },
];

// ── Basic fuzzy search ────────────────────────────────────────────
const fuse = new Fuse(books, {
  keys: ['title', 'author.name'],  // dot notation for nested fields
  threshold: 0.3,                  // 0.0 = exact, 1.0 = anything; the library default is 0.6
  includeScore: true,              // attach score (0 = perfect, 1 = mismatch)
  includeMatches: true,            // highlight matched substrings
  minMatchCharLength: 2,           // ignore single-character matches
  ignoreLocation: true,            // search whole field, not just the beginning
});

// Tolerates typo: "mokkingbird" instead of "mockingbird"
fuse.search('mokkingbird');
// [{ item: { id: 3, title: 'To Kill a Mockingbird' }, score: 0.08, matches: [...] }]

// An exact substring match scores at (or extremely close to) 0
fuse.search('Gatsby');
// [{ item: { id: 4, title: 'The Great Gatsby', ... }, score: ≈0, matches: [...] }]

// ── Weighted fields ───────────────────────────────────────────────
const weightedFuse = new Fuse(books, {
  keys: [
    { name: 'title',       weight: 2 },  // title matches score 2x
    { name: 'author.name', weight: 1 },
  ],
  threshold: 0.3,
  includeScore: true,
});

// ── threshold comparison ──────────────────────────────────────────
// threshold: 0.0  exact match only; "hemmingway" does NOT match "Hemingway"
// threshold: 0.3  moderate typo tolerance; a reasonable starting point for short queries
// threshold: 0.6  library default, permissive; weak matches start to appear
// threshold: 1.0  everything matches everything

// ── Extended search syntax (useExtendedSearch: true) ──────────────
const extFuse = new Fuse(books, {
  keys: ['title', 'author.name'],
  useExtendedSearch: true,
  threshold: 0.3,
});

extFuse.search("'Hemingway");   // include match: items containing "Hemingway" (use = for exact match)
extFuse.search('^The');         // prefix match — caret
extFuse.search('!Hemingway');   // inverse — exclamation mark
extFuse.search('^The !Sea');    // AND: starts with "The" AND not "Sea"

// ── Pre-built index for large datasets ────────────────────────────
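// Build the index once (O(n)) and pass it to the constructor so Fuse does not rebuild it.
// Each search still scans the indexed records, so search cost stays linear in the item count.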
const index = Fuse.createIndex(['title', 'author.name'], books);
const fuseWithIndex = new Fuse(books, { keys: ['title', 'author.name'], threshold: 0.3 }, index);

// Serialize index for static sites — avoid rebuild on every page load:
// fs.writeFileSync('fuse-index.json', JSON.stringify(index.toJSON()));
// const raw = JSON.parse(fs.readFileSync('fuse-index.json'));
// const loadedIndex = Fuse.parseIndex(raw);

// ── Server-side fuse.js (Node.js) — avoid 24 KB client bundle ─────
// Run fuse.js in a Node.js API handler instead of shipping to the browser:
// POST /api/search { query }
// → build fuse instance at module init → return results as JSON

fuse.js search time is O(n × m) where n is the number of items and m is the average string length — for 50,000 items it is perceptible at 100–200 ms per keystroke. For large datasets, pre-build the index with Fuse.createIndex() at startup (or at build time) and serialize with index.toJSON() — this skips the O(n) index construction on every page load. The ignoreLocation: true option is critical for document and description search — by default fuse.js weights matches at the beginning of a string more heavily, which is wrong when the match could be anywhere in a long text field. See the JSON flatten guide if you need to pre-flatten nested JSON before indexing.

Filtering JSON Arrays with Array Methods

Native array methods (filter(), find(), findIndex(), some(), every()) are the right tool for structured JSON arrays where the schema is known and consistent. All are O(n) in the worst case. filter() returns all matches; find() short-circuits on the first match and is faster when you need only one result. For multi-criteria filtering, combine predicates with && and ||; for dynamic criteria from user input, build the predicate programmatically from an object.

const products = [
  { id: 1, name: 'Wireless Keyboard',   brand: 'Logitech', price: 79.99,  inStock: true,  category: 'peripherals' },
  { id: 2, name: 'USB-C Hub',           brand: 'Anker',    price: 39.99,  inStock: true,  category: 'accessories' },
  { id: 3, name: 'Mechanical Keyboard', brand: 'Keychron', price: 129.99, inStock: false, category: 'peripherals' },
  { id: 4, name: 'Wireless Mouse',      brand: 'Logitech', price: 59.99,  inStock: true,  category: 'peripherals' },
];

// ── filter() — returns ALL matching elements ──────────────────────
const inStockLogitech = products.filter(p => p.inStock && p.brand === 'Logitech');
// [{ id: 1, name: 'Wireless Keyboard' }, { id: 4, name: 'Wireless Mouse' }]

// ── find() — first match, short-circuits (prefer over filter()[0]) ─
const usbc = products.find(p => p.id === 2);
// { id: 2, name: 'USB-C Hub', ... }

// ── Multi-field case-insensitive text search ─────────────────────
function textSearch(items, query) {
  const q = query.toLowerCase();
  return items.filter(p =>
    ['name', 'brand', 'category'].some(field =>
      p[field]?.toLowerCase().includes(q)
    )
  );
}

textSearch(products, 'wireless');
// [{ id: 1, name: 'Wireless Keyboard' }, { id: 4, name: 'Wireless Mouse' }]

// ── Range filter ─────────────────────────────────────────────────
const affordable = products.filter(p => p.price >= 30 && p.price <= 80 && p.inStock);
// [{ id: 1, price: 79.99 }, { id: 2, price: 39.99 }, { id: 4, price: 59.99 }]

// ── Dynamic criteria (build predicate from user-provided object) ──
function filterByCriteria(items, criteria) {
  return items.filter(item =>
    Object.entries(criteria).every(([key, value]) => {
      if (Array.isArray(value)) return value.includes(item[key]);  // enum match
      if (typeof value === 'string') return item[key]?.toLowerCase().includes(value.toLowerCase());
      return item[key] === value;
    })
  );
}

filterByCriteria(products, { brand: 'Logitech', inStock: true });
// [{ id: 1 }, { id: 4 }]

filterByCriteria(products, { category: ['peripherals', 'accessories'], inStock: true });
// [{ id: 1 }, { id: 2 }, { id: 4 }]

// ── Build an index for O(1) repeated lookups ─────────────────────
// For large datasets where you search by ID repeatedly
const indexById = new Map(products.map(p => [p.id, p]));
indexById.get(3);  // O(1) — { id: 3, name: 'Mechanical Keyboard' }

// Multi-value index: group by category
const indexByCategory = products.reduce((acc, p) => {
  (acc[p.category] ??= []).push(p);
  return acc;
}, {});
indexByCategory['peripherals'];
// [{ id: 1 }, { id: 3 }, { id: 4 }]  — O(1) lookup after O(n) build

For repeated searches on the same large dataset, building a Map index at startup converts O(n) lookups to O(1) — the O(n) build cost amortizes over all subsequent queries. Group-by indexes (using reduce) are especially powerful for category/status filters common in product and user list UIs. See the JSON groupBy guide for detailed grouping patterns and the JSON array sorting guide for ordering results after filtering.
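
One caveat with group-by indexes built on a plain object: property names inherited from Object.prototype (such as constructor or toString) already resolve to non-null values, so the ??= pattern skips the assignment and the following push() throws when a data value happens to collide with one of them. A Map avoids the problem. The sketch below is illustrative; groupIntoMap and the inventory data are hypothetical names, not part of the examples above.

```javascript
// Group items into a Map keyed by keyFn(item).
// A Map has no inherited keys, so values such as 'constructor' are safe as group names.
function groupIntoMap(items, keyFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    const bucket = groups.get(key);
    if (bucket) bucket.push(item);
    else groups.set(key, [item]);
  }
  return groups;
}

const inventory = [
  { id: 1, category: 'peripherals' },
  { id: 2, category: 'accessories' },
  { id: 3, category: 'peripherals' },
  { id: 4, category: 'constructor' }, // hypothetical unlucky category name
];

const byCategory = groupIntoMap(inventory, p => p.category);
const peripheralIds = byCategory.get('peripherals').map(p => p.id); // [1, 3]
const oddGroupSize = byCategory.get('constructor').length;          // 1
const missingGroup = byCategory.get('unknown') ?? [];               // []
```

Lookups stay O(1) after the O(n) build, the same as the object-based version, and keys are not limited to strings.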

Full-Text Search with Elasticsearch JSON Query DSL

Elasticsearch is a widely used engine for full-text search at scale. It can search millions of documents in under 100 ms by using inverted indexes, where each unique term maps to the list of documents containing it. The Query DSL is a JSON structure sent as the request body. The bool query is the foundation: must clauses are required and contribute to scoring (like AND), filter clauses are required but do not score and can be cached (fast exact matches), should adds optional conditions (like OR), and must_not excludes documents.

// npm install @elastic/elasticsearch
import { Client } from '@elastic/elasticsearch';
const client = new Client({ node: 'http://localhost:9200' });

// ── Step 1: Create index with explicit field mapping ───────────────
await client.indices.create({
  index: 'products',
  body: {
    mappings: {
      properties: {
        name:        { type: 'text',    analyzer: 'english' },  // full-text
        description: { type: 'text',    analyzer: 'english' },
        brand:       { type: 'keyword' },                       // exact match
        category:    { type: 'keyword' },
        price:       { type: 'float' },
        inStock:     { type: 'boolean' },
      },
    },
  },
});

// ── Step 2: Index a document ──────────────────────────────────────
await client.index({
  index: 'products',
  document: {
    name: 'Wireless Keyboard',
    description: 'Compact 65% layout with Bluetooth 5.0',
    brand: 'Logitech',
    category: 'peripherals',
    price: 79.99,
    inStock: true,
  },
});

// ── Step 3: Search with bool query ────────────────────────────────
const { hits } = await client.search({
  index: 'products',
  body: {
    query: {
      bool: {
        // must: full-text search, contributes to _score
        must: [{
          multi_match: {
            query: 'wireless keyboard',
            fields: ['name^2', 'description'],  // name boosted 2x
            fuzziness: 'AUTO',                   // tolerates typos
          },
        }],
        // filter: exact-match, cached, does not affect score
        filter: [
          { term:  { inStock: true } },
          { term:  { category: 'peripherals' } },
          { range: { price: { gte: 50, lte: 150 } } },
        ],
      },
    },
    sort: [{ _score: 'desc' }, { price: 'asc' }],
    size: 10,
    from: 0,  // pagination offset
  },
});

hits.hits.map(h => ({ id: h._id, ...h._source, score: h._score }));

// ── Fuzzy query for typo tolerance ───────────────────────────────
await client.search({
  index: 'products',
  body: {
    query: {
      match: {
        name: {
          query: 'keybord',          // typo: missing 'a'
          fuzziness: 'AUTO',         // AUTO: 0 edits for ≤2 chars, 1 for 3–5, 2 for 6+
          prefix_length: 1,          // first character must match exactly
        },
      },
    },
  },
});
// Matches "Keyboard" despite the typo

// ── Aggregation: facets alongside search results ──────────────────
await client.search({
  index: 'products',
  body: {
    query: { match: { name: 'keyboard' } },
    aggs: {
      by_brand:    { terms: { field: 'brand' } },
      price_stats: { stats: { field: 'price' } },
    },
  },
});
// Returns both hits AND aggregation data (brand counts, price min/max/avg)

Use filter clauses (not must) for boolean and exact-match conditions like inStock: true and category facets. Filter clauses skip relevance scoring and Elasticsearch can cache their results, which usually makes them faster than equivalent must term queries. The fuzziness: "AUTO" setting applies edit distance 0 for 1–2 character strings, 1 for 3–5, and 2 for 6+ characters, which suits typical typo rates in natural language. For structured JSON data already in a database, see the JSON performance guide for choosing between in-process search and Elasticsearch based on dataset size and query patterns.
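
In practice the query body is usually assembled from user input, and the must/filter split described here maps naturally onto a small builder. The sketch below is an assumption-laden illustration: buildProductQuery and its parameter names are hypothetical, the field names mirror the mapping shown earlier, and it only constructs the JSON body without calling Elasticsearch.

```javascript
// Build an Elasticsearch bool query body from plain search parameters.
// Free text goes into must (scored); exact and range conditions go into filter (not scored).
function buildProductQuery({ text, inStock, category, minPrice, maxPrice } = {}) {
  const must = [];
  const filter = [];

  if (text) {
    must.push({
      multi_match: { query: text, fields: ['name^2', 'description'], fuzziness: 'AUTO' },
    });
  }
  if (typeof inStock === 'boolean') filter.push({ term: { inStock } });
  if (category) filter.push({ term: { category } });

  const range = {};
  if (minPrice != null) range.gte = minPrice;
  if (maxPrice != null) range.lte = maxPrice;
  if (Object.keys(range).length > 0) filter.push({ range: { price: range } });

  // With no text, fall back to match_all so the filters alone still select documents
  if (must.length === 0) must.push({ match_all: {} });

  return { query: { bool: { must, filter } } };
}

const queryBody = buildProductQuery({ text: 'wireless keyboard', inStock: true, maxPrice: 150 });
// queryBody.query.bool.filter:
// [{ term: { inStock: true } }, { range: { price: { lte: 150 } } }]
```

Keeping optional conditions out of the body when they are not set avoids sending empty term or range clauses, which Elasticsearch would reject or treat as matching nothing.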

Performance: Choosing the Right JSON Search Strategy

The right search strategy depends on three factors: dataset size, search type (exact vs fuzzy vs full-text), and whether the search runs in the browser or on a server. Each strategy has a different O-complexity profile and infrastructure requirement. Choosing the wrong strategy costs either performance or unnecessary infrastructure complexity.

// ── Strategy comparison ────────────────────────────────────────────────────────
//
// Strategy              │ Best for                  │ Time         │ Setup
// ──────────────────────┼───────────────────────────┼──────────────┼──────────────
// JSON.stringify hack   │ quick string exist check  │ O(n), fast   │ zero
// Array.filter()        │ structured, known schema  │ O(n)         │ zero
// Map index             │ repeated exact lookups    │ O(1) lookup  │ O(n) build
// JSONPath              │ complex path queries      │ O(n)         │ npm install
// fuse.js               │ typo-tolerant, in-memory  │ O(n×m)       │ 24 KB bundle
// Elasticsearch         │ millions of docs, FT      │ O(log n)     │ separate svc
//
// ── When Array.filter() is sufficient ────────────────────────────
// Items: up to ~100,000
// Search type: exact match or simple string contains
// Latency budget: 10–50 ms per search is acceptable
// No fuzzy match needed

// ── When fuse.js is the right choice ─────────────────────────────
// Items: up to ~50,000 in memory
// Need typo tolerance (autocomplete, search-as-you-type)
// Browser or lightweight Node.js API
// Cannot afford separate search infrastructure

// ── When Elasticsearch is necessary ─────────────────────────────
// Items: millions+
// Need sub-100 ms at scale
// Multi-language analyzers, synonyms, custom scoring
// Complex aggregations (facets, stats) alongside search

// ── Benchmark: search 10,000 products for "keyboard" ─────────────
// JSON.stringify().includes()  → ~1 ms     (simple string, fast C++)
// Array.filter() + includes()  → ~3 ms     (JavaScript loop + string ops)
// fuse.js (no pre-built index) → ~45 ms    (Bitap algorithm, index built as part of the run)
// fuse.js (pre-built index)    → ~8 ms     (index already built; search still scans all records)
// Map.get() by exact key       → <0.1 ms   (hash lookup, O(1))
// Elasticsearch (local)        → ~5-15 ms  (network + inverted index)

// ── Pre-building the fuse.js index at module load (Next.js API route) ──
// pages/api/search.ts (Next.js API route)
import Fuse from 'fuse.js';
import products from '@/data/products.json';

const KEYS = ['name', 'brand', 'description'];
const index = Fuse.createIndex(KEYS, products);
// Singleton: built once at module load, reused across requests

const fuse = new Fuse(products, {
  keys: KEYS,
  threshold: 0.3,
  includeScore: true,
}, index);

export default function handler(req, res) {
  const { q } = req.query;
  if (!q) return res.json([]);
  const results = fuse.search(q, { limit: 20 });
  res.json(results.map(r => ({ ...r.item, score: r.score })));
}

The singleton pattern above (index and Fuse instance created at module load) is a good fit for Next.js API routes: both are constructed once during the cold start and reused for all subsequent requests, so no request pays for index construction. That setup cost is the most plausible source of the gap between the ~45 ms and ~8 ms figures in the 10,000-item benchmark, since the search itself still scans every record. For even better performance, move to an Edge Function with the index pre-serialized as a static asset loaded at deploy time. Elasticsearch is worth the infrastructure complexity only when the dataset genuinely exceeds what fuse.js handles acceptably. As a rough guide for most SaaS products, the breakpoint is around 500,000 documents or when multi-language stemming and synonym search are required.
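
Benchmark figures like these depend heavily on hardware, runtime version, and data shape, so it is worth measuring on your own dataset before choosing a strategy. A minimal timing sketch follows; medianTimeMs and the generated sampleProducts are hypothetical, and the numbers it prints will differ from the ones quoted above.

```javascript
// Time a search function over several runs and report the median in milliseconds.
// performance.now() is a global in browsers and in Node.js 16+.
function medianTimeMs(fn, runs = 20) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Hypothetical dataset: 10,000 generated products, 1 in 100 is a keyboard
const sampleProducts = Array.from({ length: 10000 }, (_, i) => ({
  id: i,
  name: i % 100 === 0 ? `Keyboard ${i}` : `Product ${i}`,
}));

const filterMs = medianTimeMs(() =>
  sampleProducts.filter(p => p.name.toLowerCase().includes('keyboard'))
);
const stringifyMs = medianTimeMs(() =>
  JSON.stringify(sampleProducts).includes('Keyboard')
);

console.log({ filterMs, stringifyMs });
```

The median is used instead of the mean so that a single slow run (garbage collection, JIT warm-up) does not skew the result.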

Indexing JSON for Fast Repeated Searches

Building an in-memory index converts O(n) linear scans to O(1) lookups for exact-match searches. The O(n) build cost amortizes across all subsequent queries: if you search the same dataset repeatedly, an index pays off quickly. For full-text search, the index is an inverted index (each term maps to the documents containing it), which is what Lunr.js and Elasticsearch build internally. fuse.js works differently: its index is a flat list of preprocessed records that is still scanned on every search.

const products = [
  { id: 'p1', name: 'Wireless Keyboard', brand: 'Logitech', category: 'peripherals', price: 79.99 },
  { id: 'p2', name: 'USB-C Hub',         brand: 'Anker',    category: 'accessories', price: 39.99 },
  { id: 'p3', name: 'Wireless Mouse',    brand: 'Logitech', category: 'peripherals', price: 59.99 },
  { id: 'p4', name: 'Mechanical Switch', brand: 'Cherry',   category: 'peripherals', price: 19.99 },
];

// ── 1. Simple Map index: O(1) lookup by unique key ────────────────
const byId = new Map(products.map(p => [p.id, p]));
byId.get('p2');  // { id: 'p2', name: 'USB-C Hub', ... }  — O(1)

// ── 2. Group index: O(1) lookup by non-unique field ───────────────
const byBrand = products.reduce((acc, p) => {
  (acc[p.brand] ??= []).push(p);
  return acc;
}, {});
byBrand['Logitech'];  // [{ id: 'p1' }, { id: 'p3' }]  — O(1)

// ── 3. Inverted index: term → documents (manual FT index) ─────────
function buildInvertedIndex(docs, fields) {
  const index = new Map();  // term → Set of doc ids
  for (const doc of docs) {
    for (const field of fields) {
      const value = doc[field];
      if (typeof value !== 'string') continue;
      // Tokenize: lowercase + split on non-word characters
      const terms = value.toLowerCase().split(/\W+/).filter(t => t.length > 1);
      for (const term of terms) {
        if (!index.has(term)) index.set(term, new Set());
        index.get(term).add(doc.id);
      }
    }
  }
  return index;
}

const invertedIndex = buildInvertedIndex(products, ['name', 'brand']);
const idIndex = new Map(products.map(p => [p.id, p]));

function searchIndex(invertedIndex, idIndex, query) {
  const terms = query.toLowerCase().split(/\W+/).filter(t => t.length > 1);
  if (terms.length === 0) return [];

  // AND semantics: result = intersection of all term matches
  const sets = terms.map(t => invertedIndex.get(t) ?? new Set());
  const resultIds = sets.reduce((a, b) => new Set([...a].filter(id => b.has(id))));

  return [...resultIds].map(id => idIndex.get(id));
}

searchIndex(invertedIndex, idIndex, 'wireless');
// [{ id: 'p1', name: 'Wireless Keyboard' }, { id: 'p3', name: 'Wireless Mouse' }]

searchIndex(invertedIndex, idIndex, 'logitech wireless');
// [{ id: 'p1' }, { id: 'p3' }]  — intersection of 'logitech' and 'wireless'

// ── 4. Composite index: multi-field exact match ───────────────────
function buildCompositeIndex(docs, ...fields) {
  const index = new Map();
  for (const doc of docs) {
    const key = fields.map(f => String(doc[f] ?? '')).join('|');
    (index.get(key) ?? index.set(key, []).get(key)).push(doc);
  }
  return index;
}

const brandCategoryIndex = buildCompositeIndex(products, 'brand', 'category');
brandCategoryIndex.get('Logitech|peripherals');
// [{ id: 'p1', name: 'Wireless Keyboard' }, { id: 'p3', name: 'Wireless Mouse' }]

The manual inverted index above demonstrates what Lunr.js and Elasticsearch build at a more sophisticated level: the core data structure is always a term-to-document mapping. For production use, rely on Lunr.js or Elasticsearch rather than implementing inverted indexes manually, since they handle stemming, stopword removal, and scoring. The composite index pattern is particularly useful for multi-field exact-match filters (brand + category + status) that would otherwise require scanning the full array on every user interaction. Combine a composite index with fuse.js fuzzy search: use the composite index to pre-filter by exact criteria, then run fuse.js only on the filtered subset for O(k × m) fuzzy search where k is the filtered count, not n. For more on JSON data optimization, see the JSON performance guide.
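
The pre-filter-then-fuzzy combination can be sketched without any dependency. In the example below, looseMatch is a deliberately crude stand-in for a real fuzzy matcher such as fuse.js (it only checks that the query's characters appear in order), and compositeKey, searchWithin, and the items array are hypothetical names introduced for illustration.

```javascript
// Exact pre-filter via a composite key, then a loose text match on the small subset.
function compositeKey(doc, fields) {
  return fields.map(f => String(doc[f] ?? '')).join('|');
}

function buildCompositeIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    const key = compositeKey(doc, fields);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// Crude stand-in for a fuzzy matcher: true when the query's characters
// appear in order inside the text (a subsequence match), ignoring case.
function looseMatch(text, query) {
  const haystack = text.toLowerCase();
  let pos = 0;
  for (const ch of query.toLowerCase()) {
    pos = haystack.indexOf(ch, pos);
    if (pos === -1) return false;
    pos += 1;
  }
  return true;
}

function searchWithin(index, fields, criteria, query) {
  const subset = index.get(compositeKey(criteria, fields)) ?? []; // O(1) pre-filter
  return subset.filter(doc => looseMatch(doc.name, query));       // O(k × m) over k docs
}

const items = [
  { id: 'p1', name: 'Wireless Keyboard', brand: 'Logitech', category: 'peripherals' },
  { id: 'p2', name: 'USB-C Hub',         brand: 'Anker',    category: 'accessories' },
  { id: 'p3', name: 'Wireless Mouse',    brand: 'Logitech', category: 'peripherals' },
];

const FIELDS = ['brand', 'category'];
const brandCategory = buildCompositeIndex(items, FIELDS);
const hits = searchWithin(
  brandCategory, FIELDS, { brand: 'Logitech', category: 'peripherals' }, 'kybrd'
);
// hits contains only the 'Wireless Keyboard' document
```

With fuse.js in place of looseMatch, the same shape applies: look up the subset first, then construct or reuse a Fuse instance scoped to that subset.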

Key Terms

JSONPath
A query language for JSON, analogous to XPath for XML. The root node is $. Child access uses dot notation ($.store.name) or bracket notation ($['store']['name']). The wildcard [*] selects all elements of an array. Array slicing uses Python-style syntax: [0:2] selects indices 0 and 1. Filter expressions [?(@.price < 10)] apply predicates to array elements where @ refers to the current element. JSONPath is implemented by the jsonpath and jsonpath-plus npm packages in JavaScript. Time complexity is O(n) — the library must visit every node to evaluate recursive descent and filter expressions.
recursive descent
In JSONPath, the .. operator (double dot) that traverses a JSON tree and collects all nodes matching the subsequent selector at any depth. $..price finds every price property regardless of nesting level — equivalent to a recursive JavaScript function that walks the entire tree. Recursive descent is O(n) where n is the total number of nodes. It is the most powerful JSONPath operator for searching unknown or variable-depth structures, but also the most expensive because it cannot short-circuit — it must visit every node. Combine with filter expressions for precise selection: $..[?(@.inStock == true)] finds all in-stock items at any depth.
fuse.js
An open-source JavaScript library for fuzzy (approximate) string searching on in-memory arrays. It uses the Bitap algorithm to find approximate matches within a configurable edit distance, making it tolerant of typos, transpositions, and missing characters. Key configuration options: keys (field names or dot-notation nested paths to search), threshold (0.0–1.0 fuzziness control; 0.3 is recommended), includeScore (attach match quality score to results, 0 = perfect match, 1 = complete mismatch), and ignoreLocation (search the entire field value, not just the beginning). Adds 24 KB to the client bundle. Suitable for datasets up to approximately 50,000 items before search latency becomes perceptible without a pre-built index.
fuzzy search
A search technique that finds results that approximately match the query, rather than requiring an exact string match. Approximate matching is measured by edit distance (also called Levenshtein distance): the minimum number of single-character insertions, deletions, or substitutions required to transform one string into another. "hemmingway" has edit distance 1 from "hemingway" (one extra "m", removed by a single deletion). fuse.js uses the Bitap algorithm for fuzzy matching. Elasticsearch uses its own implementation with the fuzziness parameter on match queries, where "AUTO" selects edit distance 0, 1, or 2 based on term length. Fuzzy search is essential for autocomplete and search-as-you-type UIs where users make typographic errors.
Elasticsearch Query DSL
The JSON-based query language used by Elasticsearch to define search requests. Queries are sent as the request body to the search API. The bool query is the primary composition operator: must clauses are required and contribute to relevance scoring (BM25); filter clauses are required but cached and do not affect scoring (faster for exact-match conditions); should clauses are optional and boost scores when matched; must_not clauses exclude matching documents. match queries perform full-text search on text fields; term queries perform exact matching on keyword fields. The Query DSL also supports aggregations (facets, statistics) returned alongside search hits.
inverted index
A data structure that maps each unique term (word) to the list of documents containing that term. It is the core data structure behind all full-text search engines including Elasticsearch, Lucene, and Lunr.js. Construction is O(n × t) where n is the number of documents and t is the average number of terms per document. Lookup is O(k) where k is the number of documents matching the term, typically much smaller than the total document count. Inverted indexes enable sub-100 ms search across millions of documents by avoiding full collection scans. Elasticsearch maintains inverted indexes per shard and uses them for match (full-text) queries; aggregations generally read from columnar doc values rather than from the inverted index.
edit distance
Also called Levenshtein distance — the minimum number of single-character operations (insertions, deletions, or substitutions) required to transform one string into another. "kitten" to "sitting" has edit distance 3 (substitute k→s, substitute e→i, insert g). Edit distance is the mathematical foundation of fuzzy search: two strings with edit distance 1 are one typo apart. fuse.js uses the Bitap algorithm to compute approximate edit distance efficiently. Elasticsearch's fuzziness: "AUTO" setting maps string length to maximum edit distance: strings of 1–2 characters allow 0 edits, 3–5 allow 1, and 6+ allow 2. Lower edit distance tolerance = stricter matching; higher = more typo-permissive results.
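
To make the edit distance definition concrete, here is a minimal sketch of Levenshtein distance using plain dynamic programming. It is not the Bitap algorithm that fuse.js uses, and the helper autoMaxEdits is a hypothetical name that only mirrors the "AUTO" length thresholds described above; it is not an Elasticsearch API.

```javascript
// Levenshtein distance with two rolling rows: O(a.length × b.length) time.
function editDistance(a, b) {
  // prev[j] = distance between the empty prefix of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // distance from first i chars of a to the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or free match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: maximum edits allowed for a term under the
// length thresholds described above (1–2 chars: 0, 3–5: 1, 6+: 2).
function autoMaxEdits(term) {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

console.log(editDistance('kitten', 'sitting'));       // 3
console.log(editDistance('hemmingway', 'hemingway')); // 1
console.log(autoMaxEdits('keyboard'));                // 2
```

The comparison is case-sensitive; lowercase both strings first if case should not count as an edit.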

FAQ

How do I search for a value in a deeply nested JSON object?

Write a recursive function that handles three cases: if the current node equals the target, return true; if the node is an array, recurse into each element; if the node is a plain object, recurse into each value. Example: function deepFind(obj, target) { if (obj === target) return true; if (Array.isArray(obj)) return obj.some(i => deepFind(i, target)); if (obj !== null && typeof obj === "object") return Object.values(obj).some(v => deepFind(v, target)); return false; }. This is O(n) where n = total values in the tree. For a quick string existence check without caring about false positives on key names, JSON.stringify(obj).includes("term") is 2–5x faster because V8's string search is optimized native code. For finding all matches with their paths, use a path-accumulating version that collects results instead of returning on first match.
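
The path-accumulating variant mentioned above could look like the following sketch. The function name deepFindAll and the JSONPath-style path strings are illustrative choices, not part of any library.

```javascript
// Collects the path of every value strictly equal to target. O(n) over all nodes.
function deepFindAll(node, target, path = '$', results = []) {
  if (node === target) {
    results.push(path);
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => deepFindAll(item, target, `${path}[${i}]`, results));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      deepFindAll(value, target, `${path}.${key}`, results);
    }
  }
  return results;
}

const travelData = {
  user: { name: 'Alice', city: 'Paris' },
  trips: [{ city: 'Paris' }, { city: 'Rome' }],
};

console.log(deepFindAll(travelData, 'Paris'));
// [ '$.user.city', '$.trips[0].city' ]
```

Because it never returns early, this version always visits the whole tree, whereas deepFind can stop at the first hit.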

What is JSONPath and how do I use it to search JSON?

JSONPath is a query language for JSON analogous to XPath for XML. Install with npm install jsonpath. The root is $. Key syntax: $.key (child access), $..key (recursive descent — finds at any depth), $[*] (all array elements), $[0] (index), $[0:2] (slice), $[?(@.price < 10)] (filter expression). In JavaScript: const jp = require('jsonpath'); jp.query(data, '$.books[?(@.price < 10)]'). The recursive descent $..key is the most powerful search tool — it finds all matching properties at any depth without writing traversal code. For regex filters and additional operators, use jsonpath-plus instead.

How do I implement fuzzy search on JSON data?

Use fuse.js: npm install fuse.js. Initialize with const fuse = new Fuse(items, { keys: ["name", "description"], threshold: 0.3, includeScore: true }), then call fuse.search("querry") — it matches "query" despite the typo. The threshold is the critical tuning parameter: 0.0 = exact match only, 0.3 = balanced typo tolerance (recommended), 1.0 = match anything. For nested fields use dot notation in keys: { keys: ["author.name"] }. fuse.js adds 24 KB to the client bundle — run it server-side in a Node.js API handler to avoid bundle size impact. For datasets beyond 50,000 items, switch to Elasticsearch with "fuzziness": "AUTO" on a match query.

How do I filter a JSON array by multiple criteria?

Use Array.filter() with a compound predicate. For AND logic: items.filter(p => p.active && p.score >= 80 && p.role === "admin"). For OR logic: items.filter(p => p.name.includes(q) || p.description.includes(q)). For dynamic criteria from user input, build the predicate from an object: function filterByCriteria(items, criteria) { return items.filter(item => Object.entries(criteria).every(([k, v]) => item[k] === v)); }. For enum-style multi-value filters (category is "peripherals" or "accessories"): items.filter(p => ["peripherals","accessories"].includes(p.category)). For repeated exact-match lookups on large datasets, build a Map or group index first (O(n) once) and get O(1) per lookup thereafter.
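
One way to combine the patterns above is a criteria object whose values may be a scalar (strict equality), an array (any-of), or a function (custom test). This is a sketch under those assumptions; matchesCriterion and the sample product fields are hypothetical.

```javascript
// A criterion is a scalar (===), an array (value must be one of), or a predicate function.
function matchesCriterion(value, criterion) {
  if (typeof criterion === 'function') return criterion(value);
  if (Array.isArray(criterion)) return criterion.includes(value);
  return value === criterion;
}

// AND across keys: an item passes only if every criterion matches.
function filterByCriteria(items, criteria) {
  const entries = Object.entries(criteria);
  return items.filter(item =>
    entries.every(([key, criterion]) => matchesCriterion(item[key], criterion))
  );
}

const products = [
  { name: 'Keyboard', category: 'peripherals', price: 49, active: true },
  { name: 'Mouse pad', category: 'accessories', price: 9, active: true },
  { name: 'Monitor', category: 'displays', price: 199, active: true },
  { name: 'Webcam', category: 'peripherals', price: 59, active: false },
];

const hits = filterByCriteria(products, {
  active: true,                              // exact match
  category: ['peripherals', 'accessories'],  // any of these values
  price: p => p < 50,                        // custom range test
});

console.log(hits.map(p => p.name)); // [ 'Keyboard', 'Mouse pad' ]
```

Each call is still a single O(n) pass over the array, with the per-item cost proportional to the number of criteria.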

How does Elasticsearch search JSON data?

Elasticsearch indexes JSON documents and searches them via the Query DSL — a JSON request body. The match query performs full-text search on a field: { "query": { "match": { "name": "keyboard" } } }. multi_match searches across multiple fields: { "query": { "multi_match": { "query": "keyboard", "fields": ["name^2", "description"] } } } (name boosted 2x). The bool query combines conditions: must (AND, affects score), filter (AND, cached, no score — use for term, range, and boolean exact-match conditions), should (OR), must_not (NOT). Elasticsearch uses inverted indexes, enabling sub-100 ms search across millions of documents. Add "fuzziness": "AUTO" to any match query to enable typo tolerance.
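
As a sketch of how these clauses compose, the helper below builds a search request body as a plain JavaScript object. The function name buildProductSearch and the field names (name, description, category, price) are hypothetical, and the term filter assumes category is mapped as a keyword field. Sending the body to a cluster is left out.

```javascript
// Builds a bool query: scored fuzzy full-text match in must, unscored exact conditions in filter.
function buildProductSearch({ text, category, maxPrice }) {
  const filter = [];
  if (category !== undefined) filter.push({ term: { category } });
  if (maxPrice !== undefined) filter.push({ range: { price: { lte: maxPrice } } });

  return {
    query: {
      bool: {
        must: [
          {
            multi_match: {
              query: text,
              fields: ['name^2', 'description'], // name boosted 2x
              fuzziness: 'AUTO',                 // typo tolerance based on term length
            },
          },
        ],
        filter, // contributes no score; suited to exact-match conditions
      },
    },
  };
}

const body = buildProductSearch({ text: 'keybaord', category: 'peripherals', maxPrice: 100 });
console.log(JSON.stringify(body, null, 2));
```

Keeping exact-match conditions in filter rather than must follows the guidance above: they narrow the result set without changing relevance scores.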

What is the fastest way to search a large JSON file?

For simple string containment, JSON.stringify(obj).includes("term") is 2–5x faster than recursive JavaScript traversal, because V8's native string search is highly optimized. For repeated exact-match lookups on the same dataset, build a Map index at startup: O(n) once, then O(1) per lookup. For repeated fuzzy or full-text searches on up to 50,000 items, use fuse.js with a pre-built index (Fuse.createIndex()); this avoids re-indexing the collection each time a Fuse instance is created, although each search still scans every indexed record. For files too large to fit in memory, use a streaming JSON parser (stream-json in Node.js) to process the file without loading it entirely. For millions of documents and sub-100 ms latency requirements, Elasticsearch is the correct tool: it maintains inverted indexes on disk and shards data across nodes.
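
The Map index mentioned above can be sketched as follows. buildIndex and the sample user records are illustrative names; the index groups items by one field so that several items can share a key.

```javascript
// One O(n) pass builds the index; each later lookup is an O(1) Map.get().
function buildIndex(items, key) {
  const index = new Map();
  for (const item of items) {
    const k = item[key];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(item);
  }
  return index;
}

const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'editor' },
  { id: 3, name: 'Carol', role: 'admin' },
];

const byRole = buildIndex(users, 'role');

console.log(byRole.get('admin').map(u => u.name)); // [ 'Alice', 'Carol' ]
console.log(byRole.get('guest') ?? []);            // [] (no such key)
```

The index only pays off when the same dataset is queried repeatedly, and it must be rebuilt or updated whenever the underlying array changes.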

How do I search JSON by key name at any depth?

Use the JSONPath recursive descent operator: jp.query(data, '$..price') finds all price properties at any depth in O(n). For a pure JavaScript zero-dependency approach: function findByKey(obj, key, results = []) { if (obj !== null && typeof obj === "object") { for (const [k, v] of Object.entries(obj)) { if (k === key) results.push(v); if (typeof v === "object") findByKey(v, key, results); } } return results; }. For partial key name matches (key contains a substring): change the condition to k.includes(key). For case-insensitive key search: k.toLowerCase() === key.toLowerCase(). Both the JSONPath approach and the recursive function have identical O(n) time complexity; choose JSONPath for readability, the recursive function for zero-dependency environments.
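
The partial and case-insensitive variants can share one traversal if the key test is passed in as a function. The sketch below assumes that design; findKeys and its path format are hypothetical, and unlike findByKey it skips array indices when testing key names.

```javascript
// Returns { path, value } for every object key accepted by the matches(keyName) predicate.
function findKeys(node, matches, path = '$', results = []) {
  if (node !== null && typeof node === 'object') {
    const isArray = Array.isArray(node);
    for (const [k, v] of Object.entries(node)) {
      const childPath = isArray ? `${path}[${k}]` : `${path}.${k}`;
      // Only object keys are tested; array indices are just part of the path.
      if (!isArray && matches(k)) results.push({ path: childPath, value: v });
      findKeys(v, matches, childPath, results);
    }
  }
  return results;
}

const catalog = {
  Price: 5,
  items: [{ price: 10, unitPrice: 2 }, { meta: { PRICE: 7 } }],
};

// Case-insensitive exact key match
console.log(findKeys(catalog, k => k.toLowerCase() === 'price').map(r => r.path));
// [ '$.Price', '$.items[0].price', '$.items[1].meta.PRICE' ]

// Case-insensitive partial key match
console.log(findKeys(catalog, k => k.toLowerCase().includes('price')).map(r => r.path));
// [ '$.Price', '$.items[0].price', '$.items[0].unitPrice', '$.items[1].meta.PRICE' ]
```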

How do I implement full-text search on JSON data in Node.js?

Four options in order of increasing capability: (1) Simple string match: Array.filter() with String.includes(); zero setup, O(n) per search, no fuzzy matching. (2) fuse.js: npm install fuse.js; same API in Node.js as in the browser; supports fuzzy matching, weighted fields, nested paths; build a singleton Fuse instance at module load to avoid per-request index reconstruction. (3) Lunr.js: build an inverted index at startup; supports TF-IDF scoring, wildcards (keyword*), and field boosts; serialize with JSON.stringify(idx) and reload with lunr.Index.load(). (4) Elasticsearch: run as a separate service; scales to hundreds of millions of documents; full Query DSL, aggregations, multi-language analyzers. For a Node.js API with under 100,000 records, fuse.js with a pre-built index is the sweet spot: zero infrastructure, ~8 ms search after initialization.

Further reading and primary sources

  • fuse.js Documentation - fuse.js options reference, extended search syntax, weighted keys, pre-built index API, and performance guidance
  • jsonpath npm package - JSONPath npm package for Node.js: query, paths, nodes, and value API with filter expressions
  • jsonpath-plus on npm - JSONPath Plus: extended JSONPath with regex filters, parent axis, and resultType options
  • Elasticsearch Query DSL - Official Elasticsearch documentation for bool, match, multi_match, term, range, and fuzzy queries
  • MDN: Array.prototype.filter() - MDN reference for Array.filter() including predicate function signature, return value, and examples