Elasticsearch JSON Query DSL: match, bool, aggs & Mappings

Q: How do I write a basic Elasticsearch JSON query?

A basic Elasticsearch JSON query is sent as the body of a POST request to the _search endpoint, for example POST /my-index/_search with Content-Type: application/json. The minimal structure is { "query": { "match_all": {} } }, which matches every document. For a full-text search on a specific field, use match: { "query": { "match": { "title": "elasticsearch" } } }. For exact-match lookups on keyword fields, use term: { "query": { "term": { "status.keyword": "published" } } }. The status.keyword name assumes status was dynamically mapped as text with a keyword sub-field; if status is mapped explicitly as keyword, query status directly. Set size (default 10) to control how many hits are returned and from to set the pagination offset; from + size cannot exceed index.max_result_window, which defaults to 10,000. Use _source to control which fields appear in the response.
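As a minimal sketch of the request body described above, the following helper assembles a search body and rejects offsets beyond the default result window. The names buildSearchBody and SearchOptions are hypothetical and not part of any Elasticsearch client; the 10,000 limit assumes index.max_result_window has not been changed.

```typescript
// Hypothetical helper: SearchOptions and buildSearchBody are illustrative names.
type QueryClause = Record<string, unknown>;

interface SearchOptions {
  query?: QueryClause; // defaults to match_all
  from?: number;       // pagination offset
  size?: number;       // number of hits to return
  source?: string[];   // fields to keep in _source
}

// Assumes the default index.max_result_window.
const MAX_RESULT_WINDOW = 10_000;

function buildSearchBody(opts: SearchOptions = {}): Record<string, unknown> {
  const from = opts.from ?? 0;
  const size = opts.size ?? 10;
  if (from < 0 || size < 0) {
    throw new RangeError("from and size must be >= 0");
  }
  if (from + size > MAX_RESULT_WINDOW) {
    throw new RangeError(`from + size exceeds ${MAX_RESULT_WINDOW}; use search_after instead`);
  }
  const searchBody: Record<string, unknown> = {
    query: opts.query ?? { match_all: {} },
    from,
    size,
  };
  if (opts.source) searchBody._source = opts.source;
  return searchBody;
}

const exampleSearchBody = buildSearchBody({
  query: { match: { title: "elasticsearch" } },
  size: 20,
  source: ["title"],
});
console.log(JSON.stringify(exampleSearchBody));
```

The returned object can be serialized with JSON.stringify and sent as the body of POST /my-index/_search.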

Q: What is the difference between match and term queries in Elasticsearch?

match is a full-text query that analyzes the input string before searching: it runs the text through the field's search analyzer (by default the same analyzer used at index time, which tokenizes and lowercases, and may also stem or remove stop words depending on configuration) and looks up the resulting tokens in the inverted index. Use match on text fields. term is an exact-match query that looks up a precise, unanalyzed value; no tokenization or analysis is applied. Use term on keyword, numeric, date, and boolean fields. A critical rule: avoid term queries on text fields. The field is analyzed at index time (stored as lowercase tokens), but term does not analyze the query value, so { "term": { "title": "Elasticsearch" } } does not match a document whose title was indexed as the token "elasticsearch". To exact-match the original string, query a keyword sub-field such as title.keyword, which exists when the field was dynamically mapped or when you declare it under fields in the mapping.
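The difference can be illustrated with a toy model that runs entirely in memory. The analyze function below is a deliberately simplified stand-in (lowercase plus splitting on non-alphanumeric characters), not Elasticsearch's real analyzer, and matchQuery and termQuery are hypothetical names that only mimic the lookup behavior.

```typescript
// Toy analyzer: lowercases and splits on anything that is not a letter or digit.
function analyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0);
}

// Index time: a text field stores the analyzed tokens, not the original string.
const indexedTokens = new Set(analyze("Elasticsearch Query DSL Guide"));

// match: the query string is analyzed first, then each token is looked up.
function matchQuery(input: string): boolean {
  return analyze(input).some(token => indexedTokens.has(token));
}

// term: the query value is looked up exactly as given, with no analysis.
function termQuery(value: string): boolean {
  return indexedTokens.has(value);
}

console.log(matchQuery("Elasticsearch")); // true
console.log(termQuery("Elasticsearch"));  // false: the index holds "elasticsearch"
console.log(termQuery("elasticsearch"));  // true, only because the casing happens to agree
```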

Q: How do I filter Elasticsearch results without affecting the relevance score?

Use a bool query with a filter clause instead of must. Queries in filter context do not compute a relevance score; they only include or exclude documents, and Elasticsearch can cache frequently used filters in the node query cache. Filters are often quoted as 2-5x faster than must for exact-match conditions, but the actual gain depends on the data and on how often the cache is hit. The structure is: { "query": { "bool": { "must": [{ "match": { "title": "elasticsearch" } }], "filter": [{ "term": { "status.keyword": "published" } }, { "range": { "date": { "gte": "2025-01-01" } } }] } } }. The must clause computes relevance scores for full-text matching, while the filter clauses narrow the result set without scoring. Use must_not for exclusions (it also runs in filter context, with no scoring). Use should with minimum_should_match for optional conditions that boost the score when matched.
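One way to apply this rule mechanically is to route clauses by query type. The sketch below is an assumption-laden illustration: buildBoolQuery is a hypothetical helper, and the list of scored query types is a short illustrative set rather than an exhaustive one.

```typescript
type Clause = Record<string, unknown>;

// Query types treated as full-text (scored). Illustrative, not exhaustive.
const SCORED_TYPES = new Set(["match", "multi_match", "match_phrase"]);

interface BoolQuery {
  bool: { must?: Clause[]; filter?: Clause[] };
}

// Sends full-text clauses to must and everything else to filter.
function buildBoolQuery(clauses: Clause[]): BoolQuery {
  const must: Clause[] = [];
  const filter: Clause[] = [];
  for (const clause of clauses) {
    // Each clause is expected to have a single key naming its query type.
    const queryType = Object.keys(clause)[0] ?? "";
    (SCORED_TYPES.has(queryType) ? must : filter).push(clause);
  }
  const bool: BoolQuery["bool"] = {};
  if (must.length > 0) bool.must = must;
  if (filter.length > 0) bool.filter = filter;
  return { bool };
}

const exampleBool = buildBoolQuery([
  { match: { title: "elasticsearch" } },
  { term: { "status.keyword": "published" } },
  { range: { date: { gte: "2025-01-01" } } },
]);
console.log(JSON.stringify({ query: exampleBool }, null, 2));
```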

Q: How does the Elasticsearch bulk API NDJSON format work?

The Elasticsearch bulk API accepts NDJSON (Newline Delimited JSON) — two lines per operation. Line 1 is the action metadata JSON: { "index": { "_index": "my-index", "_id": "1" } }. Line 2 is the document JSON: { "title": "Hello", "date": "2026-05-19" }. The body must end with a trailing newline. Valid action types are index (create or replace), create (fail if document exists), update (partial update or upsert), and delete (delete a document — no line 2 needed). The response JSON contains a top-level errors boolean and an items array with one entry per operation, each containing the action type, _index, _id, result (created/updated/deleted), status code, and an error object if the operation failed. Always check both the top-level errors flag and each item's error field — the bulk API returns HTTP 200 even when some operations fail.
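The two-lines-per-operation layout, with delete as the one-line exception, can be sketched as a small serializer. BulkOp and toBulkNdjson are hypothetical names introduced for illustration; only the action metadata keys _index and _id are emitted.

```typescript
// Hypothetical operation shape used only by this sketch.
type BulkOp =
  | { action: "index" | "create"; index: string; id?: string; doc: Record<string, unknown> }
  | { action: "update"; index: string; id: string; doc: Record<string, unknown> }
  | { action: "delete"; index: string; id: string };

function toBulkNdjson(ops: BulkOp[]): string {
  const lines: string[] = [];
  for (const op of ops) {
    // Line 1: action metadata.
    const meta: Record<string, string> = { _index: op.index };
    if (op.id !== undefined) meta._id = op.id;
    lines.push(JSON.stringify({ [op.action]: meta }));

    // Line 2: the document (wrapped in "doc" for update, absent for delete).
    if (op.action === "update") {
      lines.push(JSON.stringify({ doc: op.doc }));
    } else if (op.action !== "delete") {
      lines.push(JSON.stringify(op.doc));
    }
  }
  // Every line, including the last, ends with a newline.
  return lines.map(line => line + "\n").join("");
}

const exampleNdjson = toBulkNdjson([
  { action: "index", index: "my-index", id: "1", doc: { title: "Hello", date: "2026-05-19" } },
  { action: "update", index: "my-index", id: "1", doc: { title: "Hello again" } },
  { action: "delete", index: "my-index", id: "2" },
]);
console.log(exampleNdjson);
```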

Q: What are Elasticsearch aggregations and how do I write them in JSON?

Aggregations compute analytics over a set of documents. The JSON structure uses an aggs (or aggregations) key at the same level as query, with named aggregation objects. Each aggregation has a name you define, a type (e.g., terms, date_histogram, avg), and type-specific options. Example: { "aggs": { "by_status": { "terms": { "field": "status.keyword", "size": 10 } }, "avg_price": { "avg": { "field": "price" } } } }. Bucket aggregations (terms, date_histogram, range) group documents into buckets; metric aggregations (avg, sum, min, max, cardinality) compute a single value. Nest sub-aggregations inside a bucket aggregation using an inner aggs key: { "aggs": { "by_month": { "date_histogram": { "field": "date", "calendar_interval": "month" }, "aggs": { "avg_price": { "avg": { "field": "price" } } } } } }.

Q: How do I define Elasticsearch index mappings in JSON?

Index mappings are defined in the mappings.properties JSON object when creating an index with PUT /my-index. Each field has a type and optional configuration. Example: { "mappings": { "properties": { "title": { "type": "text", "analyzer": "english" }, "status": { "type": "keyword" }, "price": { "type": "double" }, "date": { "type": "date", "format": "yyyy-MM-dd" }, "tags": { "type": "keyword" }, "author": { "type": "object", "properties": { "name": { "type": "text" }, "id": { "type": "keyword" } } } } } }. Key field types: text (analyzed full-text), keyword (exact-match), long/integer/double (numbers), date (dates with configurable format), boolean, object (inner JSON object), nested (array of objects with independent field matching), and geo_point (latitude/longitude). The type of an existing field cannot be changed after it has been mapped: adding new fields is allowed, but changing a field's type requires creating a new index and copying the data across with the reindex API.

Q: How do I reduce the size of Elasticsearch JSON responses?

Use _source filtering to include only the fields you need: { "_source": ["title", "price", "date"], "query": { "match_all": {} } }. For documents with many or large fields this can shrink the response substantially; the often-quoted 60-90% is a rough estimate, and the saving depends entirely on how much of each document you drop. Alternatively, set "_source": false to omit the source from the response and use the fields parameter to return specific field values, formatted according to the mapping; use docvalue_fields if you specifically want values read from doc values. Use size to limit the number of hits returned (default 10). For aggregation-only requests where you do not need individual documents, set size: 0 so that no hits are fetched, which reduces both the response size and the work done per request. Use the filter_path parameter to prune the response envelope: ?filter_path=hits.hits._source,hits.total.value. For pagination of large result sets, use search_after instead of deep from/size offsets; deep pagination with large from values is expensive because Elasticsearch must rank and discard all preceding documents.
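To get a feel for how much an include list can save, the following sketch applies one locally and compares serialized sizes. It is only a stand-in: pickFields is a hypothetical function that handles top-level field names (no wildcards or dotted paths), and the sample document is made up, so the numbers say nothing about any real index.

```typescript
// Local stand-in for an include-list _source filter (top-level field names only).
function pickFields(doc: Record<string, unknown>, include: string[]): Record<string, unknown> {
  const picked: Record<string, unknown> = {};
  for (const field of include) {
    if (field in doc) picked[field] = doc[field];
  }
  return picked;
}

// Made-up sample document with two bulky fields.
const fullDoc: Record<string, unknown> = {
  title: "Wireless Headphones",
  price: 79.99,
  date: "2026-05-19",
  description: "Long marketing copy. ".repeat(50),
  raw_html: "<div>...</div>".repeat(50),
};

const trimmedDoc = pickFields(fullDoc, ["title", "price", "date"]);
const fullChars = JSON.stringify(fullDoc).length;
const trimmedChars = JSON.stringify(trimmedDoc).length;
console.log(`full: ${fullChars} chars, filtered: ${trimmedChars} chars`);
```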

Q: How do I use the Node.js Elasticsearch client to search with JSON?

Install the official client: npm install @elastic/elasticsearch. Create a client instance: import { Client } from "@elastic/elasticsearch"; const client = new Client({ node: "http://localhost:9200" }). Run a search: const result = await client.search({ index: "my-index", query: { match: { title: "elasticsearch" } }, size: 20 }). The client accepts the query DSL JSON directly as the request body — no wrapping needed. Access results via result.hits.hits (array of hit objects) and result.hits.total.value (total count). For bulk indexing, use the client.helpers.bulk() helper which accepts an array of documents and handles the NDJSON formatting automatically: await client.helpers.bulk({ datasource: documents, onDocument: (doc) => ({ index: { _index: "my-index" } }) }). Handle ResponseError for Elasticsearch-level errors (4xx/5xx) and check the error.meta.body for the error JSON.

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 19, 2026

Elasticsearch Query DSL expresses all search operations as JSON: a match query, a bool filter combining 10 clauses, and a date_histogram aggregation are each plain JSON objects sent as the request body to the _search API endpoint. A bool query combining must, filter, should, and must_not clauses reduces to a single JSON object; using filter instead of must for exact-match conditions skips scoring and lets Elasticsearch cache the filter, which is often quoted as a 2-5× speedup (the actual gain depends on the workload). This guide covers Elasticsearch index mappings JSON, Query DSL (match, term, bool, range, multi_match), aggregations (terms, date_histogram, avg), bulk API NDJSON format, source filtering, and Node.js client integration.

Index Mappings JSON: Field Types and Dynamic Mapping

Index mappings define the schema for an Elasticsearch index: every field's type, analyzer, and indexing behavior. Mappings are declared as a JSON object under the mappings.properties key when creating an index. Elasticsearch has two mapping modes: explicit (you define every field before indexing) and dynamic (Elasticsearch infers field types from the first document that contains each field). Dynamic mapping is convenient for development but can cause problems in production: a field that first appears as a number is mapped as long, and a later document that sends a non-numeric string for the same field is rejected with a mapping error.

// Create index with explicit mappings
// PUT /products
{
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "english" },
      "slug":        { "type": "keyword" },
      "description": { "type": "text" },
      "price":       { "type": "double" },
      "stock":       { "type": "integer" },
      "published":   { "type": "boolean" },
      "created_at":  { "type": "date", "format": "strict_date_optional_time" },
      "tags":        { "type": "keyword" },
      "category": {
        "type": "object",
        "properties": {
          "id":   { "type": "keyword" },
          "name": { "type": "text" }
        }
      },
      "variants": {
        "type": "nested",
        "properties": {
          "sku":   { "type": "keyword" },
          "size":  { "type": "keyword" },
          "price": { "type": "double" }
        }
      },
      "location": { "type": "geo_point" }
    }
  },
  "settings": {
    "number_of_shards":   1,
    "number_of_replicas": 1
  }
}

// Common field types:
// text      - analyzed full-text (use for search)
// keyword   - exact-match, sortable, aggregatable (use for filters/facets)
// long      - 64-bit integer
// double    - 64-bit float
// date      - date/datetime with configurable format
// boolean   - true/false
// object    - inner JSON object (fields queryable by dotted path, e.g. category.id)
// nested    - array of objects with independent field matching
// geo_point - { lat, lon } or "lat,lon" string

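// Dynamically mapped strings get a .keyword sub-field by default (for exact-match):
// "title" → full-text search
// "title.keyword" → exact-match, aggregations, sorting
// Explicitly mapped text fields (like title above) only have one if you declare it under "fields"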

// Disable dynamic mapping to reject unmapped fields:
// PUT /strict-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": { ... }
  }
}

// Add a new field to an existing index (allowed — adding is fine):
// PUT /products/_mapping
{
  "properties": {
    "weight_kg": { "type": "double" }
  }
}

// Change a field type (NOT allowed — requires reindex):
// 1. Create new index with updated mappings
// 2. POST /_reindex
{
  "source": { "index": "products" },
  "dest":   { "index": "products-v2" }
}
// 3. Update aliases to point to new index
// POST /_aliases
{
  "actions": [
    { "remove": { "index": "products",    "alias": "products_alias" } },
    { "add":    { "index": "products-v2", "alias": "products_alias" } }
  ]
}

The nested type is critical when you have an array of objects and need to query individual objects independently. With object type, Elasticsearch flattens the array into parallel arrays per field — querying variants.sku: "ABC" AND variants.price < 50 on an object field can match a document where one variant has sku: "ABC" and a different variant has price < 50. The nested type preserves object boundaries, ensuring cross-field conditions apply to the same object in the array. Use aliases as an abstraction layer over index names to enable zero-downtime reindexing.
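The false positive described above can be reproduced with a small in-memory simulation. This is a simplified model of the two behaviors, not how Elasticsearch stores data internally; matchesAsObject and matchesAsNested are hypothetical names.

```typescript
interface Variant {
  sku: string;
  price: number;
}

const sampleVariants: Variant[] = [
  { sku: "ABC", price: 80 },
  { sku: "XYZ", price: 30 },
];

// object semantics (simplified): each field becomes one flat list of values,
// so the link between a sku and its own price is lost.
function matchesAsObject(variants: Variant[], sku: string, maxPrice: number): boolean {
  const skus = variants.map(v => v.sku);
  const prices = variants.map(v => v.price);
  return skus.includes(sku) && prices.some(p => p < maxPrice);
}

// nested semantics (simplified): both conditions must hold on the same object.
function matchesAsNested(variants: Variant[], sku: string, maxPrice: number): boolean {
  return variants.some(v => v.sku === sku && v.price < maxPrice);
}

console.log(matchesAsObject(sampleVariants, "ABC", 50)); // true: a false positive
console.log(matchesAsNested(sampleVariants, "ABC", 50)); // false
```

Here "ABC" costs 80, so no single variant satisfies both conditions, yet the flattened model still reports a match because "XYZ" supplies the low price.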

Query DSL JSON: match, term, range, multi_match

Elasticsearch Query DSL queries are JSON objects sent as the query key in the request body to POST /index/_search. Every query is a leaf (operates on fields directly) or a compound (combines other queries). The choice of query type determines whether the input is analyzed before searching and whether a relevance score is computed. Understanding this distinction prevents the most common Elasticsearch bug: running a term query on a text field.

// POST /products/_search

// ── match — full-text search (analyzed) ───────────────────────
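// The analyzer tokenizes the query: the standard analyzer turns "running shoes" into ["running", "shoes"]
// (the english analyzer mapped on title above would additionally stem the tokens)
// Matches documents containing either token by default (operator: OR)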
{
  "query": {
    "match": {
      "title": {
        "query":     "running shoes",
        "operator":  "and",        // require ALL tokens (stricter)
        "fuzziness": "AUTO"        // typo tolerance: "runing" matches "running"
      }
    }
  }
}

// ── term — exact-match (NOT analyzed) ─────────────────────────
// Use on keyword, numeric, date, boolean fields
// NEVER use on text fields (text is lowercased at index time)
{
  "query": {
    "term": { "slug": { "value": "running-shoes-v2" } }
  }
}

// ── terms — exact-match on a list of values (IN) ──────────────
{
  "query": {
    "terms": { "status": ["published", "featured"] }
  }
}

// ── range — numeric and date range queries ─────────────────────
{
  "query": {
    "range": {
      "price": { "gte": 20, "lte": 100 }
    }
  }
}

// Date range with format hint
{
  "query": {
    "range": {
      "created_at": {
        "gte":    "2025-01-01",
        "lte":    "2025-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

// ── multi_match — match across multiple fields ─────────────────
{
  "query": {
    "multi_match": {
      "query":  "wireless headphones",
      "fields": ["title^3", "description", "tags"],  // ^3 boosts title weight
      "type":   "best_fields"  // score = best single-field match score
    }
  }
}
// type options:
// best_fields   — score from best-matching field (default)
// most_fields   — sum scores from all matching fields
// cross_fields  — treats all fields as one big field (good for names)
// phrase        — match_phrase on all fields

// ── match_phrase — exact phrase in order ──────────────────────
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "noise cancelling",
        "slop":  1   // allow 1 word between "noise" and "cancelling"
      }
    }
  }
}

// ── ids — retrieve specific documents by _id ──────────────────
{
  "query": {
    "ids": { "values": ["1", "2", "3"] }
  }
}

// ── exists — documents where a field is not null ──────────────
{
  "query": {
    "exists": { "field": "published_at" }
  }
}

// ── wildcard — pattern matching (expensive, avoid on large indexes) ─
{
  "query": {
    "wildcard": { "slug": { "value": "running-*" } }
  }
}

The fuzziness: "AUTO" setting is a practical default for user-facing search boxes — it automatically applies Levenshtein distance tolerance based on term length (0 for 1-2 chars, 1 for 3-5 chars, 2 for 6+ chars). The multi_match field boost syntax ("title^3") multiplies that field's score contribution by 3, making title matches rank higher than description matches. Use match_all: {} to return all documents (useful for aggregation-only requests where you do not want to filter the document set).
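The length thresholds for "AUTO" can be written down directly. The sketch below pairs them with a plain Levenshtein distance to show which misspellings fall within tolerance; it is an approximation, since it ignores transpositions and other details of how Elasticsearch evaluates fuzzy matches. The function names are hypothetical.

```typescript
// Maximum edit distance that fuzziness "AUTO" allows for a term of this length.
function autoFuzziness(term: string): 0 | 1 | 2 {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

// Plain Levenshtein distance (insert, delete, substitute), two-row dynamic programming.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost));
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// True when the indexed term is within the AUTO tolerance of the query term.
function fuzzyMatches(queryTerm: string, indexedTerm: string): boolean {
  return levenshtein(queryTerm, indexedTerm) <= autoFuzziness(queryTerm);
}

console.log(fuzzyMatches("runing", "running")); // true: distance 1, tolerance 2
console.log(fuzzyMatches("ab", "abc"));         // false: 2-char terms must match exactly
```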

bool Query: must, filter, should, must_not

The bool query is the primary compound query for combining multiple conditions. Its four clauses serve distinct purposes: must (document must match, contributes to score), filter (document must match, no scoring, cached), should (document should match — boosts score if matched, optional unless no must/filter), and must_not (document must not match, no scoring, cached). Understanding the scoring vs. caching distinction is critical for performance — move all non-full-text conditions from must to filter.

// POST /products/_search

// ── Full bool query ────────────────────────────────────────────
{
  "query": {
    "bool": {
      // must: scored, documents MUST match all
      "must": [
        { "match": { "title": "wireless headphones" } }
      ],
      // filter: NOT scored, cached, documents MUST match all
      // 2-5x faster than must for exact-match conditions
      "filter": [
        { "term":  { "status": "published" } },
        { "range": { "price": { "gte": 20, "lte": 200 } } },
        { "term":  { "tags": "electronics" } }
      ],
      // should: boosts score if matched, optional
      "should": [
        { "term": { "featured": true } },
        { "match": { "brand": "sony" } }
      ],
      "minimum_should_match": 1,  // at least 1 should clause must match
      // must_not: documents MUST NOT match, cached
      "must_not": [
        { "term": { "discontinued": true } }
      ]
    }
  }
}

// ── Filter-only query (no relevance scoring) ───────────────────
// Set size: 0 for aggregation-only requests
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "category.id": "cat-123" } },
        { "range": { "stock": { "gt": 0 } } }
      ]
    }
  }
}

// ── Nested bool — complex compound conditions ──────────────────
// (status=published AND price<100) OR (status=featured)
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              { "term":  { "status": "published" } },
              { "range": { "price": { "lt": 100 } } }
            ]
          }
        },
        { "term": { "status": "featured" } }
      ],
      "minimum_should_match": 1
    }
  }
}

// ── constant_score — fixed score for filter (no TF/IDF) ────────
// Use when relevance score doesn't matter but you still want a score
{
  "query": {
    "constant_score": {
      "filter": { "term": { "status": "published" } },
      "boost": 1.0
    }
  }
}

// ── explain API — debug why a document scored as it did ────────
// GET /products/_explain/doc-id
{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "headphones" } }],
      "filter": [{ "term": { "status": "published" } }]
    }
  }
}
// Returns: { "matched": true, "explanation": { "value": 1.23, "description": "...", "details": [...] } }

A common bool query mistake is putting all conditions in must when only the full-text match should affect scoring. The pattern is: full-text queries (match, multi_match, match_phrase) go in must; exact-match conditions (term, terms, range, exists) go in filter. This separation lets Elasticsearch skip scoring for the filters and reuse frequently used ones from the node query cache across requests. The minimum_should_match parameter (an integer or a percentage string like "75%") sets how many should clauses must match. If you omit it, the default is 1 when the bool query has no must or filter clauses and 0 otherwise.
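For the percentage form, the required number of clauses is the percentage of the should-clause count, rounded down. The helper below is a hedged sketch of that arithmetic only: requiredShouldMatches is a hypothetical name, and negative values and the combined syntaxes that Elasticsearch also accepts are deliberately left out.

```typescript
// Converts a minimum_should_match value into a clause count.
// Handles only non-negative integers and positive percentages such as "75%".
function requiredShouldMatches(spec: number | string, shouldClauseCount: number): number {
  if (typeof spec === "number") {
    if (!Number.isInteger(spec) || spec < 0) {
      throw new RangeError("expected a non-negative integer");
    }
    return spec;
  }
  const parsed = /^(\d+)%$/.exec(spec.trim());
  if (parsed === null) {
    throw new RangeError(`unsupported value: ${spec}`);
  }
  // Integer arithmetic avoids floating-point surprises; the result is rounded down.
  return Math.floor((Number(parsed[1]) * shouldClauseCount) / 100);
}

console.log(requiredShouldMatches("75%", 4)); // 3
console.log(requiredShouldMatches("75%", 3)); // 2 (2.25 rounded down)
console.log(requiredShouldMatches(1, 5));     // 1
```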

Aggregations JSON: terms, date_histogram, avg, nested

Aggregations compute analytics over the set of documents matching the query. The aggs key sits at the same level as query in the request body. Each aggregation is named by you (the name appears in the response), and can be either a bucket aggregation (groups documents into buckets) or a metric aggregation (computes a single numeric value per bucket). Sub-aggregations can be nested inside bucket aggregations, enabling faceted analytics with drill-down metrics.

// POST /products/_search
// Aggregation-only request: set size:0 to skip hits array
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [{ "term": { "status": "published" } }]
    }
  },
  "aggs": {

    // ── terms — faceted bucketing (like GROUP BY) ──────────────
    "by_category": {
      "terms": {
        "field": "category.id",
        "size":  10,              // top 10 categories by doc count
        "order": { "_count": "desc" }
      },
      // Sub-aggregation: avg price per category
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "min_price": { "min": { "field": "price" } },
        "max_price": { "max": { "field": "price" } }
      }
    },

    // ── date_histogram — time-series bucketing ─────────────────
    "sales_over_time": {
      "date_histogram": {
        "field":             "created_at",
        "calendar_interval": "month",   // or "week", "day", "year"
        "format":            "yyyy-MM",
        "min_doc_count":     0          // include empty buckets
      },
      "aggs": {
        "revenue": { "sum": { "field": "price" } }
      }
    },

    // ── fixed_interval for precise time windows ────────────────
    "hourly_events": {
      "date_histogram": {
        "field":          "timestamp",
        "fixed_interval": "1h"   // exactly 1 hour (not calendar-aware)
      }
    },

    // ── range — custom numeric buckets ────────────────────────
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 25 },
          { "from": 25, "to": 100 },
          { "from": 100 }
        ]
      }
    },

    // ── cardinality — distinct count (HyperLogLog++) ───────────
    "unique_brands": {
      "cardinality": {
        "field":             "brand.keyword",
        "precision_threshold": 100   // accuracy vs memory trade-off
      }
    },

    // ── stats — min, max, avg, sum, count in one go ────────────
    "price_stats": {
      "stats": { "field": "price" }
    },

    // ── nested — aggregate inside nested objects ───────────────
    "variant_prices": {
      "nested": { "path": "variants" },
      "aggs": {
        "avg_variant_price": { "avg": { "field": "variants.price" } }
      }
    }
  }
}

// Response structure:
{
  "hits":  { "total": { "value": 1500, "relation": "eq" }, "hits": [] },
  "aggregations": {
    "by_category": {
      "buckets": [
        {
          "key":       "cat-electronics",
          "doc_count": 320,
          "avg_price": { "value": 89.50 },
          "min_price": { "value": 9.99 },
          "max_price": { "value": 599.00 }
        }
      ]
    },
    "unique_brands": { "value": 47 },
    "price_stats": {
      "count": 1500, "min": 4.99, "max": 1299.00,
      "avg": 67.43, "sum": 101145.00
    }
  }
}

The cardinality aggregation uses the HyperLogLog++ algorithm to count distinct values approximately; it needs far less memory than exact counting, and the error typically stays within a few percent. The precision_threshold option (default 3000, maximum 40000) trades memory for accuracy: counts below the threshold are expected to be close to exact. To aggregate on fields inside a nested type field, you must first use a nested aggregation to enter the nested context before running sub-aggregations on those fields. A terms aggregation run directly on a nested field without the nested wrapper does not see the nested documents, so it returns empty or misleading buckets.
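On the consuming side, bucket aggregations with metric sub-aggregations are usually easier to work with once flattened into rows. The sketch below does that for a terms aggregation shaped like the by_category response shown earlier; the interfaces are hand-written assumptions for illustration, not types exported by the Elasticsearch client.

```typescript
// Hand-written shapes for this sketch.
interface MetricValue {
  value: number | null;
}

interface TermsBucket {
  key: string;
  doc_count: number;
  // Sub-aggregations appear as extra keys named after the aggregation.
  [subAggregation: string]: string | number | MetricValue;
}

// Reads a single-value metric sub-aggregation from a bucket, or null if absent.
function metricOf(bucket: TermsBucket, aggName: string): number | null {
  const entry = bucket[aggName];
  return typeof entry === "object" && entry !== null ? entry.value : null;
}

const sampleBuckets: TermsBucket[] = [
  {
    key: "cat-electronics",
    doc_count: 320,
    avg_price: { value: 89.5 },
    min_price: { value: 9.99 },
    max_price: { value: 599.0 },
  },
];

const categoryRows = sampleBuckets.map(bucket => ({
  category: bucket.key,
  docs: bucket.doc_count,
  avgPrice: metricOf(bucket, "avg_price"),
  minPrice: metricOf(bucket, "min_price"),
  maxPrice: metricOf(bucket, "max_price"),
}));

console.log(categoryRows);
```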

Bulk API NDJSON Format

The Elasticsearch bulk API accepts NDJSON (Newline Delimited JSON) in the request body. Each operation is two consecutive lines: the action metadata JSON object followed by the document JSON object (delete has no document line). The endpoint is POST /_bulk (for cross-index operations) or POST /index/_bulk (scoped to one index). The Content-Type should be application/x-ndjson. The body must end with a trailing newline. A common starting point is 5-15 MB per request or roughly 1,000 documents, whichever comes first; benchmark against your own cluster to find the batch size that works best.

// POST /_bulk
// Content-Type: application/x-ndjson
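// Body: each operation is an action line plus (except for delete) a document line, each ending in a newline
// The // comments and blank lines below are for readability only; a real bulk body contains neither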

// index — create or replace document
{"index":{"_index":"products","_id":"1"}}
{"title":"Wireless Headphones","price":79.99,"status":"published"}

// create — fail if document with _id already exists
{"create":{"_index":"products","_id":"2"}}
{"title":"Bluetooth Speaker","price":49.99,"status":"published"}

// update — partial update (doc key wraps the fields to update)
{"update":{"_index":"products","_id":"1"}}
{"doc":{"price":69.99,"on_sale":true}}

// update with upsert — create if not exists, update if exists
{"update":{"_index":"products","_id":"99"}}
{"doc":{"price":29.99},"doc_as_upsert":true}

// delete — action line only (no document line)
{"delete":{"_index":"products","_id":"3"}}

// ── Bulk response JSON ─────────────────────────────────────────
// HTTP 200 even when some operations fail
{
  "took":   23,
  "errors": true,
  "items": [
    { "index":  { "_index":"products","_id":"1","result":"updated","status":200 } },
    { "create": { "_index":"products","_id":"2","error":{ "type":"version_conflict_engine_exception","reason":"..."},"status":409 } },
    { "update": { "_index":"products","_id":"1","result":"updated","status":200 } },
    { "delete": { "_index":"products","_id":"3","result":"deleted","status":200 } }
  ]
}
// Always check: errors === true → iterate items and inspect each item's error field

// ── Performance: routing and pipeline options ──────────────────
{"index":{"_index":"products","_id":"5","routing":"category-electronics","pipeline":"enrich-pipeline"}}
{"title":"USB-C Hub","price":34.99}

// ── Build NDJSON body in Node.js ───────────────────────────────
const documents = [
  { id: '1', title: 'Headphones', price: 79.99 },
  { id: '2', title: 'Speaker',    price: 49.99 },
]

const body = documents.flatMap(doc => [
  JSON.stringify({ index: { _index: 'products', _id: doc.id } }),
  JSON.stringify(doc),
]).join('\n') + '\n'  // trailing newline is required

await fetch('http://localhost:9200/_bulk', {
  method:  'POST',
  headers: { 'Content-Type': 'application/x-ndjson' },
  body,
})

The bulk API returns HTTP 200 even when some operations fail — always check the top-level errors boolean first, then iterate items and inspect each item's error key to identify which operations failed and why. Common failure causes include version conflicts (document was updated between read and write — handle with if_seq_no/if_primary_term optimistic concurrency), mapping conflicts (field value type does not match the mapping), and index not found errors. A 5-15 MB request body is the sweet spot: smaller batches incur too much per-request overhead, larger batches consume too much JVM heap on the Elasticsearch node during indexing.
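A minimal sketch of that check, assuming the response has already been parsed from JSON. The interfaces and the collectBulkFailures name are hypothetical and cover only the fields discussed here; the sample object mirrors the response shown earlier in this section.

```typescript
// Hand-written shapes covering only the fields used below.
interface BulkItemResult {
  _index: string;
  _id: string;
  status: number;
  result?: string;
  error?: { type: string; reason: string };
}

// Each item has a single key: index, create, update, or delete.
type BulkItem = Record<string, BulkItemResult>;

interface BulkResponse {
  took: number;
  errors: boolean;
  items: BulkItem[];
}

interface BulkFailure {
  position: number; // index of the operation in the request
  action: string;
  id: string;
  status: number;
  type: string;
  reason: string;
}

function collectBulkFailures(res: BulkResponse): BulkFailure[] {
  // Fast path: the top-level flag says nothing failed.
  if (!res.errors) return [];
  const failures: BulkFailure[] = [];
  res.items.forEach((item, position) => {
    for (const [action, outcome] of Object.entries(item)) {
      if (outcome.error) {
        failures.push({
          position,
          action,
          id: outcome._id,
          status: outcome.status,
          type: outcome.error.type,
          reason: outcome.error.reason,
        });
      }
    }
  });
  return failures;
}

const sampleBulkResponse: BulkResponse = {
  took: 23,
  errors: true,
  items: [
    { index: { _index: "products", _id: "1", result: "updated", status: 200 } },
    { create: { _index: "products", _id: "2", status: 409,
        error: { type: "version_conflict_engine_exception", reason: "..." } } },
    { delete: { _index: "products", _id: "3", result: "deleted", status: 200 } },
  ],
};

console.log(collectBulkFailures(sampleBulkResponse));
```

The position field is what lets a caller map a failure back to the document it submitted, since items are returned in request order.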

Source Filtering and Response JSON Shape

Every Elasticsearch search response has the same JSON envelope: took (milliseconds), timed_out, _shards (shard success/failure counts), and hits (the results object). hits.total.value is the total count of matching documents. hits.hits is the array of up to size results, each containing _index, _id, _score, and _source (the original document JSON). Source filtering and pagination control dramatically affect response size and query performance.

// POST /products/_search

// ── _source filtering — return only specific fields ────────────
{
  "_source": ["title", "price", "status"],  // include list
  "query": { "match_all": {} }
}

// Exclude specific fields (e.g., large embedded content)
{
  "_source": {
    "includes": ["*"],
    "excludes": ["description", "raw_html", "embedding_vector"]
  },
  "query": { "match_all": {} }
}

// Omit _source from the response (use fields to return specific field values instead)
{
  "_source": false,
  "fields": ["title", "price", "created_at"],
  "query": { "match_all": {} }
}

// ── Response JSON shape ────────────────────────────────────────
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1500, "relation": "eq" },
    "max_score": 1.23,
    "hits": [
      {
        "_index":  "products",
        "_id":     "abc123",
        "_score":  1.23,
        "_source": { "title": "Wireless Headphones", "price": 79.99 }
      }
    ]
  }
}

// ── from/size pagination — deep pagination is expensive ─────────
{
  "from": 0, "size": 20,
  "query": { "match_all": {} }
}
// WARNING: from + size must be <= index.max_result_window (default 10000)
// Deep pagination (from: 9990, size: 10) forces every shard to collect and rank its top 10000 docs

// ── search_after — efficient stateless deep pagination ─────────
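// Step 1: first page (each hit in the response carries its sort values)
// The tiebreaker must be unique per document; slug (a keyword field) is assumed unique here,
// because sorting on _id is restricted and disabled by default in recent Elasticsearch versions
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "slug": "asc" }],
  "query": { "match_all": {} }
}
// Step 2: use last hit's sort values as cursor for next page
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "slug": "asc" }],
  "search_after": ["2025-12-31T23:59:59Z", "last-doc-slug"],
  "query": { "match_all": {} }
}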

// ── scroll API — legacy deep pagination for data export ─────────
// Prefer search_after for new code; scroll is stateful and holds resources
// POST /products/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}
// Response includes scroll_id — use it to fetch next batch
// POST /_search/scroll
{ "scroll": "1m", "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gB..." }

// ── filter_path — prune the response envelope ──────────────────
// GET /products/_search?filter_path=hits.hits._source,hits.total.value
// Returns only the specified fields from the response JSON

The hits.total.relation field is either "eq" (exact count) or "gte" (lower bound: the actual count is greater than or equal to value). By default, Elasticsearch stops counting at 10,000 for performance; set "track_total_hits": true in the request body to get an exact count regardless of size. For data export scenarios requiring all documents, search_after with a stable sort (ending in a unique tiebreaker field) is preferred over the scroll API because it keeps no server-side cursor, whereas scroll holds an open search context on each shard, consuming resources until it expires or is cleared.
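The cursor hand-off in search_after is mechanical: copy the sort array of the last hit into the next request. A minimal sketch follows, assuming each hit carries the sort values Elasticsearch returns for sorted searches; PagedSearchBody, SortedHit, and nextPageBody are hypothetical names.

```typescript
interface SortedHit {
  _id: string;
  sort?: unknown[]; // present when the request specified a sort
}

interface PagedSearchBody {
  size: number;
  sort: unknown[];
  query?: unknown;
  search_after?: unknown[];
}

// Returns the body for the next page, or null when the current page was the last one.
function nextPageBody(previous: PagedSearchBody, hits: SortedHit[]): PagedSearchBody | null {
  // An empty or short page means there is nothing left to fetch.
  if (hits.length === 0 || hits.length < previous.size) return null;
  const lastHit = hits[hits.length - 1];
  if (!lastHit || !lastHit.sort) {
    throw new Error("hits carry no sort values; did the request include a sort?");
  }
  return { ...previous, search_after: lastHit.sort };
}

const firstPageBody: PagedSearchBody = {
  size: 2,
  sort: [{ created_at: "desc" }, { slug: "asc" }],
  query: { match_all: {} },
};

// Made-up hits standing in for a real response.
const firstPageHits: SortedHit[] = [
  { _id: "a", sort: [1767225599000, "alpha"] },
  { _id: "b", sort: [1767139200000, "bravo"] },
];

console.log(JSON.stringify(nextPageBody(firstPageBody, firstPageHits)));
```

A caller would loop: send the body, pass the returned hits to nextPageBody, and stop when it returns null.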

Node.js Elasticsearch Client JSON Integration

The official @elastic/elasticsearch Node.js client provides TypeScript-typed access to every Elasticsearch API. The client accepts Query DSL JSON directly as method parameters — no manual JSON serialization or HTTP boilerplate needed. It handles connection pooling, retry logic, and serialization automatically. The client is compatible with Elasticsearch 8.x and Elastic Cloud.

import { Client, errors } from '@elastic/elasticsearch'

// ── Client setup ───────────────────────────────────────────────
const client = new Client({
  node: process.env.ELASTICSEARCH_URL ?? 'http://localhost:9200',
  // For Elastic Cloud:
  // cloud: { id: process.env.ELASTIC_CLOUD_ID! },
  // auth: { apiKey: process.env.ELASTIC_API_KEY! },
  maxRetries:      3,
  requestTimeout:  30000,
  sniffOnStart:    false,  // disable for single-node / cloud
})

// ── Search with typed Query DSL ────────────────────────────────
async function searchProducts(term: string, maxPrice: number) {
  const result = await client.search({
    index: 'products',
    size:  20,
    query: {
      bool: {
        must:   [{ match: { title: term } }],
        filter: [
          { term:  { status: 'published' } },
          { range: { price: { lte: maxPrice } } },
        ],
      },
    },
    _source: ['title', 'price', 'slug'],
    sort:    [{ _score: 'desc' }, { created_at: 'desc' }],
  })

  // result.hits.hits is typed as SearchHit<unknown>[]
  return result.hits.hits.map(hit => hit._source)
}

// ── Index a document ──────────────────────────────────────────
await client.index({
  index:   'products',
  id:      'prod-123',
  document: {
    title:      'Wireless Headphones',
    price:      79.99,
    status:     'published',
    created_at: new Date().toISOString(),
  },
})

// ── Update a document (partial) ────────────────────────────────
await client.update({
  index: 'products',
  id:    'prod-123',
  doc:   { price: 69.99, on_sale: true },
})

// ── Delete a document ─────────────────────────────────────────
await client.delete({ index: 'products', id: 'prod-123' })

// ── Bulk helper — large JSON dataset ingestion ─────────────────
const documents = [
  { id: 'p1', title: 'Headphones', price: 79.99 },
  { id: 'p2', title: 'Speaker',    price: 49.99 },
  { id: 'p3', title: 'Earbuds',    price: 29.99 },
]

const bulkResult = await client.helpers.bulk({
  datasource: documents,
  onDocument(doc) {
    return { index: { _index: 'products', _id: doc.id } }
  },
  onDrop(record) {
    // Called for each document that failed after retries
    console.error('Failed to index document:', record.document, record.error)
  },
  flushBytes:  5_000_000,  // 5 MB chunks (default)
  concurrency: 3,          // parallel bulk requests
})

console.log('Indexed:', bulkResult.successful, 'Failed:', bulkResult.failed)

// ── Error handling ────────────────────────────────────────────
async function safeSearch(query: string) {
  try {
    return await client.search({ index: 'products', query: { match: { title: query } } })
  } catch (err) {
    if (err instanceof errors.ResponseError) {
      // Elasticsearch returned an error response; the body type is not known
      // statically, so narrow it before reading the error details
      const body = err.meta.body as { error?: { type?: string; reason?: string } }
      console.error('ES error:', body.error?.type, body.error?.reason)
      throw new Error(`Elasticsearch ${err.meta.statusCode}: ${body.error?.reason}`)
    }
    if (err instanceof errors.ConnectionError) {
      console.error('Cannot connect to Elasticsearch:', err.message)
    }
    throw err
  }
}

// ── Aggregation response typing ────────────────────────────────
const aggResult = await client.search({
  index: 'products',
  size:  0,
  aggs:  {
    by_status: { terms: { field: 'status', size: 5 } },
    avg_price: { avg:   { field: 'price' } },
  },
})

// Access aggregation results (cast needed as aggs are untyped by default)
const byStatus = aggResult.aggregations?.by_status as { buckets: { key: string; doc_count: number }[] }
// avg returns value: null when no documents match
const avgPrice = aggResult.aggregations?.avg_price as { value: number | null }
console.log('Avg price:', avgPrice.value)
byStatus.buckets.forEach(b => console.log(b.key, b.doc_count))

The client.helpers.bulk() helper is significantly simpler than building NDJSON manually: it handles chunking by byte size, parallel requests, retries for documents rejected with 429 (up to the configured retries count), and per-document error callbacks via onDrop. Transient request-level failures such as 502, 503, and 504 responses are retried by the client's transport according to maxRetries. For production use, prefer API key authentication over basic credentials, and enable TLS. The client emits request and response events via client.diagnostic.on(), which you can use for logging, tracing, and metrics collection; these are observability hooks, not interceptors that can modify a request. See the JSON database queries guide for SQL vs. NoSQL JSON pattern comparison.

Key Terms

Query DSL: Elasticsearch's domain-specific language for expressing search queries as JSON objects. Every Query DSL query is a JSON object with a single key identifying the query type (e.g., match, term, bool, range), and a value that is itself an object containing the query configuration. Queries operate in two contexts: query context (computes a relevance _score for each matching document) and filter context (binary match/no-match with no scoring, results cached). Query DSL supports leaf queries (operate on a single field), compound queries (combine other queries, e.g., bool), and special queries (e.g., function_score, percolate).
inverted index: The core Lucene data structure that powers Elasticsearch full-text search. At index time, text fields are analyzed (tokenized, lowercased, and optionally stemmed, depending on the analyzer) and each token is mapped to a list of document IDs that contain it, the inverse of a document-to-word mapping. When a query is executed, Elasticsearch looks up the query tokens in the inverted index and computes the intersection (for AND) or union (for OR) of their document lists. Keyword fields are also indexed in the inverted index, as a single unanalyzed term per value, while numeric and date fields are indexed in BKD trees optimized for range queries. In addition, keyword, numeric, date, and boolean fields store doc values (columnar storage) by default, which enables efficient sorting and aggregations. The _source field stores the original JSON document separately from the inverted index.
aggregation: An Elasticsearch analytics computation run over the set of documents matching a query. Aggregations are declared in the aggs key of a search request. The two categories used most often are bucket aggregations (group documents into discrete buckets, e.g., terms, date_histogram, range) and metric aggregations (compute values such as avg, sum, min, max, or cardinality over the documents in scope); a third category, pipeline aggregations, operates on the output of other aggregations. Sub-aggregations nest inside bucket aggregations, enabling multi-level analytics such as average revenue per product category per month. Aggregation results appear in the aggregations key of the response JSON, keyed by the names you define.
mapping: The schema definition for an Elasticsearch index, specifying each field's data type and indexing options. Mappings are defined as a JSON object under the mappings.properties key when creating an index with the PUT index API. Once an index is created, existing field mappings are immutable — you cannot change a field's type or most of its parameters. New fields can be added at any time. Changing a field type requires creating a new index with the updated mappings and reindexing all documents using the _reindex API. Use index aliases to enable zero-downtime reindexing by switching the alias from the old index to the new one atomically.
NDJSON: Newline Delimited JSON, a format where each line is a complete, valid JSON object, with lines separated by newline characters (\n). Elasticsearch uses NDJSON for the bulk API (an action metadata line followed by a document line for every operation except delete, which has only the metadata line) and the multi-search API (_msearch). Unlike JSON arrays, NDJSON can be streamed and parsed incrementally without loading the entire payload into memory. The Content-Type for NDJSON requests to Elasticsearch is application/x-ndjson. The body must end with a trailing newline character. NDJSON is also used by Logstash and Beats for log data ingestion pipelines.
relevance score: A floating-point number (_score) computed by Elasticsearch for each document in a query context, representing how well the document matches the query. Elasticsearch uses the BM25 algorithm (Okapi BM25) by default, which considers term frequency (how often the query term appears in the document), inverse document frequency (how rare the term is across all documents — rarer terms score higher), and field length normalization (shorter fields with the term score higher than longer fields). Documents in filter context are not scored (score is 0). Scores are only meaningful relative to each other within the same query — comparing scores across different queries is not valid. Use function_score to modify scores based on document fields.

FAQ

How do I write a basic Elasticsearch JSON query?

Send a POST request to /index/_search with a JSON body containing a query key. The simplest query is {"query":{"match_all":{}}}, which returns all documents. For full-text search on a field: {"query":{"match":{"title":"elasticsearch"}}}. For exact-match on a keyword field: {"query":{"term":{"status.keyword":"published"}}}. Add size (default 10) to control how many results are returned, from for pagination offset, and _source to specify which fields to include in each hit's _source object. The response uses a consistent envelope: took (ms), timed_out, _shards, hits.total.value (the matching count, which by default is exact only up to 10,000; set track_total_hits to true for an exact count beyond that), and hits.hits (array of matching documents with _id, _score, and _source).
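As a rough illustration of the request body and response envelope described above, the following sketch builds a match query and reads the common envelope keys. The helper names buildSearchBody and summarize are hypothetical, and the SearchEnvelope interface models only the keys used here, not the full response.

```typescript
// Minimal shape of the response envelope described above (only the keys used here).
interface SearchEnvelope<T> {
  took: number
  hits: {
    total: { value: number; relation: 'eq' | 'gte' }
    hits: Array<{ _id: string; _score: number | null; _source?: T }>
  }
}

// Hypothetical helper: builds the request body for a full-text search.
function buildSearchBody(field: string, text: string, size = 10, from = 0) {
  return { query: { match: { [field]: text } }, size, from }
}

// Hypothetical helper: pulls the commonly used values out of the envelope.
function summarize<T>(resp: SearchEnvelope<T>) {
  return {
    tookMs: resp.took,
    total: resp.hits.total.value,
    totalIsExact: resp.hits.total.relation === 'eq', // 'gte' means a lower bound
    docs: resp.hits.hits.map(hit => ({ id: hit._id, score: hit._score, source: hit._source })),
  }
}
```

The object returned by buildSearchBody can be serialized with JSON.stringify and sent as the POST body; summarize expects the parsed JSON response.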

What is the difference between match and term queries in Elasticsearch?

match is a full-text query that analyzes the search input before looking it up in the inverted index. By default it applies the same analyzer used at index time (tokenization, lowercasing, and any stemming the analyzer is configured for). Use match on text fields. term is an exact-match query that does NOT analyze the input; it searches for the literal value as-is. Use term on keyword, numeric, date, and boolean fields. The most common mistake is running a term query on a text field. Because the text field was analyzed at index time ("Elasticsearch" stored as the token "elasticsearch"), a term query for "Elasticsearch" (capitalized) will match zero documents. For exact-match lookups on text data, run term against a keyword field or the .keyword sub-field (e.g., title.keyword), which dynamic mapping creates automatically and explicit mappings must declare as a multi-field.
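The behaviour above can be reproduced with a toy model. The sketch below is not how Elasticsearch is implemented: analyze is a simplified stand-in for an analyzer (lowercase, split on non-alphanumerics), and buildIndex, termLookup, and matchLookup are hypothetical names used only to show why an unanalyzed lookup misses analyzed tokens.

```typescript
// Toy inverted index: token -> ids of the documents that contain it.
type InvertedIndex = Record<string, string[]>

// Simplified stand-in for an analyzer: lowercase, then split on non-alphanumerics.
function analyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0)
}

// Index time: every document's text is analyzed and its tokens are recorded.
function buildIndex(docs: Record<string, string>): InvertedIndex {
  const index = Object.create(null) as InvertedIndex
  for (const id of Object.keys(docs)) {
    for (const token of analyze(docs[id])) {
      const ids = index[token] ?? (index[token] = [])
      if (ids.indexOf(id) === -1) ids.push(id)
    }
  }
  return index
}

// term: the input is looked up exactly as given, with no analysis.
function termLookup(index: InvertedIndex, value: string): string[] {
  return index[value] ?? []
}

// match: the input is analyzed first, then each token is looked up (OR semantics).
function matchLookup(index: InvertedIndex, text: string): string[] {
  const ids: string[] = []
  for (const token of analyze(text)) {
    for (const id of index[token] ?? []) {
      if (ids.indexOf(id) === -1) ids.push(id)
    }
  }
  return ids
}
```

With a document whose title is "Elasticsearch Query DSL", termLookup for "Elasticsearch" finds nothing because only the lowercased token was indexed, while matchLookup for the same input finds the document.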

How do I filter Elasticsearch results without affecting the relevance score?

Use the filter clause inside a bool query. Queries in filter context skip relevance scoring entirely and only determine include/exclude, and frequently used filters can be cached in the node query cache. This generally makes filter clauses cheaper than equivalent must clauses for exact-match conditions, though the actual gain depends on the data and the query. The pattern: put full-text queries (match, multi_match) in must so they contribute to _score; put all exact-match conditions (term, terms, range, exists) in filter. Example (assuming status is mapped as keyword; otherwise target status.keyword): {"query":{"bool":{"must":[{"match":{"title":"shoes"}}],"filter":[{"term":{"status":"published"}},{"range":{"price":{"lte":100}}}]}}}. Use must_not for exclusion conditions; it also runs in filter context and does not affect scoring.
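One way to apply this pattern in application code is to assemble the filter array conditionally. The sketch below is a minimal example; buildProductQuery and its option names are hypothetical, and it assumes status is mapped as keyword and price as a numeric field.

```typescript
type Clause = Record<string, unknown>

interface ProductSearchOptions {
  text: string       // full-text input, scored via must
  status?: string    // exact-match condition, applied via filter
  maxPrice?: number  // upper price bound, applied via filter
}

// Hypothetical helper: scoring clauses go in must, yes/no conditions in filter.
function buildProductQuery(opts: ProductSearchOptions) {
  const filter: Clause[] = []
  if (opts.status !== undefined) filter.push({ term: { status: opts.status } })
  if (opts.maxPrice !== undefined) filter.push({ range: { price: { lte: opts.maxPrice } } })
  return { query: { bool: { must: [{ match: { title: opts.text } }], filter } } }
}
```

Calling buildProductQuery({ text: 'shoes', status: 'published', maxPrice: 100 }) produces the same JSON as the example above; omitting an option simply leaves that condition out of filter.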

How does the Elasticsearch bulk API NDJSON format work?

The bulk API body consists of pairs of lines: an action metadata line followed by a document line (except delete, which has no document line). Each line is a complete JSON object; lines are separated by newline characters, and the body must end with a trailing newline. Action types are index (create or replace), create (fail if exists), update (partial update), and delete. The action metadata specifies _index and _id. The update action wraps the partial document in a doc key; add "doc_as_upsert": true to create the document if it does not exist. The response is HTTP 200 even when individual operations fail, so check the top-level errors boolean, then each item in the items array for per-operation results and errors. A reasonable starting point is 5-15 MB per batch; benchmark against your own cluster to find the best size.
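The line layout described above can be sketched as a small serializer. This is an illustration only: the BulkOp type, toBulkNdjson, and failedItems are hypothetical names, and the BulkResponse interface models just the errors flag and the per-item fields used here.

```typescript
type BulkOp =
  | { action: 'index' | 'create'; index: string; id: string; document: Record<string, unknown> }
  | { action: 'update'; index: string; id: string; doc: Record<string, unknown>; docAsUpsert?: boolean }
  | { action: 'delete'; index: string; id: string }

// Serializes operations to NDJSON: one metadata line, then a document line
// for every action except delete, with a trailing newline at the end.
function toBulkNdjson(ops: BulkOp[]): string {
  const lines: string[] = []
  for (const op of ops) {
    lines.push(JSON.stringify({ [op.action]: { _index: op.index, _id: op.id } }))
    if (op.action === 'update') {
      lines.push(JSON.stringify(op.docAsUpsert ? { doc: op.doc, doc_as_upsert: true } : { doc: op.doc }))
    } else if (op.action !== 'delete') {
      lines.push(JSON.stringify(op.document))
    }
  }
  return lines.length === 0 ? '' : lines.join('\n') + '\n'
}

// Partial model of the bulk response: each item is keyed by its action type.
interface BulkResponse {
  errors: boolean
  items: Array<Record<string, { _id: string; status: number; error?: { type: string; reason: string } }>>
}

// Collects the operations that failed; returns early when errors is false.
function failedItems(resp: BulkResponse): Array<{ action: string; id: string; reason: string }> {
  const failed: Array<{ action: string; id: string; reason: string }> = []
  if (!resp.errors) return failed
  for (const item of resp.items) {
    for (const action of Object.keys(item)) {
      const result = item[action]
      if (result.error) failed.push({ action, id: result._id, reason: result.error.reason })
    }
  }
  return failed
}
```

The string returned by toBulkNdjson would be sent to the _bulk endpoint with Content-Type application/x-ndjson, and failedItems applied to the parsed response.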

What are Elasticsearch aggregations and how do I write them in JSON?

Aggregations compute analytics over the matched document set and are declared in the aggs key at the same level as query in the request body. Each aggregation has a name you choose, a type key (e.g., terms, date_histogram, avg), and configuration. Bucket aggregations group documents (like GROUP BY): terms groups by a keyword field value, date_histogram groups by time intervals. Metric aggregations compute values: avg, sum, min, max, cardinality. Nest sub-aggregations inside bucket aggregations using an inner aggs key. Set size: 0 in the request to skip returning individual documents when you only need aggregation results — this significantly speeds up the query.
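A nested aggregation request of the kind described above might look like the following sketch. The aggregation names (by_status, per_period, avg_price) and the helper names are arbitrary choices for illustration, and the fields status, created_at, and price are assumed to be mapped as keyword, date, and a numeric type respectively.

```typescript
// Builds an aggregation-only request: terms buckets by status, each split into
// calendar intervals, with the average price computed per interval.
function buildAvgPriceRequest(interval: 'day' | 'week' | 'month' = 'month') {
  return {
    size: 0, // skip returning individual hits
    aggs: {
      by_status: {
        terms: { field: 'status', size: 5 },
        aggs: {
          per_period: {
            date_histogram: { field: 'created_at', calendar_interval: interval },
            aggs: { avg_price: { avg: { field: 'price' } } },
          },
        },
      },
    },
  }
}

// Shape of one by_status bucket in the response (only the keys read below).
interface StatusBucket {
  key: string
  doc_count: number
  per_period: {
    buckets: Array<{ key_as_string: string; doc_count: number; avg_price: { value: number | null } }>
  }
}

// Flattens the nested buckets into one row per status and period.
function flattenBuckets(buckets: StatusBucket[]) {
  const rows: Array<{ status: string; period: string; avgPrice: number | null }> = []
  for (const statusBucket of buckets) {
    for (const period of statusBucket.per_period.buckets) {
      rows.push({ status: statusBucket.key, period: period.key_as_string, avgPrice: period.avg_price.value })
    }
  }
  return rows
}
```

The response mirrors the request nesting, so each level of aggs in the request becomes a level of buckets under the same name in aggregations.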

How do I define Elasticsearch index mappings in JSON?

Create an index with explicit mappings using PUT /index-name with a body containing a mappings.properties object. Each field entry has a type (text, keyword, long, double, date, boolean, object, nested, geo_point) and optional configuration (analyzer, format, etc.). Key rules: use text for full-text searchable fields and keyword for exact-match, sortable, or aggregatable fields. Many fields need both: dynamic mapping adds a .keyword sub-field to string fields automatically, but with explicit mappings you must declare it yourself as a multi-field under fields. Existing field mappings cannot be changed after creation: you can add new fields but cannot change existing field types. To change a field type, create a new index with updated mappings, reindex documents with POST /_reindex, and switch an index alias from the old index to the new one for zero-downtime migration.
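The request bodies involved in this workflow could look like the sketch below. The field set mirrors the products examples used earlier, while the index and alias names passed to the helpers (such as products_v1 and products_v2) are hypothetical.

```typescript
// Body for PUT /<index-name>: explicit mappings, with title declared as text
// plus a keyword multi-field for exact matching, sorting, and aggregations.
const productsMapping = {
  mappings: {
    properties: {
      title: { type: 'text', fields: { keyword: { type: 'keyword', ignore_above: 256 } } },
      status: { type: 'keyword' },
      price: { type: 'double' },
      created_at: { type: 'date' },
    },
  },
}

// Body for POST /_reindex: copy every document from one index to another.
function reindexBody(sourceIndex: string, destIndex: string) {
  return { source: { index: sourceIndex }, dest: { index: destIndex } }
}

// Body for POST /_aliases: both actions are applied atomically, so readers of
// the alias never see a moment where it points at neither index.
function aliasSwapBody(alias: string, oldIndex: string, newIndex: string) {
  return {
    actions: [
      { remove: { index: oldIndex, alias } },
      { add: { index: newIndex, alias } },
    ],
  }
}
```

A typical sequence would be: create the new index with the updated mapping, send reindexBody('products_v1', 'products_v2'), then send aliasSwapBody('products', 'products_v1', 'products_v2') once reindexing has finished.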

How do I reduce the size of Elasticsearch JSON responses?

Use _source includes to return only specific fields: {"_source":["title","price"],"query":{...}}. For documents with many fields this can shrink the response substantially, roughly in proportion to the share of each document you leave out. Use _source: false combined with the fields parameter to return selected fields formatted according to their mappings, or with docvalue_fields to read values from doc values (available for keyword, numeric, date, and boolean fields, but not for text). For aggregation-only queries, set size: 0 to omit the hits.hits array entirely. Use filter_path as a URL parameter to prune the response envelope to only the keys you need (e.g., ?filter_path=hits.hits._source,hits.total.value). For deep pagination returning large result sets, use search_after instead of large from values: from: 9990, size: 10 forces every shard to collect and rank its top 10,000 documents just to return 10.
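A search_after loop can be sketched as two small helpers: one that builds each page request and one that extracts the cursor from the last hit. The helper names are hypothetical, and the sort assumes a created_at date field plus a unique keyword field, here called sku, as a tiebreaker; sku is an assumption for illustration and does not appear in the examples above.

```typescript
// Only the parts of a hit that pagination needs.
interface Hit {
  _id: string
  sort?: unknown[] // sort values of this hit, present when the request has a sort
}

// Builds one page request. The sort must be deterministic, hence the tiebreaker.
function buildPageRequest(pageSize: number, searchAfter?: unknown[]): Record<string, unknown> {
  const body: Record<string, unknown> = {
    size: pageSize,
    _source: ['title', 'price'], // return only the fields the caller needs
    query: { match_all: {} },
    sort: [{ created_at: 'desc' }, { sku: 'asc' }], // sku: assumed unique keyword field
  }
  if (searchAfter !== undefined) body.search_after = searchAfter
  return body
}

// Returns the cursor for the next page, or undefined when the page was empty.
function nextSearchAfter(hits: Hit[]): unknown[] | undefined {
  const last: Hit | undefined = hits[hits.length - 1]
  return last?.sort
}
```

The first request omits search_after; each following request passes the sort array of the previous page's last hit, and the loop ends when a page comes back empty.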

How do I use the Node.js Elasticsearch client to search with JSON?

Install @elastic/elasticsearch, create a Client instance with your node URL (or Elastic Cloud ID and API key), then call client.search({}) with the index name and Query DSL JSON as parameters — the client handles serialization. Access results via result.hits.hits (array of hit objects with _source, _id, _score) and result.hits.total.value. For bulk ingestion, use client.helpers.bulk({}) with a datasource array and an onDocument function returning the action metadata — it handles NDJSON formatting, chunking, retries, and the onDrop callback for failed documents. Catch errors.ResponseError for Elasticsearch API errors and inspect err.meta.body.error for the error JSON details.