JSON API Filtering: Query Syntax, Type-Safe Parsing & SQL Injection Prevention

Last updated:

JSON API filtering covers three distinct problems: choosing a filter query syntax (flat key=value, MongoDB-style JSON operators, JSON:API filter[] bracket notation, or RQL), safely parsing and validating that filter input with strict schemas, and translating validated filters into SQL or a search DSL without opening SQL injection vulnerabilities. Most articles cover only ?status=published — this guide covers the structured filter formats that real production APIs use, including MongoDB-style {"$gt": 100} operators, JSON:API filter[field][operator]=value URL syntax, and Elasticsearch query DSL.

Type safety matters at every layer. An incoming filter JSON object that passes a raw JSON.parse() has type unknown in TypeScript — every field access is unsafe until you validate the structure with Zod or a JSON Schema validator. Field names from filter parameters must never be interpolated into SQL strings directly: even if the value is parameterized, a client-supplied field name like 1=1-- or id; DROP TABLE used as a SQL identifier causes injection. The correct pattern is a hardcoded allowlist that maps client field names to SQL column references, combined with fully parameterized values. This guide walks through each filter approach, safe Zod schemas for filter validation, field allowlist patterns, SQL query builder integration, JSON:API filter conventions, full-text search combination, and pagination performance with complex filters.

Filter Query Approaches Compared

Four main patterns exist for expressing filters in a REST API. Each has different expressiveness, URL-friendliness, and implementation complexity trade-offs.

`// ── Approach 1: Simple key=value query parameters ─────────────────────
// GET /articles?status=published&authorId=42
// Pros: universal, zero parsing, works with any HTTP client
// Cons: no operators (only equality), no negation, no range, no OR logic
// → Use only for trivial single-value equality filters

// ── Approach 2: MongoDB-style JSON operators ──────────────────────────
// Sent as URL-encoded query parameter or JSON request body
// GET /articles?filter=%7B"status"%3A%7B"%24eq"%3A"published"%7D%7D
// or:
// POST /articles/search   (non-standard but cleaner for complex filters)
// Body: { "filter": { "status": { "$eq": "published" }, "views": { "$gte": 1000 } } }
//
// Pros: expressive, familiar to MongoDB users, composable $and/$or
// Cons: URL-encoding is ugly, POST-for-search is non-standard
// → Best for: internal APIs, developer-facing APIs, search endpoints

// ── Approach 3: JSON:API filter[] bracket notation ────────────────────
// GET /articles?filter[status]=published&filter[views][gte]=1000
// GET /articles?filter[author.name]=Alice&filter[tags][in][]=js&filter[tags][in][]=ts
//
// Pros: stays in URL query string, standard HTTP GET semantics, cacheable
// Cons: complex boolean logic (OR) is awkward in bracket notation
// → Best for: public REST APIs following JSON:API spec

// ── Approach 4: RQL (Resource Query Language) ─────────────────────────
// GET /articles?q=eq(status,published)&q=gt(views,1000)
// GET /articles?q=and(eq(status,published),gt(year,2024))
//
// Pros: full boolean algebra, nestable, compact
// Cons: requires a dedicated RQL parser library, less familiar
// → Best for: sophisticated query APIs, data platforms

// ── Comparison table ──────────────────────────────────────────────────
// Approach          | Range | OR | Nested | Cacheable | Complexity
// ─────────────────────────────────────────────────────────────────
// key=value         |  No   | No |  No    |   Yes     | Trivial
// MongoDB operators |  Yes  |Yes |  Yes   |  Partial  | Medium
// JSON:API filter[] |  Yes  |Awkward| Yes |   Yes     | Medium
// RQL               |  Yes  |Yes |  Yes   |   Yes     | High

// ── Practical recommendation ──────────────────────────────────────────
// For new APIs: MongoDB-style operators as a single ?filter= JSON param,
// or JSON:API bracket notation if you follow the JSON:API spec.
// Always document your chosen format in OpenAPI with a JSON Schema
// for the filter object structure.

The choice of filter syntax determines your API contract and client developer experience. For most APIs, MongoDB-style operators or JSON:API bracket notation offer the best balance. Whichever you choose, document it with an OpenAPI schema on the filter parameter — this enables client SDK generation and Swagger UI filter testing.

MongoDB-Style JSON Filter Operators

MongoDB-style operators use a $ prefix and are nested inside field objects. They map naturally to SQL comparison and logical operators and are familiar to a large developer audience. The operator set covers all common filter needs without custom syntax.

`// ── Core comparison operators ─────────────────────────────────────────
// { field: { $op: value } }
//
// $eq   →  field = value          (also written as { field: value } shorthand)
// $ne   →  field != value
// $gt   →  field > value
// $gte  →  field >= value
// $lt   →  field < value
// $lte  →  field <= value
// $in   →  field IN (v1, v2, ...)   (accepts an array, max ~100 values)
// $nin  →  field NOT IN (...)
// $exists → field IS NOT NULL / IS NULL

// ── Example filter objects ────────────────────────────────────────────
const statusFilter = { status: { $eq: 'published' } }
const rangeFilter  = { publishedAt: { $gte: '2024-01-01', $lte: '2024-12-31' } }
const multiFilter  = { tags: { $in: ['javascript', 'typescript', 'nodejs'] } }
const negateFilter = { status: { $ne: 'draft' } }
const existsFilter = { deletedAt: { $exists: false } }  // IS NULL

// ── Logical operators: $and, $or, $not ────────────────────────────────
const andFilter = {
  $and: [
    { status: { $eq: 'published' } },
    { views: { $gte: 100 } },
  ],
}

const orFilter = {
  $or: [
    { status: { $eq: 'published' } },
    { status: { $eq: 'featured' } },
  ],
}

const notFilter = {
  $not: { status: { $eq: 'draft' } },
}

// ── Sending as a query parameter (URL-encoded JSON) ───────────────────
// GET /api/articles?filter={"status":{"$eq":"published"},"views":{"$gte":100}}
// Client code (JavaScript):
const filter = JSON.stringify({ status: { $eq: 'published' }, views: { $gte: 100 } })
const url = `/api/articles?filter=${encodeURIComponent(filter)}`

// ── Sending as a JSON request body (POST /search) ─────────────────────
// POST /api/articles/search
// Content-Type: application/json
// { "filter": { "status": { "$eq": "published" }, "views": { "$gte": 100 } },
//   "sort": [{ "field": "publishedAt", "order": "desc" }],
//   "page": { "size": 20, "cursor": "eyJpZCI6MTAwfQ==" } }

// ── Express.js handler: parse the filter parameter ────────────────────
import express from 'express'
const app = express()
app.use(express.json())

app.get('/api/articles', (req, res) => {
  let filter: unknown
  try {
    filter = req.query.filter
      ? JSON.parse(req.query.filter as string)
      : {}
  } catch {
    return res.status(400).json({ error: 'Invalid filter JSON' })
  }
  // Next: validate with Zod (see Section 3)
  // Then: translate to SQL (see Section 4)
})

The $in operator is especially useful for multi-select UI filters (a user picks several status values from a checkbox list). Keep the array size below 100 elements — larger sets should be handled by a subquery or a temporary table join, not an ever-growing IN clause. For the $exists operator, map true to IS NOT NULL and false to IS NULL in your SQL translator.

Parsing and Validating Filter JSON

After parsing the raw JSON, the result has type unknown. Every subsequent field access is unsafe. Use Zod to define a recursive schema that enforces the exact shape of allowed filter operators before any SQL translation begins. Reject any document that contains unknown operators or fields.

`import { z } from 'zod'

// ── Step 1: Define operator schemas for each value type ───────────────
const StringFilter = z.union([
  z.string(),  // shorthand: { field: "value" } → implicit $eq
  z.object({
    $eq:  z.string().optional(),
    $ne:  z.string().optional(),
    $in:  z.array(z.string()).max(100).optional(),
    $nin: z.array(z.string()).max(100).optional(),
    $exists: z.boolean().optional(),
  }).strict(),  // .strict() rejects unknown keys — critical for security
])

const NumberFilter = z.union([
  z.number(),
  z.object({
    $eq:  z.number().optional(),
    $ne:  z.number().optional(),
    $gt:  z.number().optional(),
    $gte: z.number().optional(),
    $lt:  z.number().optional(),
    $lte: z.number().optional(),
    $in:  z.array(z.number()).max(100).optional(),
    $nin: z.array(z.number()).max(100).optional(),
    $exists: z.boolean().optional(),
  }).strict(),
])

const DateFilter = z.union([
  z.string().datetime(),  // ISO 8601
  z.object({
    $eq:  z.string().datetime().optional(),
    $gt:  z.string().datetime().optional(),
    $gte: z.string().datetime().optional(),
    $lt:  z.string().datetime().optional(),
    $lte: z.string().datetime().optional(),
    $exists: z.boolean().optional(),
  }).strict(),
])

// ── Step 2: Define the article filter schema ──────────────────────────
// Only include fields that are in your SQL allowlist (Section 4)
const ArticleFilterSchema = z.object({
  status:      StringFilter.optional(),
  authorId:    NumberFilter.optional(),
  publishedAt: DateFilter.optional(),
  viewCount:   NumberFilter.optional(),
  tags:        z.object({
    $in:  z.array(z.string()).max(100).optional(),
    $nin: z.array(z.string()).max(100).optional(),
  }).strict().optional(),
}).strict()  // reject any field not in this schema

// ── Step 3: Add logical operators ($and, $or, $not) ──────────────────
// Recursive type: FilterSchema can contain $and/$or arrays of FilterSchema
type ArticleFilter = z.infer<typeof ArticleFilterSchema> & {
  $and?: ArticleFilter[]
  $or?:  ArticleFilter[]
  $not?: ArticleFilter
}

// For recursive Zod schemas, use z.lazy():
const ArticleFilterWithLogic: z.ZodType<ArticleFilter> = z.lazy(() =>
  ArticleFilterSchema.extend({
    $and: z.array(ArticleFilterWithLogic).min(1).max(10).optional(),
    $or:  z.array(ArticleFilterWithLogic).min(1).max(10).optional(),
    $not: ArticleFilterWithLogic.optional(),
  })
)

// ── Step 4: Validate in the request handler ───────────────────────────
app.get('/api/articles', (req, res) => {
  let rawFilter: unknown
  try {
    rawFilter = req.query.filter
      ? JSON.parse(req.query.filter as string)
      : {}
  } catch {
    return res.status(400).json({ error: 'filter must be valid JSON' })
  }

  const result = ArticleFilterWithLogic.safeParse(rawFilter)
  if (!result.success) {
    return res.status(400).json({
      error: 'Invalid filter',
      details: result.error.flatten().fieldErrors,
    })
  }

  const validatedFilter = result.data  // fully typed ArticleFilter
  // → pass to SQL translator (Section 4)
})

The .strict() call on each object schema is the key security control — without it, Zod passes unknown keys through silently, allowing an attacker to inject operators you did not intend to support. The .max(100) on $in arrays prevents denial-of-service via enormous IN clauses. Limit $and/$or nesting depth (the .max(10) on the array) to prevent exponential query plan explosion from deeply recursive boolean trees.

Translating JSON Filters to SQL Safely

Validated filter objects must be translated to SQL using two invariants: field names come from a hardcoded allowlist (never from the client), and all values are bound as parameterized query placeholders (never interpolated). A query builder like Knex enforces parameterization automatically.

`import knex from 'knex'
const db = knex({ client: 'pg', connection: process.env.DATABASE_URL })

// ── Step 1: Hardcoded field allowlist ─────────────────────────────────
// Maps client-facing field names to exact SQL column references.
// This is the primary SQL injection defense for field names.
const ALLOWED_FIELDS = {
  status:      'articles.status',
  authorId:    'articles.author_id',
  publishedAt: 'articles.published_at',
  viewCount:   'articles.view_count',
  tags:        'articles.tags',  // assumed to be a jsonb or text[] column
} as const

type AllowedField = keyof typeof ALLOWED_FIELDS

// ── Step 2: Operator → SQL mapping ───────────────────────────────────
const OPERATOR_MAP = {
  $eq:  '=',
  $ne:  '!=',
  $gt:  '>',
  $gte: '>=',
  $lt:  '<',
  $lte: '<=',
} as const

// ── Step 3: Recursive filter translator ──────────────────────────────
import { Knex } from 'knex'

function applyFilter(
  qb: Knex.QueryBuilder,
  filter: ArticleFilter,
): Knex.QueryBuilder {
  for (const [key, condition] of Object.entries(filter)) {
    // Handle logical operators
    if (key === '$and') {
      qb = qb.where((builder) => {
        for (const subFilter of condition as ArticleFilter[]) {
          builder.where((sub) => applyFilter(sub, subFilter))
        }
      })
      continue
    }
    if (key === '$or') {
      qb = qb.where((builder) => {
        for (const subFilter of condition as ArticleFilter[]) {
          builder.orWhere((sub) => applyFilter(sub, subFilter))
        }
      })
      continue
    }
    if (key === '$not') {
      qb = qb.whereNot((builder) => applyFilter(builder, condition as ArticleFilter))
      continue
    }

    // Field must exist in the allowlist
    if (!(key in ALLOWED_FIELDS)) {
      throw new Error(`Field '${key}' is not filterable`)  // Zod already blocked this
    }
    const column = ALLOWED_FIELDS[key as AllowedField]

    // Handle shorthand { field: value } → implicit $eq
    if (typeof condition !== 'object' || condition === null) {
      qb = qb.where(column, '=', condition)
      continue
    }

    // Handle operator object { $eq, $gt, $in, ... }
    for (const [op, value] of Object.entries(condition as Record<string, unknown>)) {
      if (op === '$in' || op === '$nin') {
        const vals = value as unknown[]
        if (op === '$in')  qb = qb.whereIn(column, vals)
        if (op === '$nin') qb = qb.whereNotIn(column, vals)
      } else if (op === '$exists') {
        if (value) qb = qb.whereNotNull(column)
        else       qb = qb.whereNull(column)
      } else if (op in OPERATOR_MAP) {
        const sqlOp = OPERATOR_MAP[op as keyof typeof OPERATOR_MAP]
        qb = qb.where(column, sqlOp, value)  // value is bound as a parameter
      }
    }
  }
  return qb
}

// ── Step 4: Use in a query ────────────────────────────────────────────
async function getArticles(filter: ArticleFilter) {
  const rows = await db('articles')
    .select('id', 'title', 'status', 'published_at', 'view_count')
    .where((qb) => applyFilter(qb, filter))
    .orderBy('published_at', 'desc')
    .limit(20)

  return rows
}

// ── What Knex generates (safe parameterized SQL) ──────────────────────
// Filter: { status: { $eq: 'published' }, viewCount: { $gte: 100 } }
// SQL:
//   SELECT id, title, status, published_at, view_count
//   FROM articles
//   WHERE (articles.status = $1 AND articles.view_count >= $2)
//   ORDER BY published_at DESC
//   LIMIT 20
// Params: ['published', 100]
//
// Note: the column name comes from ALLOWED_FIELDS (hardcoded),
// the value is bound as $1/$2 (never interpolated) → injection-safe

Knex handles value parameterization automatically — every value passed to .where(column, op, value) becomes a bound parameter in the generated SQL. The column name itself comes from ALLOWED_FIELDS, which is a compile-time constant. This two-layer protection (allowlist for field names, parameterization for values) makes the translator injection-safe regardless of what the client sends. For raw SQL without a query builder, build a params array and use $1, $2 placeholders manually.

JSON:API Filter Specification

The JSON:API specification reserves the filter query parameter family but leaves the internal syntax to the implementation. The most widely adopted convention uses bracket notation where field names and operators are URL query parameter keys.

`// ── JSON:API filter bracket notation ─────────────────────────────────
// Simple equality (most common):
// GET /articles?filter[status]=published
// → WHERE status = 'published'

// Operator variant (nested bracket):
// GET /articles?filter[viewCount][gte]=100&filter[viewCount][lte]=10000
// → WHERE view_count >= 100 AND view_count <= 10000

// Date range:
// GET /articles?filter[publishedAt][gte]=2024-01-01&filter[publishedAt][lt]=2025-01-01

// Array / multi-value (in):
// GET /articles?filter[tags][in][]=javascript&filter[tags][in][]=typescript
// → WHERE tags @> ARRAY['javascript','typescript']  (PostgreSQL)

// Relationship filter (dot notation — common extension):
// GET /articles?filter[author.name]=Alice
// → JOIN authors ON authors.id = articles.author_id WHERE authors.name = 'Alice'

// ── Parsing bracket notation in Node.js ──────────────────────────────
// Express (with qs library, default for Express):
// req.query.filter is already parsed as a nested object:
// { status: 'published', viewCount: { gte: '100' } }
// Note: all values are strings — cast to numbers where needed

import express from 'express'
const app = express()

const JSONAPI_OP_MAP: Record<string, string> = {
  eq: '=', ne: '!=', gt: '>', gte: '>=', lt: '<', lte: '<=',
}

const JSONAPI_ALLOWED_FIELDS: Record<string, string> = {
  status:      'articles.status',
  viewCount:   'articles.view_count',
  publishedAt: 'articles.published_at',
  authorId:    'articles.author_id',
}

app.get('/api/articles', async (req, res) => {
  const rawFilter = req.query.filter as Record<string, unknown> | undefined

  if (!rawFilter) {
    const rows = await db('articles').select().limit(20)
    return res.json({ data: rows })
  }

  let qb = db('articles').select()

  for (const [field, condition] of Object.entries(rawFilter)) {
    // Reject unknown fields immediately
    if (!(field in JSONAPI_ALLOWED_FIELDS)) {
      return res.status(400).json({ errors: [{ detail: `Filter field '${field}' is not supported` }] })
    }
    const column = JSONAPI_ALLOWED_FIELDS[field]

    if (typeof condition === 'string') {
      // Simple equality: filter[status]=published
      qb = qb.where(column, '=', condition)
    } else if (typeof condition === 'object' && condition !== null) {
      // Operator object: filter[viewCount][gte]=100
      for (const [op, value] of Object.entries(condition as Record<string, string>)) {
        if (op === 'in') {
          // filter[tags][in][]=js&filter[tags][in][]=ts
          const vals = Array.isArray(value) ? value : [value]
          qb = qb.whereIn(column, vals)
        } else if (op in JSONAPI_OP_MAP) {
          const sqlOp = JSONAPI_OP_MAP[op]
          // Cast string to number if the column is numeric
          const castedValue = isNaN(Number(value)) ? value : Number(value)
          qb = qb.where(column, sqlOp, castedValue)
        } else {
          return res.status(400).json({ errors: [{ detail: `Operator '${op}' is not supported` }] })
        }
      }
    }
  }

  const rows = await qb.limit(20)
  res.json({ data: rows })
})

// ── Combining with JSON:API pagination ────────────────────────────────
// GET /articles?filter[status]=published&page[size]=20&page[number]=2
// The filter and page parameters are parsed separately by qs:
// req.query.filter → { status: 'published' }
// req.query.page   → { size: '20', number: '2' }

A subtle JSON:API parsing issue: all bracket-notation values arrive as strings in Express (via the qs library). Numeric filter values like filter[viewCount][gte]=100 come in as "100" (string) and must be cast before binding to SQL. Use Number(value) for numeric columns and validate the result with isNaN() — reject non-numeric strings for numeric columns with a 400 error rather than silently passing NaN to the database.

Full-Text Search with JSON Filter APIs

Full-text search is best handled as a dedicated parameter alongside the structured filter, not as another filter field. The search term maps to a fundamentally different SQL or search DSL construct — a tsvector @@ tsquery expression in PostgreSQL or a bool.must.match clause in Elasticsearch.

`// ── PostgreSQL: full-text search + JSON filter ────────────────────────
// GET /articles?search=json+api+filtering&filter[status]=published

// 1. Create a GIN index on the tsvector expression
// (run once in a migration):
// CREATE INDEX idx_articles_fts ON articles
//   USING GIN (to_tsvector('english', title || ' ' || coalesce(body, '')));

// 2. In the handler: validate search, build parameterized query
import { z } from 'zod'

const SearchSchema = z.object({
  search: z.string().max(200).optional(),
  filter: ArticleFilterWithLogic.optional(),
  page:   z.object({ size: z.coerce.number().min(1).max(100).default(20),
                     cursor: z.string().optional() }).optional(),
})

app.get('/api/articles', async (req, res) => {
  const parsed = SearchSchema.safeParse(req.query)
  if (!parsed.success) return res.status(400).json({ error: parsed.error.flatten() })
  const { search, filter, page } = parsed.data

  let qb = db('articles').select('id', 'title', 'status', 'published_at')

  // Apply full-text search (parameterized — never interpolate search string)
  if (search) {
    qb = qb.whereRaw(
      `to_tsvector('english', title || ' ' || coalesce(body, '')) @@ websearch_to_tsquery('english', ?)`,
      [search]
    )
    // Also add a relevance score for ORDER BY:
    qb = qb.select(
      db.raw(`ts_rank(to_tsvector('english', title || ' ' || coalesce(body, '')), websearch_to_tsquery('english', ?)) AS rank`, [search])
    )
  }

  // Apply structured filter
  if (filter) {
    qb = applyFilter(qb, filter)  // from Section 4
  }

  // Order by relevance when search is active, otherwise by date
  qb = search
    ? qb.orderBy('rank', 'desc')
    : qb.orderBy('published_at', 'desc')

  const rows = await qb.limit(page?.size ?? 20)
  res.json({ data: rows })
})

// ── Elasticsearch: bool query DSL combining search + filters ──────────
// Elasticsearch uses a JSON request body for complex queries.
// GET with a body is non-standard HTTP — use POST /_search in practice.
//
// POST /articles/_search
// {
//   "query": {
//     "bool": {
//       "must": [
//         { "multi_match": {
//             "query": "json api filtering",
//             "fields": ["title^3", "body"]   // title boosted 3x
//           }
//         }
//       ],
//       "filter": [
//         { "term": { "status": "published" } },
//         { "range": { "view_count": { "gte": 100 } } },
//         { "terms": { "tags": ["javascript", "typescript"] } }
//       ]
//     }
//   },
//   "sort": [{ "_score": "desc" }, { "published_at": "desc" }],
//   "from": 0,
//   "size": 20
// }
//
// Key Elasticsearch concepts:
// - "must"   → affects relevance score (full-text search goes here)
// - "filter" → does NOT affect score, IS cached → always use for exact/range filters
// - "should" → boosts score if matched (optional match)
// - "must_not" → excludes documents

import { Client } from '@elastic/elasticsearch'
const esClient = new Client({ node: process.env.ELASTICSEARCH_URL })

async function searchArticles(search: string, filter: ArticleFilter) {
  const esFilter: object[] = []

  // Translate JSON filter to Elasticsearch filter clauses
  if (filter.status) {
    const s = filter.status
    if (typeof s === 'string') esFilter.push({ term: { status: s } })
    else if ('$in' in s)       esFilter.push({ terms: { status: s.$in } })
    else if ('$eq' in s)       esFilter.push({ term: { status: s.$eq } })
  }
  if (filter.viewCount && typeof filter.viewCount === 'object') {
    const range: Record<string, number> = {}
    if ('$gte' in filter.viewCount) range.gte = filter.viewCount.$gte!
    if ('$lte' in filter.viewCount) range.lte = filter.viewCount.$lte!
    esFilter.push({ range: { view_count: range } })
  }

  const response = await esClient.search({
    index: 'articles',
    body: {
      query: {
        bool: {
          must:   search ? [{ multi_match: { query: search, fields: ['title^3', 'body'] } }] : [],
          filter: esFilter,
        },
      },
      sort: [{ _score: 'desc' }, { published_at: 'desc' }],
      size: 20,
    },
  })

  return response.hits.hits.map(h => h._source)
}

Use websearch_to_tsquery (PostgreSQL 11+) instead of to_tsquery for user-facing search — it tolerates malformed queries and parses Google-style syntax (json AND filtering, "exact phrase", -excluded) without throwing database errors. In Elasticsearch, always put non-text conditions in the filter clause rather than must — filter clauses are cached by the query cache and do not contribute to relevance scoring, making them significantly faster for large datasets.

Filter Pagination and Performance

Pagination with complex JSON filters requires choosing the right strategy (offset vs cursor), ensuring the filter fields are indexed, and understanding when a COUNT(*) is necessary versus expensive.

`-- ── Index strategy for filter fields ───────────────────────────────
-- Every field in your ALLOWED_FIELDS allowlist should have an index.
-- The index type depends on the operators you support.

-- B-tree index: equality, range, ORDER BY, BETWEEN
CREATE INDEX idx_articles_status      ON articles (status);
CREATE INDEX idx_articles_published   ON articles (published_at DESC);
CREATE INDEX idx_articles_view_count  ON articles (view_count);
CREATE INDEX idx_articles_author      ON articles (author_id);

-- Composite index: most common filter combination + sort field
-- Covers: WHERE status = ? ORDER BY published_at DESC
CREATE INDEX idx_articles_status_date ON articles (status, published_at DESC);

-- GIN index for JSONB filter columns
-- If tags is a jsonb column: WHERE tags @> '["javascript"]'
CREATE INDEX idx_articles_tags_gin ON articles USING GIN (tags);

-- Partial index: GIN only on published articles (common filter)
CREATE INDEX idx_articles_tags_published ON articles USING GIN (tags)
WHERE status = 'published';
-- Reduces index size if most GIN queries also filter on status = 'published'

-- ── Offset pagination with filter ─────────────────────────────────────
-- GET /articles?filter[status]=published&page[number]=2&page[size]=20
-- SQL: requires COUNT(*) for total pages → expensive on large tables

SELECT id, title, published_at
FROM articles
WHERE status = $1          -- uses idx_articles_status_date
ORDER BY published_at DESC
LIMIT $2 OFFSET $3;
-- Params: ['published', 20, 20]  (page 2: offset = (2-1)*20 = 20)

-- Total count for pagination metadata:
SELECT COUNT(*) FROM articles WHERE status = $1;
-- → This is a second query with the same WHERE; expensive on 10M+ rows.
-- Alternative: approximate count from EXPLAIN:
EXPLAIN SELECT id FROM articles WHERE status = $1;
-- Read "rows=xxxxx" from the plan — estimate only, fast

-- ── Cursor pagination with filter (preferred for large tables) ────────
-- Cursor encodes last row's sort field + id to avoid COUNT(*) entirely
-- GET /articles?filter[status]=published&page[cursor]=eyJkYXRlIjoiMjAyNS0wMS0wMSIsImlkIjoxMDB9

// TypeScript: cursor encoding/decoding
type Cursor = { publishedAt: string; id: number }

function encodeCursor(row: { published_at: string; id: number }): string {
  return Buffer.from(JSON.stringify({ publishedAt: row.published_at, id: row.id })).toString('base64url')
}

function decodeCursor(encoded: string): Cursor {
  const parsed = JSON.parse(Buffer.from(encoded, 'base64url').toString())
  return CursorSchema.parse(parsed)  // validate with Zod
}

const CursorSchema = z.object({
  publishedAt: z.string().datetime(),
  id: z.number().int(),
})

// Cursor WHERE clause: (published_at, id) < (cursor_date, cursor_id)
// This uses the composite index idx_articles_status_date efficiently:
async function getArticlesCursor(filter: ArticleFilter, cursor?: string, pageSize = 20) {
  let qb = db('articles')
    .select('id', 'title', 'status', 'published_at')
    .where((b) => applyFilter(b, filter))
    .orderBy([{ column: 'published_at', order: 'desc' }, { column: 'id', order: 'desc' }])
    .limit(pageSize + 1)  // fetch one extra to detect next page

  if (cursor) {
    const { publishedAt, id } = decodeCursor(cursor)
    // Row-value comparison: uses composite index
    qb = qb.whereRaw('(published_at, id) < (?, ?)', [publishedAt, id])
  }

  const rows = await qb
  const hasNext = rows.length > pageSize
  if (hasNext) rows.pop()

  const nextCursor = hasNext ? encodeCursor(rows[rows.length - 1]) : null
  return { data: rows, meta: { nextCursor, hasNext } }
}

-- ── Filter complexity limit ────────────────────────────────────────────
-- Limit the depth and breadth of $and/$or trees to prevent query plan explosion.
-- Enforced in the Zod schema (Section 3):
-- $and/$or arrays: max 10 elements, max 2 nesting levels

-- ── EXPLAIN: verify filter query uses index ───────────────────────────
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, published_at
FROM articles
WHERE status = 'published'
ORDER BY published_at DESC
LIMIT 20;
-- Look for: "Index Scan using idx_articles_status_date"
-- Avoid: "Seq Scan" or "Sort" steps — add or fix the index

Cursor pagination is strongly preferred over offset pagination for filtered result sets on large tables. Offset pagination degrades as the page number increases (the database must scan and discard all previous rows), and the COUNT(*) query runs the full filter on every page request. Cursor pagination eliminates both issues — each page query uses the cursor as an additional WHERE condition that, combined with the sort index, allows the database to jump directly to the correct starting row. Always run EXPLAIN (ANALYZE) on your most common filter combinations to verify index usage before deploying.

Key Terms

Filter Operator
A symbol or keyword within a filter expression that defines the comparison logic between a field and a value. In MongoDB-style JSON filters, operators use a $ prefix: $eq (equal), $ne (not equal), $gt (greater than), $gte (greater than or equal), $lt (less than), $lte (less than or equal), $in (field value in array), $nin (field value not in array), $exists (field is not null / is null). Logical operators — $and, $or, $not — combine multiple field conditions. In JSON:API bracket notation, the same concepts are expressed without the $: filter[field][gte]=value. In SQL, operators are keywords: =, !=, >, >=, <, <=, IN, NOT IN, IS NULL, IS NOT NULL. The filter translator maps client operator names to their SQL equivalents using a hardcoded lookup table — never using the client string directly as a SQL keyword.
RQL (Resource Query Language)
A query language for filtering, sorting, and shaping REST API responses, originally designed for data APIs. RQL expresses conditions as function calls in the URL query string: eq(status,published), gt(views,100), and(eq(status,published),gt(views,100)). Unlike MongoDB-style JSON operators (which require JSON parsing) or bracket notation (which requires deep nested key parsing), RQL uses a compact function syntax that remains readable in URLs without encoding. RQL supports comparison (eq, ne, lt, le, gt, ge), logical (and, or, not), set (in, out), and sorting (sort(+field,-other)) operations. Implementations exist for Node.js (rql npm package), Python, and Java. RQL is less widely used than MongoDB-style operators but is used by some enterprise data APIs and the ODATA protocol.
Allowlist (field allowlist)
A hardcoded mapping of client-facing filter field names to their corresponding SQL column references, used to prevent SQL injection through field name manipulation. The allowlist is defined as a compile-time constant in the server code — it is never derived from user input, runtime configuration files, or database queries. When a client filter contains a field name, the server looks it up in the allowlist: if found, the mapped SQL column reference is used; if not found, the request is rejected with HTTP 400. This protects against attacks where an attacker supplies a field name like 1=1--, ; DROP TABLE articles; --, or another SQL identifier that would cause injection if interpolated directly. The allowlist also serves as the source of truth for which fields are filterable — if a field is not in the allowlist, it is not filterable, regardless of whether it exists in the database. Separate from value parameterization: allowlists protect field names, parameterized queries protect field values.
SQL Injection
A code injection attack where malicious SQL is embedded in user input and executed by the database when that input is concatenated into a SQL string without sanitization. In the context of JSON API filters, SQL injection can occur in two places: field names and field values. Field name injection happens when a client-supplied filter key is used directly as a SQL identifier: {"`"}SELECT * FROM t WHERE ${"{"}req.query.field{"}"} = $1{"`"''} — an attacker can supply 1=1 OR id= as the field name, breaking the query structure. Value injection is blocked by parameterized queries (using ? or $1 placeholders with bound values) — the database treats the parameter as a literal value, never as SQL syntax. The complete defense for filter APIs: validate and parse the filter JSON with a strict schema (reject unknown operators), use a hardcoded field allowlist (reject unknown fields), and use parameterized queries for all values (never interpolate). All three layers together provide defense-in-depth.
Parameterized Query
A SQL query where user-supplied values are passed as separate bound parameters rather than interpolated into the query string. In PostgreSQL, placeholders are $1, $2, etc.; in MySQL and SQLite, they are ?. The database driver sends the query template and the parameter array separately to the database engine — the engine treats each parameter as a typed literal value and never interprets it as SQL syntax, regardless of its content. Example: SELECT * FROM articles WHERE status = $1 AND view_count >= $2 with params ['published', 100] is safe even if a user supplies published'; DROP TABLE articles; -- as the status value — the engine inserts it as a string literal, not as SQL. Query builders (Knex, Drizzle, Prisely, Kysely) generate parameterized queries automatically for all standard methods. Raw SQL via db.raw() or $queryRaw requires explicit placeholder usage — never use template literal interpolation inside raw SQL.
Query DSL (Domain-Specific Language)
A structured JSON format for expressing search and filter queries, most prominently used in Elasticsearch. The Elasticsearch Query DSL uses a JSON request body to compose complex boolean queries: bool.must for full-text conditions that affect relevance scoring, bool.filter for exact and range conditions that do not affect scoring but are cached, bool.should for optional conditions that boost scoring, and bool.must_not for exclusion conditions. Query DSL supports dozens of leaf query types: term (exact value), terms (multiple values, like SQL IN), range (numeric or date range), match (full-text, analyzed), multi_match (full-text across multiple fields), exists (field is not null), wildcard, prefix, and many others. Translating a JSON filter object to Elasticsearch Query DSL follows the same allowlist-and-mapping pattern as SQL translation — never pass client field names directly as Elasticsearch field references.
GIN Index
A Generalized Inverted Index — a PostgreSQL index structure designed for multi-valued data types including jsonb, arrays (text[], integer[]), and full-text tsvector. A GIN index on a column stores an inverted mapping from every element value (array element, JSON key-value pair, lexeme) to the rows containing it, enabling O(log n) lookups instead of O(n) sequential scans. For jsonb filter columns, a GIN index accelerates containment (@>), key-existence (?, ?|, ?&), and SQL/JSON path (@?, @@) queries. For text[] array columns, it accelerates @>, &&, and = operators. For tsvector full-text columns, it accelerates the @@ match operator. Create with: CREATE INDEX idx_name ON table USING GIN (column);. Two operator classes: jsonb_ops (default, all operators) and jsonb_path_ops (only @> and path operators, ~30% smaller index). GIN indexes have higher write overhead than B-tree indexes — balance index benefit against write throughput for high-insert workloads.
Full-Text Search
A search technique that tokenizes, stems, and indexes natural language text, enabling relevance-ranked search across document bodies — distinct from exact-match or range filtering. In PostgreSQL, full-text search uses two types: tsvector (a processed token representation of a document) and tsquery (a search query). The @@ operator matches a document against a query: to_tsvector('english', body) @@ plainto_tsquery('english', 'json filtering'). Processing functions: to_tsvector tokenizes and stems input, plainto_tsquery converts a plain string to an AND query, phraseto_tsquery requires exact phrase match, websearch_to_tsquery (PG 11+) parses Google-style syntax. GIN indexes on tsvector expressions accelerate @@ queries. In Elasticsearch, full-text search uses analyzed fields with match and multi_match queries — the analyzer pipeline tokenizes, lowercases, and stems text identically to how it was indexed, enabling language-aware matching. Always combine full-text search with structured filters (status, date range) for precise, relevant results.

FAQ

How do I design a JSON filter query format for a REST API?

There are four main approaches. Simple key=value query parameters (?status=published&year=2024) work for equality-only filters but cannot express ranges, negation, or OR logic. MongoDB-style JSON operators (?filter={"status":{"$eq":"published"}}) are expressive and widely understood — they support $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or, and $not. JSON:API bracket notation (?filter[status]=published&filter[year][gte]=2024) stays within URL query string conventions and composes with JSON:API pagination parameters. RQL (?q=and(eq(status,published),gt(year,2024))) is compact and powerful but requires a dedicated parser. For most APIs, MongoDB-style operators or JSON:API bracket notation offer the best balance of expressiveness and implementation simplicity. Whichever format you choose, document it in your OpenAPI spec with a full JSON Schema definition for the filter structure, so client developers know exactly what is valid.

How do I safely translate JSON filter parameters into SQL queries without SQL injection?

Use three layers of defense. First, validate and parse the incoming filter JSON with a strict schema (Zod with .strict() on all objects) that rejects unknown operators, oversized arrays, and malformed structures before any SQL is generated. Second, maintain a hardcoded allowlist of filterable field names — a compile-time constant object that maps client field names to SQL column references. When any filter field is not in the allowlist, reject the entire request with HTTP 400 — never fall through to a default. Third, use parameterized queries or a query builder (Knex, Drizzle, Kysely) for all filter values — every $eq, $gt, $in value must become a bound parameter ($1, $2), never an interpolated string. Test with filter payloads like {"1=1--": {"$eq": ""}} and {"status": {"$eq": "' OR '1'='1"}} to verify both defenses hold.

What are the MongoDB-style filter operators and how do I implement them in a REST API?

MongoDB-style operators use a $ prefix nested inside field objects. Comparison operators: $eq (SQL =), $ne (!=), $gt (>), $gte (>=), $lt (<), $lte (<=), $in (SQL IN), $nin (NOT IN), $exists (IS NOT NULL / IS NULL). Logical operators: $and (SQL AND), $or (OR), $not (NOT). To implement: parse the JSON filter, validate each operator against a Zod schema, map each field name to a SQL column via an allowlist, and build parameterized SQL using a query builder. The $in operator accepts arrays — limit to 100 values maximum for performance. Handle $and and $or recursively by iterating their sub-condition arrays and joining the translated SQL with AND/OR. Always wrap the translated filter in parentheses before appending to the WHERE clause to avoid operator precedence bugs.

How does JSON:API define filter query parameters?

The JSON:API specification reserves the filter query parameter family but leaves the internal syntax to the implementation. The most widely adopted convention is bracket notation: GET /articles?filter[status]=published for simple equality. For operators, a nested bracket: filter[viewCount][gte]=100 where the outer bracket is the field and the inner bracket is the operator. Multiple filters are implicitly ANDed. Array/multi-value: filter[tags][in][]=js&filter[tags][in][]=ts. Relationship filters use dot notation: filter[author.name]=Alice. In Express/Node.js, the qs library parses bracket notation into nested objects automatically — req.query.filter becomes {status: "published", viewCount: {gte: "100"}}. All values arrive as strings and must be cast to numbers for numeric columns. Combine with page[size] and page[number] for paginated filtered results. Return JSON:API error objects ({errors: [{detail: "..."} ]} for invalid filter fields or operators.

What fields should I allow as filterable and how do I validate filter field names?

Define the filterable field allowlist as a hardcoded compile-time constant — a TypeScript const object or a frozen Map. Never derive the allowlist from user input, environment configuration, or database metadata at runtime. Each entry maps a client-facing field name (the name clients put in their filter objects) to the exact SQL column reference (e.g., { authorId: "articles.author_id" }). When a filter request arrives, iterate each field in the filter object and look it up in the allowlist — if absent, immediately reject with HTTP 400. Only include fields that have database indexes: filtering on unindexed columns causes full table scans that degrade performance as the table grows. Avoid including PII fields (email, phone, SSN) in the filterable allowlist unless the API requires authentication and row-level authorization. For composite filter scenarios (a "search" filter that hits multiple columns), handle them as named cases in the translator rather than exposing column names directly.

How do I implement full-text search alongside JSON filter parameters?

Treat full-text search as a dedicated search parameter separate from the structured filter: GET /articles?search=json+filtering&filter[status]=published. The search term maps to a tsvector @@ tsquery SQL expression in PostgreSQL, not to a field comparison — build it with websearch_to_tsquery('english', $1) where $1 is the parameterized search string. Add a GIN index on the tsvector expression: CREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', title || ' ' || body)). In Elasticsearch, combine full-text in the bool.must clause (multi_match query) with structured filters in the bool.filter clause — filter clauses do not affect relevance scoring and are cached by Elasticsearch, making them significantly faster than putting conditions in must. Use websearch_to_tsquery (PostgreSQL 11+) instead of to_tsquery for user-facing search — it tolerates malformed input without throwing database errors.

How do I paginate results when the client is using complex JSON filters?

Prefer cursor-based pagination over offset pagination for filtered results on large tables. Offset pagination (LIMIT $size OFFSET $offset) requires a COUNT(*) query with the same WHERE clause for total page count — this count can be slower than the data query on millions of rows with complex filters. Cursor pagination encodes the last row's sort field values in a base64 token and uses a row-value WHERE condition (WHERE (published_at, id) < ($cursor_date, $cursor_id)) instead of OFFSET — each page jumps directly to the correct starting row using the sort index. Create a composite index covering both the filter field and the sort field: CREATE INDEX idx_articles_status_date ON articles (status, published_at DESC) — this allows index-only range scanning without a sort step. Limit $and/$or filter nesting depth (max 2 levels, max 10 conditions per level) to prevent exponential query plan complexity. Always require an explicit sort order for paginated requests — never paginate over an unordered result set.

Further reading and primary sources