JSON Deduplicate Array Objects JavaScript: Set, Map & lodash

Q: How do I remove duplicates from a JSON array?

For arrays of primitives (strings, numbers), use the Set spread pattern: [...new Set(array)]. This is O(n) time and O(n) space — the fastest possible approach. For example: [...new Set([1, 2, 2, 3, 1])] returns [1, 2, 3]. For arrays of objects, Set does not work because objects are compared by reference, not by value. Instead, use a Map keyed by a unique field: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }). This is also O(n). If you need to deduplicate without a unique key, use JSON.stringify() for deep equality comparison or lodash _.uniqWith(arr, _.isEqual).

Q: How do I deduplicate an array of JSON objects?

The fastest approach for large arrays is a Map keyed by a unique property: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }). This runs in O(n) time and O(n) space. A cleaner one-liner using Map constructor chaining: [...new Map(array.map(item => [item.id, item])).values()] — this also deduplicates by id and keeps the last occurrence of each key. For deduplication by a composite key (multiple fields), use a template literal: seen.has(`${item.name}:${item.type}`). For deep equality without a unique key, use JSON.stringify() comparison or lodash _.uniqWith(arr, _.isEqual).
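One caveat with template-literal composite keys: if a field can contain the delimiter character, two different records may produce the same key. The following is a minimal sketch, assuming hypothetical string fields `name` and `type` and a hypothetical helper called `dedupeByFields`. It serializes the field values as a JSON array so that the boundaries between fields stay unambiguous.

```javascript
// Hypothetical helper: deduplicate by several fields, keeping the first occurrence.
function dedupeByFields(array, fields) {
  const seen = new Set();
  return array.filter(item => {
    // JSON.stringify on an array quotes and escapes each value,
    // so ["a:b","c"] and ["a","b:c"] produce different keys.
    const key = JSON.stringify(fields.map(f => item[f]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const records = [
  { name: 'a:b', type: 'c' },
  { name: 'a',   type: 'b:c' }, // `${name}:${type}` would collide with the first record
  { name: 'a:b', type: 'c' },   // true duplicate of the first record
];

const uniqueRecords = dedupeByFields(records, ['name', 'type']);
console.log(uniqueRecords.length); // 2
```

If the key fields can never contain the delimiter (numeric ids, fixed enums), the plain template literal is fine and slightly cheaper.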

Q: What is the fastest way to deduplicate a large JSON array?

For primitive arrays: [...new Set(array)] runs in O(n) time and space and cannot be beaten asymptotically. For object arrays with a unique key: Map-based deduplication, also O(n) time and O(n) space. The constructor form [...new Map(array.map(item => [item.id, item])).values()] is often slightly faster in practice than the filter() form, though the gap depends on the engine and the data. As a rough guide, the Map approach processes 100,000 objects in under 20ms on modern hardware. Avoid filter+findIndex for large arrays: it is O(n²), so doubling the input roughly quadruples the running time, and at tens of thousands of items it can block the main thread. For sorting-based deduplication (when order does not matter), sorting first and then comparing adjacent items is O(n log n) and needs only O(1) extra space if the array is sorted in place. Never use nested loops for deduplication: they are O(n²).
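The sorting-based variant mentioned above is not shown elsewhere in this guide. Here is a minimal sketch for an array of numbers, assuming output order does not matter and the input contains no NaN values; `dedupeSorted` is a hypothetical helper name. It sorts a copy so the caller's array is left untouched, which costs O(n) extra space. Sorting in place would avoid that at the price of mutating the input.

```javascript
// Hypothetical helper: sort, then keep each value that differs from its predecessor.
function dedupeSorted(numbers) {
  // Numeric comparator: the default sort() compares values as strings.
  const sorted = [...numbers].sort((a, b) => a - b);
  const result = [];
  for (let i = 0; i < sorted.length; i++) {
    // After sorting, duplicates sit next to each other,
    // so one comparison per element is enough.
    if (i === 0 || sorted[i] !== sorted[i - 1]) result.push(sorted[i]);
  }
  return result;
}

console.log(dedupeSorted([3, 1, 2, 3, 1, 10, 2])); // [1, 2, 3, 10]
```

In most application code the Set one-liner is simpler and faster. The sorted variant is mainly useful when the data is already sorted or when a sorted result is wanted anyway.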

Q: How do I check deep equality for JSON deduplication?

For JSON-safe objects (no functions, no undefined, no circular references, no Symbol values), use JSON.stringify() as the equality key: const seen = new Set(); const unique = array.filter(item => { const key = JSON.stringify(item); if (seen.has(key)) return false; seen.add(key); return true; }). This works because JSON.stringify produces the same string for objects with the same structure and values, but only if property order is consistent. If property order may vary, sort keys first. For flat objects, JSON.stringify(item, Object.keys(item).sort()) is enough; be aware that a replacer array also acts as a key allowlist at every nesting level, so nested objects need a recursive key sort instead. For objects with functions, Date instances, undefined values, or circular references, use lodash _.isEqual(): _.uniqWith(array, _.isEqual). _.isEqual performs structural deep equality and handles Date, RegExp, Map, Set, and typed arrays correctly.

Q: How does lodash uniqBy work?

_.uniqBy(array, iteratee) returns a new array with duplicates removed, keeping the first occurrence of each unique key. The iteratee can be a property name string (_.uniqBy(arr, "id")), a dot-notation path for nested properties (_.uniqBy(arr, "user.id")), or a function (_.uniqBy(arr, item => item.name.toLowerCase())). Internally, _.uniqBy tracks seen keys in a hash-based cache for larger arrays, which gives roughly O(n) performance. The companion function _.uniqWith(array, comparator) uses a custom comparison function instead of a key extractor; because it has to call the comparator against previously kept items, it is O(n²) in the worst case. Use _.uniqWith(arr, _.isEqual) for full deep equality deduplication. _.uniq(array) is the primitive version equivalent to [...new Set(array)]. All three functions return a new array and do not mutate the input. lodash also provides _.groupBy() for grouping rather than deduplicating, and _.sortedUniq() for pre-sorted arrays (O(n) without a hash map).

Q: How do I deduplicate JSON data at the database level?

Three database-level approaches: (1) SELECT DISTINCT eliminates duplicate rows: SELECT DISTINCT name, email FROM users — returns only unique combinations of the selected columns. (2) PostgreSQL ON CONFLICT DO NOTHING prevents inserting duplicates: INSERT INTO users (id, email) SELECT id, email FROM staging_users ON CONFLICT (email) DO NOTHING — requires a UNIQUE constraint on the conflict column. (3) PostgreSQL ON CONFLICT DO UPDATE (upsert) replaces the existing row: INSERT INTO users (id, email) VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email. For deduplicating existing rows, use ROW_NUMBER(): DELETE FROM users WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn FROM users) t WHERE rn > 1) — keeps the most recent row per email. Database-level deduplication is more reliable than application-level for large datasets and concurrent writes.

Q: Why does Set not work for deduplicating JSON objects?

JavaScript Set uses the SameValueZero algorithm for equality, which compares objects by reference (memory address), not by value. Two distinct object literals with identical properties are different references: {} === {} is false, and new Set([{id:1}, {id:1}]).size is 2, not 1. This is fundamentally different from primitive equality: new Set([1, 1, 2]).size is 2 (correct). To deduplicate objects with Set, you would need to serialize them to a primitive key first — for example, using JSON.stringify as the set value: const seen = new Set(); array.filter(item => { const key = JSON.stringify(item); return seen.has(key) ? false : seen.add(key) && true; }). The Map-based approach is generally preferred over Set+JSON.stringify because it preserves the original object reference and avoids the serialization overhead when a unique key field already exists.

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 20, 2026

Deduplicating a JSON array of primitives takes one line: [...new Set(array)] — but for arrays of objects, you need a key-based strategy: Map keyed by a unique field, filter + findIndex, or a reduce accumulator. The fastest approach for large object arrays is a Map: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }) — O(n) time and O(n) space. filter + findIndex is simpler but O(n²) — avoid for arrays larger than ~1000 items. This guide covers Set-based primitive deduplication, Map-based object deduplication by key (O(n)), reduce accumulator pattern, deep equality deduplication with JSON.stringify(), lodash _.uniqBy() and _.uniqWith(), and database-level deduplication with SQL DISTINCT and PostgreSQL ON CONFLICT DO NOTHING.

Set-Based Deduplication for Primitive Arrays

[...new Set(array)] is the canonical one-liner for deduplicating arrays of primitives: strings, numbers, booleans, and null. Set stores only unique values using the SameValueZero algorithm, which compares primitives by value. The spread operator converts the Set back to an array, preserving insertion order (first occurrence wins). This runs in O(n) time and O(n) space, which is asymptotically optimal because every element has to be examined at least once.

// ── Primitive array deduplication: O(n) ──────────────────────────
const numbers = [1, 2, 2, 3, 1, 4, 3, 5];
const uniqueNumbers = [...new Set(numbers)];
console.log(uniqueNumbers); // [1, 2, 3, 4, 5]

const strings = ['apple', 'banana', 'apple', 'cherry', 'banana'];
const uniqueStrings = [...new Set(strings)];
console.log(uniqueStrings); // ['apple', 'banana', 'cherry']

// Mixed primitives — null and undefined treated as unique values
const mixed = [1, '1', null, null, undefined, undefined, true, true];
const uniqueMixed = [...new Set(mixed)];
console.log(uniqueMixed); // [1, '1', null, undefined, true]
// Note: 1 (number) and '1' (string) are different values in SameValueZero

// NaN: Set treats NaN as equal to itself (unlike ===)
const withNaN = [NaN, NaN, 1, 1];
console.log([...new Set(withNaN)]); // [NaN, 1]  — NaN deduplicated correctly

// Alternative: Array.from(new Set(array)) — identical result
const unique2 = Array.from(new Set(numbers));
console.log(unique2); // [1, 2, 3, 4, 5]

// ── Deduplication from JSON-parsed data ──────────────────────────
const jsonData = JSON.parse('[1, 2, 2, 3, "a", "a", null, null]');
const uniqueJson = [...new Set(jsonData)];
console.log(uniqueJson); // [1, 2, 3, 'a', null]

// ── Count unique values ───────────────────────────────────────────
const tags = ['js', 'ts', 'js', 'react', 'ts', 'node'];
const uniqueCount = new Set(tags).size;
console.log(uniqueCount); // 4

// ── Check if array has duplicates ─────────────────────────────────
const hasDuplicates = (arr) => new Set(arr).size !== arr.length;
console.log(hasDuplicates([1, 2, 3]));    // false
console.log(hasDuplicates([1, 2, 2, 3])); // true

// ── Why Set fails for objects ─────────────────────────────────────
const objects = [{ id: 1 }, { id: 1 }, { id: 2 }];
const setResult = [...new Set(objects)];
console.log(setResult.length); // 3 — all three kept! Objects compared by reference
// Each object literal creates a new reference, so Set sees 3 distinct values

Set handles NaN correctly for deduplication — NaN === NaN is false in JavaScript, but Set treats multiple NaN values as the same entry using SameValueZero. The practical implication: [...new Set([NaN, NaN])] correctly returns [NaN]. However, +0 and -0 are considered equal by SameValueZero, so [...new Set([0, -0])] returns [0]. For objects and arrays, Set compares by reference — two objects with identical content are treated as distinct values, which is why object deduplication requires a different approach.
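A short sketch of those two edge cases, using only built-ins. Object.is is included for contrast because, unlike SameValueZero, it distinguishes +0 from -0.

```javascript
// NaN: === reports "not equal", but Set (SameValueZero) treats all NaN values as one entry.
console.log(NaN === NaN);                     // false
console.log([...new Set([NaN, NaN])].length); // 1

// Signed zero: Set stores -0 as +0, so the two values collapse into a single 0.
const zeros = [...new Set([0, -0])];
console.log(zeros.length);                    // 1
console.log(Object.is(zeros[0], -0));         // false: the stored value is +0

// Object.is, by contrast, does tell the two zeros apart.
console.log(Object.is(0, -0));                // false
```

For typical JSON data this rarely matters: JSON has no NaN literal, and most pipelines do not care about the sign of zero.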

Map-Based Object Deduplication by Key (O(n))

Map-based deduplication is the fastest O(n) approach for arrays of objects when a unique key field (like id) exists. The Map tracks which keys have been seen, and filter() keeps only the first occurrence of each key. A cleaner alternative uses the Map constructor directly: [...new Map(array.map(item => [item.id, item])).values()] — this keeps the last occurrence of each key (Map overwrites on duplicate keys).

// ── Map deduplication: keep first occurrence ─────────────────────
const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob',   role: 'editor' },
  { id: 1, name: 'Alice', role: 'admin' },  // duplicate id:1
  { id: 3, name: 'Carol', role: 'viewer' },
  { id: 2, name: 'Bob',   role: 'editor' },  // duplicate id:2
];

const seen = new Map();
const uniqueUsers = users.filter(user => {
  if (seen.has(user.id)) return false;
  seen.set(user.id, true);
  return true;
});
// [{ id:1, name:'Alice' }, { id:2, name:'Bob' }, { id:3, name:'Carol' }]

// ── Map constructor: keep last occurrence ─────────────────────────
// new Map(array.map(item => [key, item])) overwrites on duplicate keys
const uniqueLast = [...new Map(users.map(u => [u.id, u])).values()];
// Keeps last occurrence of each id — useful for "latest wins" semantics

// ── Composite key deduplication ────────────────────────────────────
const events = [
  { type: 'click', target: 'button', timestamp: 1 },
  { type: 'click', target: 'button', timestamp: 2 },  // same type+target
  { type: 'hover', target: 'button', timestamp: 3 },
  { type: 'click', target: 'input',  timestamp: 4 },
];

const seenEvents = new Map();
const uniqueEvents = events.filter(ev => {
  const key = `${ev.type}:${ev.target}`;
  if (seenEvents.has(key)) return false;
  seenEvents.set(key, true);
  return true;
});
// 3 events: click:button (first), hover:button, click:input

// ── Generic deduplication helper ──────────────────────────────────
function deduplicateBy(array, keyFn) {
  const seen = new Map();
  return array.filter(item => {
    const key = keyFn(item);
    if (seen.has(key)) return false;
    seen.set(key, true);
    return true;
  });
}

const uniqueById    = deduplicateBy(users, u => u.id);
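// Assumes each record has an email field. The sample users above do not,
// so every key would be undefined and only the first user would be kept.
const uniqueByEmail = deduplicateBy(users, u => u.email?.toLowerCase());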

// ── Performance: 100,000 items ────────────────────────────────────
const largeArray = Array.from({ length: 100_000 }, (_, i) => ({
  id: i % 10_000,  // 10,000 unique ids, 10x duplication
  value: `item-${i}`,
}));

console.time('Map dedup');
const uniqueLarge = [...new Map(largeArray.map(x => [x.id, x])).values()];
console.timeEnd('Map dedup');
// Map dedup: ~15ms for 100,000 items — O(n)

console.log(uniqueLarge.length); // 10,000

The new Map(array.map(item => [item.id, item])).values() pattern is often slightly faster in benchmarks than the filter() version, although both run one callback per element and the gap depends on the engine and the data. The trade-off is semantics: the Map constructor keeps the last occurrence of each key (later entries overwrite earlier ones), while the filter() version keeps the first. Choose based on your requirement: for "latest record wins" semantics (e.g., deduplicated event streams), the constructor form is ideal. For string keys derived from template literals, this approach handles composite keys cleanly without a separate serialization step.
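The difference only shows up when duplicate records carry different data. A sketch with a hypothetical `readings` array in which two entries share an id but differ in `value`:

```javascript
const readings = [
  { id: 'a', value: 1 },
  { id: 'b', value: 2 },
  { id: 'a', value: 3 }, // same id as the first entry, newer value
];

// filter() with a seen-Map: the first occurrence of each id wins.
const seenIds = new Map();
const keepFirst = readings.filter(r => {
  if (seenIds.has(r.id)) return false;
  seenIds.set(r.id, true);
  return true;
});

// Map constructor: a later entry overwrites the value stored for an existing key.
const keepLast = [...new Map(readings.map(r => [r.id, r])).values()];

console.log(keepFirst.map(r => r.value)); // [1, 2]
console.log(keepLast.map(r => r.value));  // [3, 2]
```

Note that in the constructor form the key keeps the position of its first insertion, so the surviving record for id 'a' still appears first in the output even though its data comes from the last occurrence.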

filter + findIndex: Simple but O(n²)

array.filter((item, index) => array.findIndex(el => el.id === item.id) === index) is the most readable deduplication pattern and requires no imports — but it is O(n²) because findIndex scans the full array for each element. For arrays under ~500 items this is imperceptible; for 5,000+ items, the quadratic growth becomes noticeable; for 50,000+ items it will block the main thread.

// ── filter + findIndex: simplest approach, O(n²) ────────────────
const products = [
  { id: 'A', name: 'Widget',  price: 9.99 },
  { id: 'B', name: 'Gadget',  price: 19.99 },
  { id: 'A', name: 'Widget',  price: 9.99 },  // duplicate
  { id: 'C', name: 'Doohickey', price: 4.99 },
];

// findIndex returns the FIRST index matching the predicate
// filter keeps items where their index === the first occurrence index
const unique = products.filter(
  (item, index, arr) => arr.findIndex(el => el.id === item.id) === index
);
// [{ id:'A' }, { id:'B' }, { id:'C' }]  — duplicates removed, first occurrence kept

// ── Deduplication by multiple properties ──────────────────────────
const orders = [
  { userId: 1, productId: 'A', qty: 2 },
  { userId: 1, productId: 'B', qty: 1 },
  { userId: 1, productId: 'A', qty: 2 },  // duplicate
];

const uniqueOrders = orders.filter(
  (item, i, arr) =>
    arr.findIndex(o => o.userId === item.userId && o.productId === item.productId) === i
);
// 2 orders: userId:1/productId:A (first), userId:1/productId:B

// ── Performance comparison ────────────────────────────────────────
function benchmark(label, fn, iterations = 3) {
  const times = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  console.log(label, Math.min(...times).toFixed(1) + 'ms');
}

const testData = Array.from({ length: 5_000 }, (_, i) => ({
  id: i % 1_000,
  value: i,
}));

benchmark('filter+findIndex (O(n²))', () =>
  testData.filter((x, i, a) => a.findIndex(y => y.id === x.id) === i)
);
// filter+findIndex (O(n²)): ~180ms for 5,000 items (illustrative; varies by machine and engine)

benchmark('Map (O(n))', () =>
  [...new Map(testData.map(x => [x.id, x])).values()]
);
// Map (O(n)): ~1.2ms for 5,000 items, roughly 150x faster in this run

// ── Safe threshold ────────────────────────────────────────────────
function deduplicateSafe(array, keyFn) {
  if (array.length < 500) {
    // OK to use findIndex for small arrays
    return array.filter((item, i, arr) =>
      arr.findIndex(el => keyFn(el) === keyFn(item)) === i
    );
  }
  // Switch to Map for larger arrays
  const seen = new Map();
  return array.filter(item => {
    const key = keyFn(item);
    return seen.has(key) ? false : seen.set(key, true) && true;
  });
}

The filter + findIndex pattern has one advantage over Map-based deduplication: it requires no setup and works without understanding Map semantics. It is appropriate in utility scripts, one-off data migrations, and test fixtures where readability matters more than performance. In production application code processing user-supplied data, always use the Map approach — you cannot control how many items users will submit, and O(n²) growth with adversarial input creates denial-of-service risk in Node.js request handlers.

reduce() Accumulator Pattern

The reduce() accumulator pattern builds the deduplicated array explicitly, carrying a seen Set or Map through each iteration. It is more verbose than the Map constructor form but useful when you need to build complex accumulator state simultaneously — for example, deduplicating while also computing aggregates.

// ── reduce with Set: deduplicate primitives ───────────────────────
const nums = [1, 2, 2, 3, 1, 4];
const uniqueNums = nums.reduce((acc, n) => {
  if (!acc.seen.has(n)) {
    acc.seen.add(n);
    acc.result.push(n);
  }
  return acc;
}, { seen: new Set(), result: [] }).result;
// [1, 2, 3, 4]

// ── reduce with Map: deduplicate objects by key ────────────────────
const items = [
  { id: 1, value: 'a' },
  { id: 2, value: 'b' },
  { id: 1, value: 'a' },  // duplicate
];

const { result: uniqueItems } = items.reduce(
  (acc, item) => {
    if (!acc.seen.has(item.id)) {
      acc.seen.set(item.id, true);
      acc.result.push(item);
    }
    return acc;
  },
  { seen: new Map(), result: [] }
);
// [{ id:1, value:'a' }, { id:2, value:'b' }]

// ── reduce: deduplicate AND aggregate simultaneously ───────────────
const transactions = [
  { userId: 1, amount: 50, category: 'food' },
  { userId: 2, amount: 30, category: 'travel' },
  { userId: 1, amount: 20, category: 'food' },   // duplicate userId:1
  { userId: 3, amount: 10, category: 'food' },
];

// Get unique users AND total amount per user in one pass
const userSummary = transactions.reduce((acc, tx) => {
  if (!acc.has(tx.userId)) {
    acc.set(tx.userId, { userId: tx.userId, total: 0, count: 0 });
  }
  const user = acc.get(tx.userId);
  user.total += tx.amount;
  user.count += 1;
  return acc;
}, new Map());

const uniqueUsers = [...userSummary.values()];
// [
//   { userId: 1, total: 70, count: 2 },
//   { userId: 2, total: 30, count: 1 },
//   { userId: 3, total: 10, count: 1 },
// ]

// ── reduce: group-then-pick-one (keep latest) ──────────────────────
const events = [
  { id: 'e1', type: 'login',  ts: 1000 },
  { id: 'e2', type: 'login',  ts: 2000 },  // later login
  { id: 'e3', type: 'logout', ts: 3000 },
];

const latestByType = [...events.reduce((map, ev) => {
  const existing = map.get(ev.type);
  if (!existing || ev.ts > existing.ts) map.set(ev.type, ev);
  return map;
}, new Map()).values()];
// [{ id:'e2', type:'login', ts:2000 }, { id:'e3', type:'logout', ts:3000 }]

The reduce accumulator pattern shines when deduplication is one step of a multi-step transformation — grouping, aggregating, or picking a "winner" within each group. The "keep latest" variant (comparing a timestamp or version field within the accumulator) replaces a sort+deduplicate pipeline with a single O(n) pass. When you only need deduplication without aggregation, prefer the Map constructor form for conciseness. For complex pipelines combining deduplication with filtering and transformation, tools like JSON transform patterns offer structured approaches.

Deep Equality Deduplication with JSON.stringify()

When objects have no unique key field and you need structural equality, serialize each object to a JSON string and use the string as the deduplication key. JSON.stringify() produces the same string for objects with identical property values — but only when property enumeration order is consistent. For objects where property order may vary, sort the keys before stringifying.

// ── JSON.stringify deduplication: basic ──────────────────────────
const configs = [
  { env: 'prod', debug: false, port: 8080 },
  { env: 'dev',  debug: true,  port: 3000 },
  { env: 'prod', debug: false, port: 8080 },  // structural duplicate
];

const seen = new Set();
const uniqueConfigs = configs.filter(item => {
  const key = JSON.stringify(item);
  return seen.has(key) ? false : seen.add(key) && true;
});
// 2 configs: prod and dev

// ── Handle inconsistent property order ────────────────────────────
// Objects with same properties in different order stringify differently
const a = { x: 1, y: 2 };
const b = { y: 2, x: 1 };  // same content, different order
JSON.stringify(a); // '{"x":1,"y":2}'
JSON.stringify(b); // '{"y":2,"x":1}'  — different string!
JSON.stringify(a) === JSON.stringify(b); // false, so b would NOT be deduplicated

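// Fix: sort keys before stringifying
// Note: a replacer array such as Object.keys(obj).sort() also filters nested
// objects by the same key list, so use a replacer function that sorts at every level.
function stableStringify(value) {
  return JSON.stringify(value, (_key, val) =>
    val !== null && typeof val === 'object' && !Array.isArray(val)
      ? Object.fromEntries(Object.keys(val).sort().map(k => [k, val[k]]))
      : val
  );
}
stableStringify(a); // '{"x":1,"y":2}'
stableStringify(b); // '{"x":1,"y":2}'  (same string)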

const seenStable = new Set();
const uniqueStable = configs.filter(item => {
  const key = stableStringify(item);
  return seenStable.has(key) ? false : seenStable.add(key) && true;
});

// ── Limitations of JSON.stringify ────────────────────────────────
// 1. Functions are omitted (returns undefined for function values)
const withFn = { id: 1, fn: () => 'hello' };
JSON.stringify(withFn); // '{"id":1}'  — fn is silently dropped

// 2. undefined values are omitted
const withUndef = { a: 1, b: undefined };
JSON.stringify(withUndef); // '{"a":1}'  — b is dropped

// 3. Date instances become ISO strings — may not match Date objects
const d1 = { created: new Date('2026-01-01') };
const d2 = { created: '2026-01-01T00:00:00.000Z' };
JSON.stringify(d1); // '{"created":"2026-01-01T00:00:00.000Z"}'
JSON.stringify(d2); // '{"created":"2026-01-01T00:00:00.000Z"}'
// These WOULD be deduplicated — may or may not be correct behavior

// 4. Circular references throw TypeError
const circular = { id: 1 };
circular.self = circular;
// JSON.stringify(circular); // TypeError: Converting circular structure to JSON

// ── When to use JSON.stringify vs lodash _.isEqual ────────────────
// JSON.stringify: fast, built-in, works for plain JSON-safe objects
// _.isEqual: handles Date, RegExp, Map, Set, undefined, functions (by reference)
//
// Use JSON.stringify when: objects come from JSON.parse() or are guaranteed JSON-safe
// Use _.isEqual when: objects may contain Dates, undefined, or complex types

JSON.stringify() deduplication is appropriate when your objects originate from JSON.parse() — parsed JSON is always JSON-safe, contains no functions or circular references, and has deterministic property ordering from the parser. The key limitation is key ordering: if the same logical object reaches your code through different code paths that build the object in different orders, JSON.stringify() produces different strings and fails to deduplicate. The stable-stringify approach with sorted keys fixes this at the cost of slightly more processing. For JSON search and filter pipelines where objects are pre-parsed, JSON.stringify() deduplication is reliable and dependency-free.

lodash _.uniqBy() and _.uniqWith()

lodash provides three deduplication functions: _.uniq() for primitives (equivalent to [...new Set()]), _.uniqBy(array, iteratee) for key-based object deduplication, and _.uniqWith(array, comparator) for custom equality logic including deep equality. _.uniq() and _.uniqBy() are roughly O(n) on large arrays; _.uniqWith() compares each item against the items already kept, so it is O(n²) in the worst case. All three return new arrays without mutating the input.

import _ from 'lodash';
// or: import { uniq, uniqBy, uniqWith, isEqual } from 'lodash';
// or: import uniqBy from 'lodash/uniqBy';  // tree-shakeable

// ── _.uniq: primitives (same as [...new Set()]) ───────────────────
_.uniq([1, 2, 2, 3, 1]);       // [1, 2, 3]
_.uniq(['a', 'b', 'a', 'c']); // ['a', 'b', 'c']

// ── _.uniqBy: key-based object deduplication ──────────────────────
const users = [
  { id: 1, name: 'Alice', dept: 'engineering' },
  { id: 2, name: 'Bob',   dept: 'design' },
  { id: 1, name: 'Alice', dept: 'engineering' },
  { id: 3, name: 'Carol', dept: 'engineering' },
];

// By property name string
_.uniqBy(users, 'id');
// [{ id:1, name:'Alice' }, { id:2, name:'Bob' }, { id:3, name:'Carol' }]

// By function (transform before comparing)
_.uniqBy(users, u => u.name.toLowerCase());
// Deduplicates by lowercased name

// By dot-notation path (nested property)
const orders = [
  { id: 1, customer: { id: 10, name: 'Alice' } },
  { id: 2, customer: { id: 20, name: 'Bob' } },
  { id: 3, customer: { id: 10, name: 'Alice' } },  // same customer
];
_.uniqBy(orders, 'customer.id');
// [{ id:1, customer:{id:10} }, { id:2, customer:{id:20} }]

// By composite key via function
_.uniqBy(users, u => `${u.name}:${u.dept}`);

// ── _.uniqWith: custom comparator / deep equality ─────────────────
const configs = [
  { host: 'localhost', port: 3000 },
  { host: 'localhost', port: 4000 },
  { host: 'localhost', port: 3000 },  // deep equal to first
];

_.uniqWith(configs, _.isEqual);
// [{ host:'localhost', port:3000 }, { host:'localhost', port:4000 }]

// Custom comparator: deduplicate by single field using comparator form
_.uniqWith(users, (a, b) => a.dept === b.dept);
// Keeps first user from each department

// ── _.uniqWith with Date comparison ──────────────────────────────
const events = [
  { type: 'login', at: new Date('2026-01-01') },
  { type: 'login', at: new Date('2026-01-01') },  // same Date value
  { type: 'logout', at: new Date('2026-01-02') },
];
_.uniqWith(events, _.isEqual);
// 2 events — _.isEqual compares Date instances by value, not reference

// ── Tree-shakeable imports (lodash-es or lodash/function) ──────────
// For ESM / bundled applications, import individual functions:
import uniqBy from 'lodash/uniqBy';
import uniqWith from 'lodash/uniqWith';
import isEqual from 'lodash/isEqual';

const unique = uniqBy(users, 'id');

// ── _.sortedUniq: O(n) deduplication for sorted arrays ────────────
// When array is already sorted, adjacent comparison is enough
const sorted = [1, 1, 2, 3, 3, 3, 4];
_.sortedUniq(sorted); // [1, 2, 3, 4]  — O(n), no hash map needed

lodash's _.isEqual() performs structural deep equality that correctly handles Date instances (compared by value), RegExp (by source and flags), Map and Set (by entries), ArrayBuffer and typed arrays, and null/undefined. It does not compare functions by value (functions are compared by reference). For bundled applications, import individual functions from lodash/uniqBy rather than the full lodash package to reduce bundle size, or use lodash-es for ESM tree-shaking. The _.sortedUniqBy() variant works on pre-sorted arrays by comparing each element's key with the previous one, so it needs no hash set for bookkeeping. Pair with JSON performance techniques to minimize preprocessing costs.

Database-Level Deduplication: SQL DISTINCT and ON CONFLICT

Database-level deduplication is more reliable than application-level for large datasets and concurrent writes. SQL provides three main tools: SELECT DISTINCT for query-time deduplication, INSERT ... ON CONFLICT DO NOTHING for preventing duplicates during inserts, and ROW_NUMBER() OVER (PARTITION BY ...) for deduplicating existing table data. PostgreSQL JSONB columns add JSON-aware deduplication capabilities using JSONB operators inside these constructs.

-- ── SELECT DISTINCT: query-time deduplication ────────────────────
-- Remove duplicate rows from result set
SELECT DISTINCT name, email
FROM users
ORDER BY name;

-- DISTINCT ON (PostgreSQL-specific): keep first row per group
-- Returns one row per email, keeping the earliest-created user
SELECT DISTINCT ON (email)
  id, name, email, created_at
FROM users
ORDER BY email, created_at ASC;

-- ── INSERT ON CONFLICT DO NOTHING: prevent duplicates ─────────────
-- Requires a UNIQUE constraint or PRIMARY KEY on the conflict column
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

-- Insert from staging table, skip rows that conflict on email
INSERT INTO users (name, email, created_at)
SELECT name, email, NOW()
FROM staging_users
ON CONFLICT (email) DO NOTHING;

-- ON CONFLICT DO UPDATE (upsert): update existing row instead of skipping
INSERT INTO users (id, name, email)
VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE SET
  name  = EXCLUDED.name,
  email = EXCLUDED.email,
  updated_at = NOW();

-- ── ROW_NUMBER: deduplicate existing table data ───────────────────
-- Keep only the most recent row per email, delete the rest
WITH ranked AS (
  SELECT
    id,
    email,
    ROW_NUMBER() OVER (
      PARTITION BY email      -- group by email
      ORDER BY created_at DESC -- keep the most recent
    ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM ranked WHERE rn > 1
);

-- ── PostgreSQL JSONB: deduplicate JSON arrays in a column ──────────
-- users table: profile JSONB column containing { "tags": ["js","ts","js"] }

-- Deduplicate a JSONB array using jsonb_array_elements_text + DISTINCT
-- (DISTINCT does not guarantee the original element order)
UPDATE users
SET profile = jsonb_set(
  profile,
  '{tags}',
  (
    SELECT jsonb_agg(DISTINCT tag)
    FROM jsonb_array_elements_text(profile->'tags') AS tag
  )
)
WHERE jsonb_array_length(profile->'tags') > 0;

-- ── Bulk deduplication with CTE: safe for large tables ────────────
-- Step 1: identify duplicates
WITH duplicates AS (
  SELECT
    email,
    COUNT(*) AS cnt,
    MIN(id)  AS keep_id  -- keep the row with lowest id
  FROM users
  GROUP BY email
  HAVING COUNT(*) > 1
)
-- Step 2: delete all but the kept row
DELETE FROM users u
USING duplicates d
WHERE u.email = d.email
  AND u.id <> d.keep_id;

-- ── Create UNIQUE INDEX to prevent future duplicates ──────────────
-- Partial unique index: unique email among active users only
CREATE UNIQUE INDEX users_active_email_idx
  ON users (email)
  WHERE deleted_at IS NULL;

-- For PostgreSQL JSONB: reference guide at /guides/json-postgresql
-- See also: /guides/json-performance for index strategy

PostgreSQL's DISTINCT ON (column) is more powerful than standard SELECT DISTINCT — it returns exactly one row per unique value of the specified column, with the selected row determined by the ORDER BY clause. This is the idiomatic PostgreSQL way to "get the latest record per user" or "get the cheapest product per category." For ongoing deduplication, a UNIQUE constraint combined with ON CONFLICT DO NOTHING is the most reliable approach — it is atomic, handles concurrent inserts correctly, and eliminates the need for application-level duplicate checking. Pair with PostgreSQL JSONB for JSON column deduplication patterns.

Key Terms

Set deduplication: Using JavaScript's built-in Set data structure to remove duplicate values from an array. Set stores only unique values using SameValueZero equality, which compares primitives by value and objects by reference. The spread pattern [...new Set(array)] converts an array to a Set (eliminating duplicates) and back to an array, running in O(n) time and O(n) space. Set correctly deduplicates NaN (unlike ===) but cannot deduplicate objects by their content — two objects with identical properties are treated as distinct values because they have different memory addresses. Use Set for arrays of strings, numbers, booleans, and null.
Map deduplication: Using JavaScript's Map data structure to deduplicate arrays of objects by a unique key field. The Map tracks which key values have been encountered; filter() then keeps only items whose key has not been seen. Runs in O(n) time and O(n) space — the theoretical optimum for order-preserving deduplication. The alternative [...new Map(array.map(item => [item.id, item])).values()] is more concise but keeps the last occurrence of each key (Map overwrites on duplicate keys). Suitable for any array where a unique identifier field exists — database IDs, UUIDs, or composite keys built with template literals.
filter+findIndex: A deduplication pattern that combines Array.prototype.filter() and Array.prototype.findIndex(): array.filter((item, i, arr) => arr.findIndex(el => el.id === item.id) === i). For each element, findIndex scans the entire array for the first occurrence of that key — if the current index matches the first-occurrence index, the item is unique (or is the first occurrence). Runs in O(n²) time because findIndex is called once per element and scans up to n elements each time. Simple and readable, requiring no imports or additional variables, but unsuitable for arrays larger than ~1,000 items due to quadratic performance degradation.
deep equality: Structural comparison of two values that recursively checks all nested properties and array elements for equality, rather than comparing object references. In JavaScript, { a: 1 } === { a: 1 } is false (reference inequality), but deep equality considers them equal. Two approaches: JSON.stringify() serializes both values and compares the resulting strings. It is fast and dependency-free for JSON-safe objects, but it is unreliable when objects contain functions or undefined (silently omitted), circular references (a TypeError is thrown), Date objects (serialized to ISO strings, so a Date and an equal string look identical), or the same properties in a different order (the strings differ). lodash _.isEqual() performs true structural deep equality that handles Date, RegExp, Map, Set, typed arrays, and undefined correctly, at the cost of an external dependency.
lodash _.uniqBy(): A lodash function that removes duplicate elements from an array using a key-generating iteratee, keeping the first occurrence of each unique key. Signature: _.uniqBy(array, iteratee) where the iteratee is a property name string ('id'), a dot-notation path for nested properties ('user.id'), or a function (item => item.name.toLowerCase()). Internally uses a Set-based approach for O(n) performance. The companion _.uniqWith(array, comparator) accepts a custom two-argument comparison function instead of a key extractor; use _.uniqWith(arr, _.isEqual) for full deep equality deduplication. Because a comparator cannot be hashed, _.uniqWith compares elements pairwise and is O(n²) in the worst case. Both return new arrays and do not mutate the input.
ON CONFLICT DO NOTHING: A PostgreSQL INSERT clause that silently skips rows that would violate a UNIQUE constraint or primary key. Syntax: INSERT INTO table (...) VALUES (...) ON CONFLICT (column) DO NOTHING. Requires a unique constraint on the conflict column(s) defined at the schema level. Handles concurrent inserts correctly — if two transactions try to insert the same value simultaneously, one succeeds and the other is silently skipped without an error. The alternative ON CONFLICT DO UPDATE SET ... (upsert) updates the existing row instead of skipping. Both are atomic and avoid the race condition of application-level "check then insert" patterns. Use with INSERT ... SELECT to bulk-deduplicate data from a staging table.

FAQ

How do I remove duplicates from a JSON array?

For arrays of primitives (strings, numbers), use [...new Set(array)] — O(n) and one line. For arrays of objects, Set does not work because objects are compared by reference, not value. Instead, use a Map keyed by a unique field: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }). If objects have no unique field, serialize each object: const seen = new Set(); const unique = array.filter(item => { const key = JSON.stringify(item); return seen.has(key) ? false : seen.add(key) && true; }). For deep equality with complex objects (Dates, undefined), use _.uniqWith(array, _.isEqual) from lodash.
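As a minimal sketch of the first two cases in this answer, the following assumes a hypothetical users array whose objects carry a unique id field; the sample data and variable names are invented for illustration.

```javascript
// Primitives: Set keeps the first occurrence of each value, in insertion order.
const tags = ["a", "b", "a", "c", "b"];
const uniqueTags = [...new Set(tags)];

// Objects with a unique field (assumed here to be `id`): remember seen keys.
const users = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Bob" },
  { id: 1, name: "Ann (duplicate)" },
];
const seenIds = new Map();
const uniqueUsers = users.filter((user) => {
  if (seenIds.has(user.id)) return false; // key already seen: drop
  seenIds.set(user.id, true);
  return true; // first time this key appears: keep
});

console.log(uniqueTags); // [ 'a', 'b', 'c' ]
console.log(uniqueUsers.map((u) => u.name)); // [ 'Ann', 'Bob' ]
```

Neither pattern mutates the input array; both return a new one.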

How do I deduplicate an array of JSON objects?

The fastest approach is a Map keyed by a unique property. Two patterns: (1) Filter with seen Map — keeps first occurrence: const seen = new Map(); array.filter(item => seen.has(item.id) ? false : seen.set(item.id, true) && true). (2) Map constructor — keeps last occurrence: [...new Map(array.map(item => [item.id, item])).values()]. Both run in O(n). For composite keys (multiple fields), use a template literal: `${item.name}:${item.type}` as the Map key. For nested property keys: item.address?.city. With lodash: _.uniqBy(array, 'id') for property name or _.uniqBy(array, item => item.user.id) for nested access.
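The difference between the two patterns is easiest to see on data where duplicates differ in some other field. The sketch below uses hypothetical records (the rev field exists only to show which copy survives) and also illustrates a composite key.

```javascript
// Hypothetical sample data: the first and third records share id 1.
const records = [
  { id: 1, name: "disk", type: "hw", rev: "old" },
  { id: 2, name: "disk", type: "sw", rev: "only" },
  { id: 1, name: "disk", type: "hw", rev: "new" },
];

// Pattern 1: filter with a seen Map keeps the FIRST occurrence of each id.
const seen = new Map();
const keepFirst = records.filter((item) => {
  if (seen.has(item.id)) return false;
  seen.set(item.id, true);
  return true;
});

// Pattern 2: the Map constructor overwrites earlier entries, so the LAST
// occurrence wins, while the key keeps the position of its first insertion.
const keepLast = [...new Map(records.map((item) => [item.id, item])).values()];

// Composite key: combine several fields into one string key.
const byNameAndType = [
  ...new Map(records.map((item) => [`${item.name}:${item.type}`, item])).values(),
];

console.log(keepFirst.map((r) => r.rev)); // [ 'old', 'only' ]
console.log(keepLast.map((r) => r.rev)); // [ 'new', 'only' ]
console.log(byNameAndType.length); // 2
```

The composite key assumes the separator (":" here) cannot appear inside the joined fields; otherwise two different field pairs could produce the same string.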

What is the fastest way to deduplicate a large JSON array?

For primitive arrays: [...new Set(array)] is O(n) and cannot be beaten asymptotically. For object arrays with a unique key: [...new Map(array.map(item => [item.id, item])).values()] runs in O(n) time and O(n) space. This processes 100,000 objects in under 20ms on modern hardware. Avoid filter + findIndex for large arrays: it is O(n²), needing up to about 50 million comparisons for 10,000 items, which can take seconds rather than milliseconds. For sorting-based deduplication when order does not matter and memory is constrained: sort the array first (O(n log n)), then compare adjacent items in a single O(n) pass that needs only O(1) extra space beyond whatever the sort itself allocates, for O(n log n) total. Never use nested loops or repeated indexOf calls; both are O(n²).
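The sort-then-scan variant can be sketched as follows. The function name dedupeSortedInPlace is hypothetical, and the sketch assumes an array of numbers or strings without NaN (NaN never equals itself under !==, so repeated NaN values would survive).

```javascript
// Sort-then-scan deduplication for primitives.
// Mutates and reorders `values`; original order is NOT preserved.
function dedupeSortedInPlace(values) {
  if (values.length === 0) return values;
  values.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0)); // O(n log n)
  let write = 1; // next slot for a value not seen yet
  for (let read = 1; read < values.length; read++) {
    // After sorting, duplicates are adjacent: compare with the last kept value.
    if (values[read] !== values[write - 1]) {
      values[write++] = values[read];
    }
  }
  values.length = write; // truncate the leftover tail
  return values;
}

console.log(dedupeSortedInPlace([3, 1, 2, 3, 1])); // [ 1, 2, 3 ]
```

The scan itself allocates nothing, but the engine's sort may use additional memory internally, so the total footprint depends on the runtime.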

How do I deduplicate by a nested property?

Access the nested property in your key extractor. With a Map: const seen = new Map(); array.filter(item => { const key = item.address?.city; return seen.has(key) ? false : seen.set(key, true) && true; }). With lodash _.uniqBy(), pass a dot-notation string: _.uniqBy(array, 'address.city') — lodash parses the dot-notation and accesses the nested value automatically. For a function: _.uniqBy(array, item => item.address?.city). For composite nested keys: _.uniqBy(array, item => `${item.address.city}:${item.address.country}`). With optional chaining and nullish coalescing for safety: item?.address?.city ?? "__null__" — use a sentinel value to handle missing nested properties consistently.
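A short sketch of the sentinel approach, using hypothetical sample data in which some items lack an address or a city. The sentinel string is assumed never to occur as a real city name.

```javascript
// Hypothetical sample data; the last two items have no usable city.
const people = [
  { name: "Ann", address: { city: "Oslo" } },
  { name: "Bob", address: { city: "Oslo" } },
  { name: "Cy", address: {} },
  { name: "Di" },
];

// Sentinel for missing nested values (assumed not to collide with real data).
const MISSING = "__null__";

const seenCities = new Set();
const onePerCity = people.filter((item) => {
  // Optional chaining yields undefined when any step is missing;
  // ?? then maps both null and undefined to the same sentinel key.
  const key = item?.address?.city ?? MISSING;
  if (seenCities.has(key)) return false;
  seenCities.add(key);
  return true;
});

console.log(onePerCity.map((p) => p.name)); // [ 'Ann', 'Cy' ]
```

With this choice, all items without a city collapse into one group. If such items should all be kept instead, skip the seen check whenever the key equals the sentinel.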

How do I check deep equality for JSON deduplication?

Two approaches: (1) JSON.stringify(): fast and dependency-free for JSON-safe objects. Use as a Set key: const seen = new Set(); array.filter(item => { const key = JSON.stringify(item); return seen.has(key) ? false : seen.add(key) && true; }). If property order may vary, sort keys first: JSON.stringify(item, Object.keys(item).sort()). This works only for flat objects, because an array replacer is an allowlist applied at every nesting level: nested keys that are missing from the top-level list are dropped, so nested objects need recursive key sorting instead. Limitations: functions, undefined values, and Symbol-keyed properties are silently omitted, so objects that differ only in those produce the same string, and circular references throw a TypeError. (2) lodash _.isEqual(): handles Date, RegExp, Map, Set, typed arrays, and undefined: _.uniqWith(array, _.isEqual). Use JSON.stringify() when objects come from JSON.parse(); use _.isEqual when objects may contain non-JSON types.
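For nested objects whose property order may vary, one option is to sort keys recursively before serializing. The helpers below (canonicalize and uniqueByContent) are hypothetical names for a minimal sketch that assumes plain JSON data, such as the output of JSON.parse().

```javascript
// Rebuild a value with object keys sorted at every depth, so that
// JSON.stringify produces the same string regardless of key order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value; // primitives pass through unchanged
}

// Keep the first occurrence of each distinct content.
function uniqueByContent(array) {
  const seen = new Set();
  return array.filter((item) => {
    const key = JSON.stringify(canonicalize(item));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const rows = [
  { a: 1, nested: { x: 1, y: 2 } },
  { nested: { y: 2, x: 1 }, a: 1 }, // same content, different key order
  { a: 1, nested: { x: 1, y: 3 } },
];
console.log(uniqueByContent(rows).length); // 2
```

Array element order is left untouched on purpose: [1, 2] and [2, 1] remain different values.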

How does lodash uniqBy work?

_.uniqBy(array, iteratee) iterates through the array, applies the iteratee to each element to generate a key, and tracks seen keys in a Set. The first element whose key has not been seen is kept; subsequent elements with the same key are discarded. The iteratee can be: a property name string ('id'), a dot-notation path ('user.id'), or a function (item => item.name.toLowerCase()). Performance is O(n). The companion _.uniqWith(array, comparator) accepts a two-argument function that returns true when two elements should be considered equal — use _.uniqWith(arr, _.isEqual) for deep structural equality. _.uniq(array) is equivalent to [...new Set(array)] for primitives. All three return new arrays without mutating the input.
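The keep-first behavior described here can be illustrated without the library. The function below, uniqBySketch, is a hypothetical simplification and not lodash's source code; it supports only a function or a dot-notation path as the iteratee.

```javascript
// NOT lodash's implementation: a simplified illustration of keep-first
// deduplication by a derived key.
function uniqBySketch(array, iteratee) {
  // Accept a function, or a property path such as "user.id".
  const getKey =
    typeof iteratee === "function"
      ? iteratee
      : (item) => iteratee.split(".").reduce((obj, part) => obj?.[part], item);

  const seen = new Set();
  const result = [];
  for (const item of array) {
    const key = getKey(item);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(item); // first element with this key is kept
    }
  }
  return result; // new array; the input is not mutated
}

const posts = [
  { user: { id: 1 }, title: "A" },
  { user: { id: 2 }, title: "B" },
  { user: { id: 1 }, title: "C" },
];
console.log(uniqBySketch(posts, "user.id").map((p) => p.title)); // [ 'A', 'B' ]
console.log(uniqBySketch(posts, (p) => p.title.toLowerCase()).length); // 3
```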

How do I deduplicate JSON data at the database level?

Three approaches: (1) SELECT DISTINCT column1, column2 removes duplicate rows at query time. PostgreSQL's DISTINCT ON (email) returns one row per email value, with the selected row controlled by ORDER BY. (2) INSERT ... ON CONFLICT (column) DO NOTHING silently skips inserts that violate a UNIQUE constraint; it requires the constraint to exist on the conflict column. (3) ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) numbers rows within each group, newest first; a subsequent DELETE WHERE rn > 1 removes all but the newest row per email. For PostgreSQL JSONB array columns, use jsonb_array_elements with DISTINCT to deduplicate JSON arrays stored in a column. Database-level deduplication handles concurrent writes atomically, which application-level deduplication cannot.

Why does Set not work for deduplicating JSON objects?

JavaScript Set uses SameValueZero equality, which compares objects by reference (memory address), not by content. Two distinct object literals with identical properties are different references: { id: 1 } === { id: 1 } is false. So new Set([{ id: 1 }, { id: 1 }]).size is 2, not 1 — both objects are kept because they live at different memory addresses. This is the same reason [] === [] is false. Primitives work correctly because numbers and strings are compared by value: new Set([1, 1, 2]).size is 2. To use Set with objects, serialize to a string key first: const seen = new Set(); array.filter(obj => { const k = JSON.stringify(obj); return seen.has(k) ? false : seen.add(k) && true; }). For unique-key deduplication, Map is faster and more explicit.
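The reference-versus-value behavior can be checked directly in a console. The variable names below are illustrative only.

```javascript
const a = { id: 1 };
const b = { id: 1 }; // same content, different object

console.log(new Set([a, b]).size); // 2: two distinct references
console.log(new Set([a, a]).size); // 1: the same reference twice
console.log(new Set([1, 1, 2]).size); // 2: primitives compare by value
console.log(new Set([NaN, NaN]).size); // 1: SameValueZero treats NaN as equal to NaN

// Serializing turns each object's content into a primitive string key.
const keys = new Set([a, b].map((obj) => JSON.stringify(obj)));
console.log(keys.size); // 1
```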