JSON Deduplicate Array Objects JavaScript: Set, Map & lodash
Deduplicating a JSON array of primitives takes one line: [...new Set(array)] — but for arrays of objects, you need a key-based strategy: Map keyed by a unique field, filter + findIndex, or a reduce accumulator. The fastest approach for large object arrays is a Map: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }) — O(n) time and O(n) space. filter + findIndex is simpler but O(n²) — avoid for arrays larger than ~1000 items. This guide covers Set-based primitive deduplication, Map-based object deduplication by key (O(n)), reduce accumulator pattern, deep equality deduplication with JSON.stringify(), lodash _.uniqBy() and _.uniqWith(), and database-level deduplication with SQL DISTINCT and PostgreSQL ON CONFLICT DO NOTHING.
Set-Based Deduplication for Primitive Arrays
[...new Set(array)] is the canonical one-liner for deduplicating arrays of primitives: strings, numbers, booleans, null, and undefined. Set stores only unique values using the SameValueZero algorithm, which compares primitives by value. The spread operator converts the Set back to an array, preserving insertion order (first occurrence wins). This runs in O(n) expected time and O(n) space because Set membership checks are hash-based rather than comparison-based; every element has to be read at least once, so no approach can do asymptotically better.
// ── Primitive array deduplication: O(n) ──────────────────────────
const numbers = [1, 2, 2, 3, 1, 4, 3, 5];
const uniqueNumbers = [...new Set(numbers)];
console.log(uniqueNumbers); // [1, 2, 3, 4, 5]
const strings = ['apple', 'banana', 'apple', 'cherry', 'banana'];
const uniqueStrings = [...new Set(strings)];
console.log(uniqueStrings); // ['apple', 'banana', 'cherry']
// Mixed primitives: null and undefined are each kept once as their own value
const mixed = [1, '1', null, null, undefined, undefined, true, true];
const uniqueMixed = [...new Set(mixed)];
console.log(uniqueMixed); // [1, '1', null, undefined, true]
// Note: 1 (number) and '1' (string) are different values in SameValueZero
// NaN: Set treats NaN as equal to itself (unlike ===)
const withNaN = [NaN, NaN, 1, 1];
console.log([...new Set(withNaN)]); // [NaN, 1] — NaN deduplicated correctly
// Alternative: Array.from(new Set(array)) — identical result
const unique2 = Array.from(new Set(numbers));
console.log(unique2); // [1, 2, 3, 4, 5]
// ── Deduplication from JSON-parsed data ──────────────────────────
const jsonData = JSON.parse('[1, 2, 2, 3, "a", "a", null, null]');
const uniqueJson = [...new Set(jsonData)];
console.log(uniqueJson); // [1, 2, 3, 'a', null]
// ── Count unique values ───────────────────────────────────────────
const tags = ['js', 'ts', 'js', 'react', 'ts', 'node'];
const uniqueCount = new Set(tags).size;
console.log(uniqueCount); // 4
// ── Check if array has duplicates ─────────────────────────────────
const hasDuplicates = (arr) => new Set(arr).size !== arr.length;
console.log(hasDuplicates([1, 2, 3])); // false
console.log(hasDuplicates([1, 2, 2, 3])); // true
// ── Why Set fails for objects ─────────────────────────────────────
const objects = [{ id: 1 }, { id: 1 }, { id: 2 }];
const setResult = [...new Set(objects)];
console.log(setResult.length); // 3 — all three kept! Objects compared by reference
// Each object literal creates a new reference, so Set sees 3 distinct values
Set handles NaN correctly for deduplication: NaN === NaN is false in JavaScript, but Set treats multiple NaN values as the same entry under SameValueZero, so [...new Set([NaN, NaN])] returns [NaN]. SameValueZero also treats +0 and -0 as equal, so [...new Set([0, -0])] returns [0]. For objects and arrays, Set compares by reference: two objects with identical content are treated as distinct values, which is why object deduplication requires a different approach.
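The reference-versus-value distinction can be checked directly. The following is a minimal sketch; the variable names (shared, byReference, zeros) are made up for illustration.

```javascript
// The same object reference added twice is stored once;
// a separate literal with equal content is a different reference.
const shared = { id: 1 };
const byReference = [...new Set([shared, shared, { id: 1 }])];
console.log(byReference.length); // 2

// SameValueZero treats +0 and -0 as one value, and Set stores it as +0.
const zeros = [...new Set([0, -0])];
console.log(zeros.length, Object.is(zeros[0], 0)); // 1 true
```

In other words, Set only helps with objects when duplicates are literally the same reference, which is rarely the case for data produced by JSON.parse().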
Map-Based Object Deduplication by Key (O(n))
Map-based deduplication is the fastest O(n) approach for arrays of objects when a unique key field (like id) exists. The Map tracks which keys have been seen, and filter() keeps only the first occurrence of each key. A more concise alternative uses the Map constructor directly: [...new Map(array.map(item => [item.id, item])).values()]. This form keeps the last occurrence of each key, because Map overwrites the value on duplicate keys, but each key stays at the position where it was first inserted.
// ── Map deduplication: keep first occurrence ─────────────────────
const users = [
{ id: 1, name: 'Alice', role: 'admin' },
{ id: 2, name: 'Bob', role: 'editor' },
{ id: 1, name: 'Alice', role: 'admin' }, // duplicate id:1
{ id: 3, name: 'Carol', role: 'viewer' },
{ id: 2, name: 'Bob', role: 'editor' }, // duplicate id:2
];
const seen = new Map();
const uniqueUsers = users.filter(user => {
if (seen.has(user.id)) return false;
seen.set(user.id, true);
return true;
});
// [{ id:1, name:'Alice' }, { id:2, name:'Bob' }, { id:3, name:'Carol' }]
// ── Map constructor: keep last occurrence ─────────────────────────
// new Map(array.map(item => [key, item])) overwrites on duplicate keys
const uniqueLast = [...new Map(users.map(u => [u.id, u])).values()];
// Keeps last occurrence of each id — useful for "latest wins" semantics
// ── Composite key deduplication ────────────────────────────────────
const events = [
{ type: 'click', target: 'button', timestamp: 1 },
{ type: 'click', target: 'button', timestamp: 2 }, // same type+target
{ type: 'hover', target: 'button', timestamp: 3 },
{ type: 'click', target: 'input', timestamp: 4 },
];
const seenEvents = new Map();
const uniqueEvents = events.filter(ev => {
const key = `${ev.type}:${ev.target}`;
if (seenEvents.has(key)) return false;
seenEvents.set(key, true);
return true;
});
// 3 events: click:button (first), hover:button, click:input
// ── Generic deduplication helper ──────────────────────────────────
function deduplicateBy(array, keyFn) {
const seen = new Map();
return array.filter(item => {
const key = keyFn(item);
if (seen.has(key)) return false;
seen.set(key, true);
return true;
});
}
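const uniqueById = deduplicateBy(users, u => u.id);
// Note: the sample users above have no email field, so every key here is
// undefined and only the first user survives. Use this form when the field exists.
const uniqueByEmail = deduplicateBy(users, u => u.email?.toLowerCase());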
// ── Performance: 100,000 items ────────────────────────────────────
const largeArray = Array.from({ length: 100_000 }, (_, i) => ({
id: i % 10_000, // 10,000 unique ids, 10x duplication
value: `item-${i}`,
}));
console.time('Map dedup');
const uniqueLarge = [...new Map(largeArray.map(x => [x.id, x])).values()];
console.timeEnd('Map dedup');
// Map dedup: roughly 15ms for 100,000 items in one run (varies by engine and hardware); O(n)
console.log(uniqueLarge.length); // 10000
The new Map(array.map(item => [item.id, item])).values() pattern is more concise than the filter() version, but it is not automatically faster: it allocates a [key, item] pair for every element, so benchmark both on your own data if the difference matters. The more important trade-off is semantics: the Map constructor keeps the last occurrence of each key (later entries overwrite earlier ones), while the filter() version keeps the first. Choose based on your requirement. For "latest record wins" semantics (e.g., deduplicated event streams), the constructor form is ideal. For string keys derived from template literals, this approach handles composite keys cleanly without a separate serialization step, provided the separator cannot appear inside the field values.
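The first-versus-last difference is easiest to see on a tiny input. This is a minimal sketch with hypothetical sample data (records) in which the two rows for id 1 carry different values:

```javascript
const records = [
  { id: 1, v: 'old' },
  { id: 2, v: 'only' },
  { id: 1, v: 'new' }, // same id as the first record, newer value
];

// filter + seen set: the first record for each id is kept.
const seenIds = new Set();
const keepFirst = records.filter(r => {
  if (seenIds.has(r.id)) return false;
  seenIds.add(r.id);
  return true;
});

// Map constructor: later records overwrite earlier ones for the same id.
const keepLast = [...new Map(records.map(r => [r.id, r])).values()];

console.log(keepFirst.map(r => r.v)); // ['old', 'only']
console.log(keepLast.map(r => r.v));  // ['new', 'only']
```

Note that in keepLast the surviving record for id 1 still appears first: Map updates the value in place and does not move the key to the end.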
filter + findIndex: Simple but O(n²)
array.filter((item, index) => array.findIndex(el => el.id === item.id) === index) is the most readable deduplication pattern and requires no imports — but it is O(n²) because findIndex scans the full array for each element. For arrays under ~500 items this is imperceptible; for 5,000+ items, the quadratic growth becomes noticeable; for 50,000+ items it will block the main thread.
// ── filter + findIndex: simplest approach, O(n²) ────────────────
const products = [
{ id: 'A', name: 'Widget', price: 9.99 },
{ id: 'B', name: 'Gadget', price: 19.99 },
{ id: 'A', name: 'Widget', price: 9.99 }, // duplicate
{ id: 'C', name: 'Doohickey', price: 4.99 },
];
// findIndex returns the FIRST index matching the predicate
// filter keeps items where their index === the first occurrence index
const unique = products.filter(
(item, index, arr) => arr.findIndex(el => el.id === item.id) === index
);
// [{ id:'A' }, { id:'B' }, { id:'C' }] — duplicates removed, first occurrence kept
// ── Deduplication by multiple properties ──────────────────────────
const orders = [
{ userId: 1, productId: 'A', qty: 2 },
{ userId: 1, productId: 'B', qty: 1 },
{ userId: 1, productId: 'A', qty: 2 }, // duplicate
];
const uniqueOrders = orders.filter(
(item, i, arr) =>
arr.findIndex(o => o.userId === item.userId && o.productId === item.productId) === i
);
// 2 orders: userId:1/productId:A (first), userId:1/productId:B
// ── Performance comparison ────────────────────────────────────────
function benchmark(label, fn, iterations = 3) {
const times = [];
for (let i = 0; i < iterations; i++) {
const start = performance.now();
fn();
times.push(performance.now() - start);
}
console.log(label, Math.min(...times).toFixed(1) + 'ms');
}
const testData = Array.from({ length: 5_000 }, (_, i) => ({
id: i % 1_000,
value: i,
}));
benchmark('filter+findIndex (O(n²))', () =>
testData.filter((x, i, a) => a.findIndex(y => y.id === x.id) === i)
);
// filter+findIndex (O(n²)): reported ~180ms for 5,000 items; results vary by engine and hardware
benchmark('Map (O(n))', () =>
[...new Map(testData.map(x => [x.id, x])).values()]
);
// Map (O(n)): reported ~1.2ms for 5,000 items, roughly 150x faster in that run
// ── Safe threshold ────────────────────────────────────────────────
function deduplicateSafe(array, keyFn) {
if (array.length < 500) {
// OK to use findIndex for small arrays
return array.filter((item, i, arr) =>
arr.findIndex(el => keyFn(el) === keyFn(item)) === i
);
}
// Switch to Map for larger arrays
const seen = new Map();
return array.filter(item => {
const key = keyFn(item);
return seen.has(key) ? false : seen.set(key, true) && true;
});
}
The filter + findIndex pattern has one advantage over Map-based deduplication: it requires no setup and works without understanding Map semantics. It is appropriate in utility scripts, one-off data migrations, and test fixtures where readability matters more than performance. In production application code processing user-supplied data, prefer the Map approach: you cannot control how many items users will submit, and O(n²) growth with adversarial input creates denial-of-service risk in Node.js request handlers.
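Wall-clock timings differ between machines, but the number of predicate calls does not. The sketch below counts how often the findIndex callback runs when every id is unique (the worst case); countFindIndexCalls is a hypothetical helper written for this illustration.

```javascript
// Counts predicate invocations made by filter + findIndex on n unique items.
function countFindIndexCalls(n) {
  const data = Array.from({ length: n }, (_, i) => ({ id: i }));
  let calls = 0;
  data.filter((item, i, arr) =>
    arr.findIndex(el => {
      calls++;
      return el.id === item.id;
    }) === i
  );
  return calls;
}

console.log(countFindIndexCalls(100));  // 5050
console.log(countFindIndexCalls(1000)); // 500500
```

The count is n(n + 1) / 2, so a tenfold increase in input size produces roughly a hundredfold increase in work.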
reduce() Accumulator Pattern
The reduce() accumulator pattern builds the deduplicated array explicitly, carrying a seen Set or Map through each iteration. It is more verbose than the Map constructor form but useful when you need to build complex accumulator state simultaneously — for example, deduplicating while also computing aggregates.
// ── reduce with Set: deduplicate primitives ───────────────────────
const nums = [1, 2, 2, 3, 1, 4];
const uniqueNums = nums.reduce((acc, n) => {
if (!acc.seen.has(n)) {
acc.seen.add(n);
acc.result.push(n);
}
return acc;
}, { seen: new Set(), result: [] }).result;
// [1, 2, 3, 4]
// ── reduce with Map: deduplicate objects by key ────────────────────
const items = [
{ id: 1, value: 'a' },
{ id: 2, value: 'b' },
{ id: 1, value: 'a' }, // duplicate
];
const { result: uniqueItems } = items.reduce(
(acc, item) => {
if (!acc.seen.has(item.id)) {
acc.seen.set(item.id, true);
acc.result.push(item);
}
return acc;
},
{ seen: new Map(), result: [] }
);
// [{ id:1, value:'a' }, { id:2, value:'b' }]
// ── reduce: deduplicate AND aggregate simultaneously ───────────────
const transactions = [
{ userId: 1, amount: 50, category: 'food' },
{ userId: 2, amount: 30, category: 'travel' },
{ userId: 1, amount: 20, category: 'food' }, // duplicate userId:1
{ userId: 3, amount: 10, category: 'food' },
];
// Get unique users AND total amount per user in one pass
const userSummary = transactions.reduce((acc, tx) => {
if (!acc.has(tx.userId)) {
acc.set(tx.userId, { userId: tx.userId, total: 0, count: 0 });
}
const user = acc.get(tx.userId);
user.total += tx.amount;
user.count += 1;
return acc;
}, new Map());
const uniqueUsers = [...userSummary.values()];
// [
// { userId: 1, total: 70, count: 2 },
// { userId: 2, total: 30, count: 1 },
// { userId: 3, total: 10, count: 1 },
// ]
// ── reduce: group-then-pick-one (keep latest) ──────────────────────
const events = [
{ id: 'e1', type: 'login', ts: 1000 },
{ id: 'e2', type: 'login', ts: 2000 }, // later login
{ id: 'e3', type: 'logout', ts: 3000 },
];
const latestByType = [...events.reduce((map, ev) => {
const existing = map.get(ev.type);
if (!existing || ev.ts > existing.ts) map.set(ev.type, ev);
return map;
}, new Map()).values()];
// [{ id:'e2', type:'login', ts:2000 }, { id:'e3', type:'logout', ts:3000 }]
The reduce accumulator pattern shines when deduplication is one step of a multi-step transformation: grouping, aggregating, or picking a "winner" within each group. The "keep latest" variant (comparing a timestamp or version field within the accumulator) replaces a sort+deduplicate pipeline with a single O(n) pass. When you only need deduplication without aggregation, prefer the Map constructor form for conciseness. For complex pipelines combining deduplication with filtering and transformation, tools like JSON transform patterns offer structured approaches.
Deep Equality Deduplication with JSON.stringify()
When objects have no unique key field and you need structural equality, serialize each object to a JSON string and use the string as the deduplication key. JSON.stringify() produces the same string for objects with identical property values — but only when property enumeration order is consistent. For objects where property order may vary, sort the keys before stringifying.
// ── JSON.stringify deduplication: basic ──────────────────────────
const configs = [
{ env: 'prod', debug: false, port: 8080 },
{ env: 'dev', debug: true, port: 3000 },
{ env: 'prod', debug: false, port: 8080 }, // structural duplicate
];
const seen = new Set();
const uniqueConfigs = configs.filter(item => {
const key = JSON.stringify(item);
return seen.has(key) ? false : seen.add(key) && true;
});
// 2 configs: prod and dev
// ── Handle inconsistent property order ────────────────────────────
// Objects with same properties in different order stringify differently
const a = { x: 1, y: 2 };
const b = { y: 2, x: 1 }; // same content, different order
JSON.stringify(a); // '{"x":1,"y":2}'
JSON.stringify(b); // '{"y":2,"x":1}' — different string!
JSON.stringify(a) === JSON.stringify(b); // false, so b would NOT be deduplicated
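// Fix for flat objects: sort keys before stringifying.
// Caveat: an array replacer is an allowlist applied at every nesting level,
// so nested keys that are missing from the top-level key list are dropped.
function stableStringify(obj) {
return JSON.stringify(obj, Object.keys(obj).sort());
}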
stableStringify(a); // '{"x":1,"y":2}'
stableStringify(b); // '{"x":1,"y":2}' — same string
const seenStable = new Set();
const uniqueStable = configs.filter(item => {
const key = stableStringify(item);
return seenStable.has(key) ? false : seenStable.add(key) && true;
});
// ── Limitations of JSON.stringify ────────────────────────────────
// 1. Functions are omitted (returns undefined for function values)
const withFn = { id: 1, fn: () => 'hello' };
JSON.stringify(withFn); // '{"id":1}' — fn is silently dropped
// 2. undefined values are omitted
const withUndef = { a: 1, b: undefined };
JSON.stringify(withUndef); // '{"a":1}' — b is dropped
// 3. Date instances become ISO strings — may not match Date objects
const d1 = { created: new Date('2026-01-01') };
const d2 = { created: '2026-01-01T00:00:00.000Z' };
JSON.stringify(d1); // '{"created":"2026-01-01T00:00:00.000Z"}'
JSON.stringify(d2); // '{"created":"2026-01-01T00:00:00.000Z"}'
// These WOULD be deduplicated — may or may not be correct behavior
// 4. Circular references throw TypeError
const circular = { id: 1 };
circular.self = circular;
// JSON.stringify(circular); // TypeError: Converting circular structure to JSON
// ── When to use JSON.stringify vs lodash _.isEqual ────────────────
// JSON.stringify: fast, built-in, works for plain JSON-safe objects
// _.isEqual: handles Date, RegExp, Map, Set, undefined, functions (by reference)
//
// Use JSON.stringify when: objects come from JSON.parse() or are guaranteed JSON-safe
// Use _.isEqual when: objects may contain Dates, undefined, or complex types
JSON.stringify() deduplication is appropriate when your objects originate from JSON.parse(): parsed JSON is always JSON-safe, contains no functions or circular references, and has deterministic property ordering from the parser. The key limitation is key ordering: if the same logical object reaches your code through different code paths that build the object in different orders, JSON.stringify() produces different strings and fails to deduplicate. The stable-stringify approach with sorted keys fixes this for flat objects at the cost of slightly more processing; nested objects need their keys sorted at every level. For JSON search and filter pipelines where objects are pre-parsed, JSON.stringify() deduplication is reliable and dependency-free.
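For nested objects, one option is to rebuild each object with sorted keys before serializing. The following is a minimal sketch that assumes JSON-safe input (plain objects, arrays, and primitives, as produced by JSON.parse()); canonicalize and canonicalKey are hypothetical helper names, not part of any library.

```javascript
// Recursively rebuilds plain objects with keys in sorted order.
// Assumes JSON-safe data: Dates, Maps, and class instances are not handled.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

const canonicalKey = (value) => JSON.stringify(canonicalize(value));

const settings = [
  { db: { host: 'a', port: 1 }, env: 'prod' },
  { env: 'prod', db: { port: 1, host: 'a' } }, // same content, different key order
  { env: 'dev', db: { host: 'b', port: 2 } },
];

const seenKeys = new Set();
const uniqueSettings = settings.filter(item => {
  const key = canonicalKey(item);
  if (seenKeys.has(key)) return false;
  seenKeys.add(key);
  return true;
});
console.log(uniqueSettings.length); // 2

// Why the array replacer is not enough for nested data:
console.log(JSON.stringify({ a: { b: 1 } }, ['a'])); // {"a":{}}
```

The last line shows the pitfall: with an array replacer, the nested key b is filtered out, so two objects that differ only in nested values would serialize to the same string.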
lodash _.uniqBy() and _.uniqWith()
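lodash provides three deduplication functions: _.uniq() for primitives (equivalent to [...new Set()]), _.uniqBy(array, iteratee) for key-based object deduplication, and _.uniqWith(array, comparator) for custom equality logic including deep equality. _.uniq() and _.uniqBy() run in roughly O(n) on large arrays, while _.uniqWith() has to compare each element against the elements already kept, so it is O(n²) in the worst case. All three return new arrays without mutating the input.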
import _ from 'lodash';
// or: import { uniq, uniqBy, uniqWith, isEqual } from 'lodash';
// or: import uniqBy from 'lodash/uniqBy'; // tree-shakeable
// ── _.uniq: primitives (same as [...new Set()]) ───────────────────
_.uniq([1, 2, 2, 3, 1]); // [1, 2, 3]
_.uniq(['a', 'b', 'a', 'c']); // ['a', 'b', 'c']
// ── _.uniqBy: key-based object deduplication ──────────────────────
const users = [
{ id: 1, name: 'Alice', dept: 'engineering' },
{ id: 2, name: 'Bob', dept: 'design' },
{ id: 1, name: 'Alice', dept: 'engineering' },
{ id: 3, name: 'Carol', dept: 'engineering' },
];
// By property name string
_.uniqBy(users, 'id');
// [{ id:1, name:'Alice' }, { id:2, name:'Bob' }, { id:3, name:'Carol' }]
// By function (transform before comparing)
_.uniqBy(users, u => u.name.toLowerCase());
// Deduplicates by lowercased name
// By dot-notation path (nested property)
const orders = [
{ id: 1, customer: { id: 10, name: 'Alice' } },
{ id: 2, customer: { id: 20, name: 'Bob' } },
{ id: 3, customer: { id: 10, name: 'Alice' } }, // same customer
];
_.uniqBy(orders, 'customer.id');
// [{ id:1, customer:{id:10} }, { id:2, customer:{id:20} }]
// By composite key via function
_.uniqBy(users, u => `${u.name}:${u.dept}`);
// ── _.uniqWith: custom comparator / deep equality ─────────────────
const configs = [
{ host: 'localhost', port: 3000 },
{ host: 'localhost', port: 4000 },
{ host: 'localhost', port: 3000 }, // deep equal to first
];
_.uniqWith(configs, _.isEqual);
// [{ host:'localhost', port:3000 }, { host:'localhost', port:4000 }]
// Custom comparator: deduplicate by single field using comparator form
_.uniqWith(users, (a, b) => a.dept === b.dept);
// Keeps first user from each department
// ── _.uniqWith with Date comparison ──────────────────────────────
const events = [
{ type: 'login', at: new Date('2026-01-01') },
{ type: 'login', at: new Date('2026-01-01') }, // same Date value
{ type: 'logout', at: new Date('2026-01-02') },
];
_.uniqWith(events, _.isEqual);
// 2 events — _.isEqual compares Date instances by value, not reference
// ── Tree-shakeable imports (lodash-es or lodash/function) ──────────
// For ESM / bundled applications, import individual functions:
import uniqBy from 'lodash/uniqBy';
import uniqWith from 'lodash/uniqWith';
import isEqual from 'lodash/isEqual';
const unique = uniqBy(users, 'id');
// ── _.sortedUniq: O(n) deduplication for sorted arrays ────────────
// When array is already sorted, adjacent comparison is enough
const sorted = [1, 1, 2, 3, 3, 3, 4];
_.sortedUniq(sorted); // [1, 2, 3, 4]; O(n), no hash map needed
lodash's _.isEqual() performs structural deep equality that correctly handles Date instances (compared by value), RegExp (by source and flags), Map and Set (by entries), ArrayBuffer and typed arrays, and null/undefined. Functions are compared by reference, not by value. For bundled applications, import individual functions such as lodash/uniqBy rather than the full lodash package to reduce bundle size, or use lodash-es for ESM tree-shaking. The _.sortedUniq() and _.sortedUniqBy() variants work on pre-sorted arrays by comparing each element with its predecessor, so they need no auxiliary hash structure beyond the output array. Pair with JSON performance techniques to minimize preprocessing costs.
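To make the difference between the key-based and comparator-based forms concrete, here is a simplified plain-JavaScript sketch of the behavior described above. It is not lodash's source code; uniqBySketch and uniqWithSketch are hypothetical names, and the dot-path handling is deliberately naive.

```javascript
// Key-based: one Set lookup per element, first occurrence wins.
function uniqBySketch(array, iteratee) {
  const getKey = typeof iteratee === 'function'
    ? iteratee
    : (item) => iteratee.split('.').reduce((obj, part) => obj?.[part], item);
  const seen = new Set();
  const result = [];
  for (const item of array) {
    const key = getKey(item);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(item);
    }
  }
  return result;
}

// Comparator-based: each element is compared with every element kept so far,
// so the worst case is quadratic in the number of unique items.
function uniqWithSketch(array, comparator) {
  const result = [];
  for (const item of array) {
    if (!result.some(kept => comparator(kept, item))) result.push(item);
  }
  return result;
}

const staff = [
  { id: 1, team: { name: 'core' } },
  { id: 2, team: { name: 'web' } },
  { id: 3, team: { name: 'core' } },
];
console.log(uniqBySketch(staff, 'team.name').map(s => s.id)); // [1, 2]
console.log(
  uniqWithSketch(staff, (a, b) => a.team.name === b.team.name).map(s => s.id)
); // [1, 2]
```

Both calls return the same result here, but the key-based form does constant work per element, which is why it is the better choice whenever equality can be expressed as a key.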
Database-Level Deduplication: SQL DISTINCT and ON CONFLICT
Database-level deduplication is more reliable than application-level for large datasets and concurrent writes. SQL provides three main tools: SELECT DISTINCT for query-time deduplication, INSERT ... ON CONFLICT DO NOTHING for preventing duplicates during inserts, and ROW_NUMBER() OVER (PARTITION BY ...) for deduplicating existing table data. PostgreSQL JSONB columns add JSON-aware deduplication capabilities using JSONB operators inside these constructs.
-- ── SELECT DISTINCT: query-time deduplication ────────────────────
-- Remove duplicate rows from result set
SELECT DISTINCT name, email
FROM users
ORDER BY name;
-- DISTINCT ON (PostgreSQL-specific): keep first row per group
-- Returns one row per email, keeping the earliest-created user
SELECT DISTINCT ON (email)
id, name, email, created_at
FROM users
ORDER BY email, created_at ASC;
-- ── INSERT ON CONFLICT DO NOTHING: prevent duplicates ─────────────
-- Requires a UNIQUE constraint or PRIMARY KEY on the conflict column
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
-- Insert from staging table, skip rows that conflict on email
INSERT INTO users (name, email, created_at)
SELECT name, email, NOW()
FROM staging_users
ON CONFLICT (email) DO NOTHING;
-- ON CONFLICT DO UPDATE (upsert): update existing row instead of skipping
INSERT INTO users (id, name, email)
VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
email = EXCLUDED.email,
updated_at = NOW();
-- ── ROW_NUMBER: deduplicate existing table data ───────────────────
-- Keep only the most recent row per email, delete the rest
WITH ranked AS (
SELECT
id,
email,
ROW_NUMBER() OVER (
PARTITION BY email -- group by email
ORDER BY created_at DESC -- keep the most recent
) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (
SELECT id FROM ranked WHERE rn > 1
);
-- ── PostgreSQL JSONB: deduplicate JSON arrays in a column ──────────
-- users table: profile JSONB column containing { "tags": ["js","ts","js"] }
-- Deduplicate a JSONB array of strings using jsonb_array_elements_text + DISTINCT
-- (the DISTINCT aggregate does not promise to keep the original element order)
UPDATE users
SET profile = jsonb_set(
profile,
'{tags}',
(
SELECT jsonb_agg(DISTINCT tag)
FROM jsonb_array_elements_text(profile->'tags') AS tag
)
)
WHERE jsonb_array_length(profile->'tags') > 0;
-- ── Bulk deduplication with CTE: safe for large tables ────────────
-- Step 1: identify duplicates
WITH duplicates AS (
SELECT
email,
COUNT(*) AS cnt,
MIN(id) AS keep_id -- keep the row with lowest id
FROM users
GROUP BY email
HAVING COUNT(*) > 1
)
-- Step 2: delete all but the kept row
DELETE FROM users u
USING duplicates d
WHERE u.email = d.email
AND u.id <> d.keep_id;
-- ── Create UNIQUE INDEX to prevent future duplicates ──────────────
-- Partial unique index: unique email among active users only
CREATE UNIQUE INDEX users_active_email_idx
ON users (email)
WHERE deleted_at IS NULL;
-- For PostgreSQL JSONB: reference guide at /guides/json-postgresql
-- See also: /guides/json-performance for index strategy
PostgreSQL's DISTINCT ON (column) is more powerful than standard SELECT DISTINCT: it returns exactly one row per unique value of the specified column, with the selected row determined by the ORDER BY clause. This is the idiomatic PostgreSQL way to "get the latest record per user" or "get the cheapest product per category." For ongoing deduplication, a UNIQUE constraint combined with ON CONFLICT DO NOTHING is the most reliable approach: it is atomic, handles concurrent inserts correctly, and eliminates the need for application-level duplicate checking. Pair with PostgreSQL JSONB for JSON column deduplication patterns.
Key Terms
- Set deduplication: Using JavaScript's built-in Set data structure to remove duplicate values from an array. Set stores only unique values using SameValueZero equality, which compares primitives by value and objects by reference. The spread pattern [...new Set(array)] converts an array to a Set (eliminating duplicates) and back to an array, running in O(n) time and O(n) space. Set correctly deduplicates NaN (unlike ===) but cannot deduplicate objects by their content: two objects with identical properties are treated as distinct values because they have different memory addresses. Use Set for arrays of strings, numbers, booleans, and null.
- Map deduplication: Using JavaScript's Map data structure to deduplicate arrays of objects by a unique key field. The Map tracks which key values have been encountered; filter() then keeps only items whose key has not been seen. Runs in O(n) time and O(n) space, the theoretical optimum for order-preserving deduplication. The alternative [...new Map(array.map(item => [item.id, item])).values()] is more concise but keeps the last occurrence of each key (Map overwrites on duplicate keys). Suitable for any array where a unique identifier field exists: database IDs, UUIDs, or composite keys built with template literals.
- filter+findIndex: A deduplication pattern that combines Array.prototype.filter() and Array.prototype.findIndex(): array.filter((item, i, arr) => arr.findIndex(el => el.id === item.id) === i). For each element, findIndex scans the array for the first occurrence of that key; if the current index matches the first-occurrence index, the item is unique (or is the first occurrence). Runs in O(n²) time because findIndex is called once per element and scans up to n elements each time. Simple and readable, requiring no imports or additional variables, but unsuitable for arrays larger than ~1,000 items due to quadratic performance degradation.
- deep equality: Structural comparison of two values that recursively checks all nested properties and array elements for equality, rather than comparing object references. In JavaScript, { a: 1 } === { a: 1 } is false (reference inequality), but deep equality considers them equal. Two approaches: JSON.stringify() serializes both values and compares the resulting strings, which is fast and dependency-free for JSON-safe objects but fails for objects containing functions, undefined, circular references, Date objects with inconsistent stringification, or inconsistent property ordering. lodash _.isEqual() performs true structural deep equality that handles Date, RegExp, Map, Set, typed arrays, and undefined correctly, at the cost of an external dependency.
- lodash _.uniqBy(): A lodash function that removes duplicate elements from an array using a key-generating iteratee, keeping the first occurrence of each unique key. Signature: _.uniqBy(array, iteratee) where the iteratee is a property name string ('id'), a dot-notation path for nested properties ('user.id'), or a function (item => item.name.toLowerCase()). Internally uses a Set-based approach for O(n) performance. The companion _.uniqWith(array, comparator) accepts a custom two-argument comparison function instead of a key extractor; use _.uniqWith(arr, _.isEqual) for full deep equality deduplication. Both return new arrays and do not mutate the input.
- ON CONFLICT DO NOTHING: A PostgreSQL INSERT clause that silently skips rows that would violate a UNIQUE constraint or primary key. Syntax: INSERT INTO table (...) VALUES (...) ON CONFLICT (column) DO NOTHING. Requires a unique constraint on the conflict column(s) defined at the schema level. Handles concurrent inserts correctly: if two transactions try to insert the same value simultaneously, one succeeds and the other is silently skipped without an error. The alternative ON CONFLICT DO UPDATE SET ... (upsert) updates the existing row instead of skipping. Both are atomic and avoid the race condition of application-level "check then insert" patterns. Use with INSERT ... SELECT to bulk-deduplicate data from a staging table.
FAQ
How do I remove duplicates from a JSON array?
For arrays of primitives (strings, numbers), use [...new Set(array)] — O(n) and one line. For arrays of objects, Set does not work because objects are compared by reference, not value. Instead, use a Map keyed by a unique field: const seen = new Map(); const unique = array.filter(item => { if (seen.has(item.id)) return false; seen.set(item.id, true); return true; }). If objects have no unique field, serialize each object: const seen = new Set(); const unique = array.filter(item => { const key = JSON.stringify(item); return seen.has(key) ? false : seen.add(key) && true; }). For deep equality with complex objects (Dates, undefined), use _.uniqWith(array, _.isEqual) from lodash.
How do I deduplicate an array of JSON objects?
The fastest approach is a Map keyed by a unique property. Two patterns: (1) Filter with seen Map — keeps first occurrence: const seen = new Map(); array.filter(item => seen.has(item.id) ? false : seen.set(item.id, true) && true). (2) Map constructor — keeps last occurrence: [...new Map(array.map(item => [item.id, item])).values()]. Both run in O(n). For composite keys (multiple fields), use a template literal: `${item.name}:${item.type}` as the Map key. For nested property keys: item.address?.city. With lodash: _.uniqBy(array, 'id') for property name or _.uniqBy(array, item => item.user.id) for nested access.
What is the fastest way to deduplicate a large JSON array?
For primitive arrays: [...new Set(array)] runs in O(n) and cannot be beaten asymptotically. For object arrays with a unique key: [...new Map(array.map(item => [item.id, item])).values()] runs in O(n) time and O(n) space, which typically handles 100,000 objects in a few tens of milliseconds on modern hardware. Avoid filter + findIndex for large arrays: it is O(n²), which means up to about 50 million comparisons for 10,000 all-unique items, and the cost quadruples each time the input doubles. For sorting-based deduplication when order does not matter and memory is constrained: sort the array first (O(n log n)), then compare adjacent items (O(n)) with O(1) extra space, for a total of O(n log n). Never use nested loops or repeated indexOf calls; both are O(n²).
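The sort-then-compact approach can be sketched as follows. This is an illustrative in-place version under stated assumptions: dedupeInPlaceBy is a hypothetical helper, keys must be comparable with < and >, the input array is mutated, and the "O(1) extra space" claim ignores whatever temporary memory the engine's own sort uses.

```javascript
// Sorts the array by key, then overwrites duplicates in a single pass.
function dedupeInPlaceBy(array, keyFn) {
  array.sort((a, b) => {
    const ka = keyFn(a);
    const kb = keyFn(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  let write = 0; // next slot for a kept item
  for (let read = 0; read < array.length; read++) {
    // Keep an item only if its key differs from the last kept item's key.
    if (read === 0 || keyFn(array[read]) !== keyFn(array[write - 1])) {
      array[write++] = array[read];
    }
  }
  array.length = write; // drop the leftover tail
  return array;
}

const rows = [
  { id: 3, v: 'c1' },
  { id: 1, v: 'a1' },
  { id: 3, v: 'c2' },
  { id: 2, v: 'b1' },
  { id: 1, v: 'a2' },
];
dedupeInPlaceBy(rows, r => r.id);
console.log(rows.map(r => r.v)); // ['a1', 'b1', 'c1']
```

Because Array.prototype.sort is stable in ES2019 and later, the first record for each key in the original order is the one that survives. The output is ordered by key, not by original position.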
How do I deduplicate by a nested property?
Access the nested property in your key extractor. With a Map: const seen = new Map(); array.filter(item => { const key = item.address?.city; return seen.has(key) ? false : seen.set(key, true) && true; }). With lodash _.uniqBy(), pass a dot-notation string: _.uniqBy(array, 'address.city') — lodash parses the dot-notation and accesses the nested value automatically. For a function: _.uniqBy(array, item => item.address?.city). For composite nested keys: _.uniqBy(array, item => `${item.address.city}:${item.address.country}`). With optional chaining and nullish coalescing for safety: item?.address?.city ?? "__null__" — use a sentinel value to handle missing nested properties consistently.
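The sentinel idea can be sketched like this. The sample data (people) and the sentinel string are hypothetical; pick a sentinel that cannot collide with a real value in your data.

```javascript
const MISSING = '__missing__'; // hypothetical sentinel for absent keys

const people = [
  { name: 'Ann', address: { city: 'Oslo' } },
  { name: 'Ben', address: { city: 'Oslo' } }, // same city as Ann
  { name: 'Cy' },                             // no address at all
  { name: 'Di', address: { city: null } },    // address without a city
  { name: 'Eve', address: { city: 'Rome' } },
];

const seenCities = new Set();
const onePerCity = people.filter(person => {
  // ?. guards against a missing address; ?? maps null and undefined to one key.
  const key = person.address?.city ?? MISSING;
  if (seenCities.has(key)) return false;
  seenCities.add(key);
  return true;
});

console.log(onePerCity.map(p => p.name)); // ['Ann', 'Cy', 'Eve']
```

Without the ?? fallback, a missing address (undefined) and an explicit null city would be tracked as two different keys. Whether records without a key should collapse into one entry, as here, or all be kept is a policy decision for your data.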
How do I check deep equality for JSON deduplication?
Two approaches: (1) JSON.stringify() is fast and dependency-free for JSON-safe objects. Use it as a Set key: const seen = new Set(); array.filter(item => { const key = JSON.stringify(item); return seen.has(key) ? false : seen.add(key) && true; }). If property order may vary, sort keys first: JSON.stringify(item, Object.keys(item).sort()) works for flat objects, but the array replacer also filters nested keys, so nested objects need a recursive key sort instead. Limitations: fails for objects with functions, undefined values, circular references, or Symbol properties. (2) lodash _.isEqual() handles Date, RegExp, Map, Set, typed arrays, and undefined: _.uniqWith(array, _.isEqual). Use JSON.stringify() when objects come from JSON.parse(); use _.isEqual when objects may contain non-JSON types.
How does lodash uniqBy work?
_.uniqBy(array, iteratee) iterates through the array, applies the iteratee to each element to generate a key, and tracks seen keys in a Set. The first element whose key has not been seen is kept; subsequent elements with the same key are discarded. The iteratee can be: a property name string ('id'), a dot-notation path ('user.id'), or a function (item => item.name.toLowerCase()). Performance is O(n). The companion _.uniqWith(array, comparator) accepts a two-argument function that returns true when two elements should be considered equal — use _.uniqWith(arr, _.isEqual) for deep structural equality. _.uniq(array) is equivalent to [...new Set(array)] for primitives. All three return new arrays without mutating the input.
How do I deduplicate JSON data at the database level?
Three approaches: (1) SELECT DISTINCT column1, column2 removes duplicate rows at query time. PostgreSQL's DISTINCT ON (email) returns one row per email value, with the selected row controlled by ORDER BY. (2) INSERT ... ON CONFLICT (column) DO NOTHING silently skips inserts that violate a UNIQUE constraint — requires the constraint to exist on the conflict column. (3) ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) numbers rows within each group; a subsequent DELETE WHERE rn > 1 removes all but the first. For PostgreSQL JSONB array columns, use jsonb_array_elements with DISTINCT to deduplicate JSON arrays stored in a column. Database-level deduplication handles concurrent writes atomically, which application-level deduplication cannot.
Why does Set not work for deduplicating JSON objects?
JavaScript Set uses SameValueZero equality, which compares objects by reference (memory address), not by content. Two distinct object literals with identical properties are different references: { id: 1 } === { id: 1 } is false. So new Set([{ id: 1 }, { id: 1 }]).size is 2, not 1 — both objects are kept because they live at different memory addresses. This is the same reason [] === [] is false. Primitives work correctly because numbers and strings are compared by value: new Set([1, 1, 2]).size is 2. To use Set with objects, serialize to a string key first: const seen = new Set(); array.filter(obj => { const k = JSON.stringify(obj); return seen.has(k) ? false : seen.add(k) && true; }). For unique-key deduplication, Map is faster and more explicit.
Further reading and primary sources
- MDN: Set — MDN reference for the JavaScript Set data structure including SameValueZero equality semantics
- MDN: Map — MDN reference for the JavaScript Map data structure used in O(n) object deduplication
- lodash _.uniqBy — lodash documentation for _.uniqBy(), _.uniqWith(), and _.uniq() deduplication functions
- PostgreSQL INSERT ON CONFLICT — PostgreSQL ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE (upsert) reference
- PostgreSQL Window Functions: ROW_NUMBER — PostgreSQL ROW_NUMBER() OVER (PARTITION BY ...) for deduplicating existing table rows