JSON Canonicalization (JCS, RFC 8785): Deterministic Serialization for Signatures

Q: How do I canonicalize JSON in Python?

In Python, the json module's json.dumps() function accepts a sort_keys=True parameter that sorts object keys lexicographically. For basic RFC 8785 compliance with ASCII keys: import json; canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False). The separators=(",",":") argument removes whitespace. However, json.dumps with sort_keys=True uses Python's default string sort (lexicographic by Unicode code point for ASCII strings), which matches JCS for ASCII keys. For full RFC 8785 compliance including correct number serialization and non-ASCII key ordering, install the canonicaljson package: pip install canonicaljson; import canonicaljson; data = canonicaljson.encode_canonical_json(value). The canonicaljson package is used by the Matrix protocol for event signing and handles all edge cases of the RFC 8785 number serialization algorithm.

Q: When should I use canonical JSON vs regular JSON?

Use canonical JSON (RFC 8785 / JCS) whenever you need byte-for-byte reproducibility of a JSON value across different systems, libraries, or languages. The primary use cases are: (1) Cryptographic signatures — JWS detached payloads, JWT claims sets that need signing, CBOR Object Signing; (2) Content-addressed storage — computing a stable hash that serves as a unique identifier for a JSON document (e.g., IPFS, blockchain transaction hashes); (3) Deterministic testing — comparing expected vs actual JSON output in tests without being sensitive to key ordering; (4) Caching keys — using a JSON object as a cache key requires a stable serialization. Use regular JSON for all other cases: REST API responses, configuration files, data storage, and inter-service communication where you are parsing the JSON rather than comparing its bytes. Canonical JSON has a small performance cost (key sorting is O(n log n)) and is not human-readable-friendly since it has no whitespace.

JSON Canonicalization Scheme (JCS, RFC 8785) produces a deterministic, byte-for-byte identical JSON serialization — essential for cryptographic signatures, because {"a":1,"b":2} and {"b":2,"a":1} are semantically identical but hash to different values without canonicalization. RFC 8785, published August 2020, is the first standardized canonical JSON format. The 3 core JCS rules are: object keys must be sorted by Unicode code point order (not alphabetical); numbers use IEEE 754 double-precision serialization with no trailing zeros or scientific notation for integers; and no whitespace is allowed between tokens. This guide covers the full JCS algorithm, implementing canonicalization in JavaScript and Python, using the canonicalize npm package, verifying JSON signatures, and comparing JCS with JSON Normalization Form (JNF). To validate that your JSON is well-formed before canonicalizing it, use Jsonic's JSON formatter.

Need to format or validate JSON before canonicalizing? Jsonic's formatter catches syntax errors instantly.

Open JSON Formatter

The JCS algorithm: 4 rules that define canonical JSON

JCS (RFC 8785) transforms any valid JSON value into a unique byte sequence by applying 4 deterministic rules. All 4 rules must be applied together — a correct implementation has zero configuration options, because any choice would break determinism.

Rule	Requirement	Example
Key ordering	Object keys sorted by UTF-16 code unit sequence	`{"Z":1,"a":2}` — "Z" (U+005A) before "a" (U+0061)
Number serialization	IEEE 754 double; integers as integers; no trailing zeros	`1.5e10` → `15000000000`
No whitespace	No spaces, tabs, or newlines between tokens	`{"a":1}` not `{ "a": 1 }`
String escaping	Control characters (U+0000–U+001F) use Unicode escape; others unescaped	Tab → `\t`, newline → `\n`

The algorithm is applied recursively: for objects, sort the keys then recurse into each value; for arrays, preserve element order (arrays are ordered by definition) and recurse into each element; for primitives, apply the number or string rule directly. Here is the pseudocode:

// JCS (RFC 8785) pseudocode
function canonicalize(value):
  if value is null, true, or false:
    return JSON representation ("null", "true", "false")

  if value is a number:
    return ieee754Serialize(value)       // see Section 2 for details

  if value is a string:
    return escapeString(value)           // control chars only

  if value is an array:
    elements = [canonicalize(el) for el in value]
    return "[" + join(elements, ",") + "]"

  if value is an object:
    sortedKeys = value.keys() sorted by UTF-16 code unit sequence
    pairs = [canonicalize(k) + ":" + canonicalize(value[k]) for k in sortedKeys]
    return "{" + join(pairs, ",") + "}"

Number serialization: the hardest part of RFC 8785

Number serialization is the most technically demanding rule in JCS. The goal is a unique decimal representation of an IEEE 754 double-precision floating-point value — specifically, the shortest decimal string that round-trips back to the same double. RFC 8785 adopts the ES2019 Number::toString algorithm (also known as Grisu3/Dragon4 or Ryu), which produces 17 significant digits at most. 3 specific rules apply:

Integers serialize as integers: if a double has no fractional part and fits in the safe integer range, serialize without a decimal point or exponent.15000000000.0 → 15000000000.
Scientific notation only when necessary: for very large or very small numbers that cannot be represented compactly without it.5e-324 (the smallest positive double) serializes as 5e-324.
No trailing zeros: 1.50 → 1.5, 1.0 → 1.

// Number serialization examples (RFC 8785)
Input            → Canonical output
-----------------------------------------------------------------
1                → 1
1.0              → 1
1.5              → 1.5
1.5e10           → 15000000000
1.5e-10          → 1.5e-10
0.1 + 0.2        → 0.30000000000000004  (IEEE 754 reality)
1e21             → 1e+21
Number.MAX_VALUE → 1.7976931348623157e+308
-Infinity        → ⚠ not allowed in JSON (must be excluded)
NaN              → ⚠ not allowed in JSON (must be excluded)

The 0.1 + 0.2 example is important: JCS canonicalizes the actual floating-point value, not the value you intended. If your data has floating-point precision issues, they will be locked into the canonical form. For financial or scientific applications where precision matters, store numbers as strings in JSON and use a decimal library — see JSON number precision for a full discussion.

Infinity and NaN are not valid JSON values. RFC 8785 requires implementations to reject them; a canonical JSON serializer must throw an error rather than silently producing null or a string representation.

Canonicalize JSON in JavaScript with the canonicalize package

The canonicalize npm package (MIT license) is the reference JavaScript implementation of RFC 8785. It has 0 dependencies and works in both Node.js and browsers. It is a drop-in replacement for JSON.stringify — call it with a value and get back the canonical string.

npm install canonicalize

import canonicalize from 'canonicalize'

// Basic usage
const obj = { b: 2, a: 1 }
const canonical = canonicalize(obj)
console.log(canonical)          // '{"a":1,"b":2}'

// Nested objects — keys sorted at every level
const nested = { z: { y: 3, x: 2 }, a: 1 }
console.log(canonicalize(nested))
// '{"a":1,"z":{"x":2,"y":3}}'

// Arrays — order preserved
const arr = { b: [3, 1, 2], a: 0 }
console.log(canonicalize(arr))
// '{"a":0,"b":[3,1,2]}'

// Signing: compute SHA-256 over the canonical UTF-8 bytes
import crypto from 'crypto'
const hash = crypto
  .createHash('sha256')
  .update(canonical, 'utf8')
  .digest('hex')
console.log(hash)               // deterministic across all RFC 8785 implementations

If you need to implement RFC 8785 from scratch (without a dependency), the key insight is that JavaScript's JSON.stringify already handles number serialization correctly (it uses the same ES2019 algorithm that RFC 8785 requires) and escapes control characters. The only missing piece is key sorting, which you can add with a recursive replacer:

// Minimal RFC 8785-compliant canonicalize (JavaScript, no dependencies)
// Handles key sorting; relies on JSON.stringify for number/string rules.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']'
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()  // lexicographic = Unicode code point for ASCII
    return '{' + keys.map(k =>
      JSON.stringify(k) + ':' + canonicalize(value[k])
    ).join(',') + '}'
  }
  return JSON.stringify(value)
}

// Note: Object.keys().sort() uses UTF-16 code unit comparison,
// which matches RFC 8785 for the vast majority of real-world key names.

The minimal implementation above is correct for ASCII keys. For keys containing non-BMP Unicode characters (code points above U+FFFF, stored as surrogate pairs in UTF-16), use the canonicalize package which handles surrogate pair ordering correctly per the RFC. Verify your JSON is valid first using JSON.stringify before passing it to a canonicalizer.

Canonicalize JSON in Python

Python's built-in json.dumps() with sort_keys=True andseparators=(",", ":") covers most RFC 8785 requirements for ASCII-key JSON. For full compliance including correct number serialization and non-ASCII key ordering, use the canonicaljson package (used by the Matrix protocol for event signing) or implement the algorithm directly.

# Option 1: stdlib — correct for ASCII keys, close enough for most use cases
import json

def canonicalize_stdlib(value):
    return json.dumps(
        value,
        sort_keys=True,
        separators=(',', ':'),
        ensure_ascii=False
    )

obj = {"b": 2, "a": 1, "Z": 0}
print(canonicalize_stdlib(obj))
# '{"Z":0,"a":1,"b":2}'  — uppercase before lowercase (correct JCS order)

# Option 2: canonicaljson package — full RFC 8785 compliance
pip install canonicaljson

import canonicaljson

obj = {"b": 2, "a": 1, "Z": 0}
canonical_bytes = canonicaljson.encode_canonical_json(obj)
# Returns bytes, not str — ready for hashing
print(canonical_bytes)   # b'{"Z":0,"a":1,"b":2}'

# Signing with hashlib
import hashlib
digest = hashlib.sha256(canonical_bytes).hexdigest()
print(digest)             # deterministic across all RFC 8785 implementations

The canonicaljson package returns bytes directly (UTF-8 encoded), not a Python string. This is the correct interface for cryptographic use since hash functions operate on bytes. Avoid calling .encode('utf-8') on the output of json.dumps() if ensure_ascii=False is set and your keys contain non-ASCII characters — always measure the output against a known-good implementation.

A note on Python's number handling: Python's json module serializes Python float values using repr(), which in Python 3.1+ produces the shortest round-trip decimal representation. This matches the RFC 8785 requirement for most values. However, float('inf') and float('nan') are not valid JSON; json.dumps raises aValueError by default (with allow_nan=False), which is the correct RFC 8785 behavior.

Verifying JSON signatures with JCS

The canonical use case for JCS is signing and verifying JSON payloads. The pattern is: (1) remove the signature field from the object, (2) canonicalize the remaining object, (3) sign the canonical bytes, (4) attach the signature. On verification, reverse steps 1–3 and compare. This is how JWS detached payloads work and how the W3C Verifiable Credentials Data Integrity spec (JSON-LD + JCS) uses RFC 8785.

import canonicalize from 'canonicalize'
import crypto from 'crypto'

// ── Signing ──────────────────────────────────────────────────────────────
function signPayload(payload, privateKey) {
  // 1. Canonicalize the payload WITHOUT the signature field
  const { signature: _, ...unsigned } = payload
  const canonical = canonicalize(unsigned)

  // 2. Sign the canonical UTF-8 bytes
  const sign = crypto.createSign('SHA256')
  sign.update(canonical, 'utf8')
  const sig = sign.sign(privateKey, 'base64')

  // 3. Return payload with signature attached
  return { ...unsigned, signature: sig }
}

// ── Verification ─────────────────────────────────────────────────────────
function verifyPayload(payload, publicKey) {
  const { signature, ...unsigned } = payload
  const canonical = canonicalize(unsigned)

  const verify = crypto.createVerify('SHA256')
  verify.update(canonical, 'utf8')
  return verify.verify(publicKey, signature, 'base64')
}

// Usage
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 })
const doc = { sub: 'alice', iat: 1715000000, data: { role: 'admin', level: 3 } }
const signed = signPayload(doc, privateKey)
console.log(verifyPayload(signed, publicKey))  // true

// Tampered — key order changed, but JCS produces the same canonical bytes
const tampered = { ...signed, data: { level: 3, role: 'admin' } }
console.log(verifyPayload(tampered, publicKey))  // still true — JCS is the point

The last example shows why JCS matters: even though the attacker reordered keys indata, verification still passes because JCS produces the same canonical bytes for both orderings. Without JCS, reordering keys would break the signature, forcing the signer and verifier to agree on a specific serialization — which breaks across languages and libraries. For JWT-based signing see how JWT works; for key management see JSON Web Key (JWK).

JCS vs JSON Normalization Form (JNF) and other approaches

Before RFC 8785, several ad-hoc canonical JSON formats existed. Understanding the differences matters when working with older systems or choosing a standard. The 4 most common approaches are compared below:

Approach	Key sorting	Number handling	Status
JCS (RFC 8785)	UTF-16 code unit order	IEEE 754 / ES2019 shortest	IETF Standard (2020)
JSON-LD Canonicalization	Unicode code point (same as JCS for BMP)	Unspecified (pass-through)	W3C, used in Verifiable Credentials
Oldie: Gibson's Canonical JSON	Byte-order (UTF-8)	Integer only (no floats)	Non-standard, pre-RFC 8785
sort_keys + no-whitespace	Python/JS default sort	Language-specific	Ad-hoc, not standardized

JCS vs alphabetical sort_keys: For keys containing only ASCII letters, the results are identical — both produce the same ordering. The divergence only appears for mixed-case keys ("Z" before "a" in JCS vs "A", "a", "Z" in case-insensitive alphabetical) and for non-ASCII characters. For new systems, always use RFC 8785. For interoperating with existing systems that use sort_keys=True, verify that your key set is ASCII-only and that the number serialization matches.

JCS vs JSON-LD canonicalization (JCS + RDFC): W3C Verifiable Credentials use JCS in combination with RDF Dataset Canonicalization (RDFC-1.0) for linked-data contexts. If you are working with JSON-LD documents, you need both layers; for plain JSON objects, RFC 8785 alone is sufficient. See the JSON.stringify tutorial for how JavaScript's built-in serializer relates to the JCS number rules.

Unicode code point key sorting: edge cases and gotchas

Key sorting is the rule most likely to cause subtle bugs. The RFC 8785 sort order is Unicode code point order on the UTF-16 encoding of each key string — the same order used by JavaScript's Array.prototype.sort() when comparing strings character by character. Here are the 5 most common edge cases that trip up implementations:

// Edge case 1: uppercase before lowercase
// "Z" (U+005A = 90) before "a" (U+0061 = 97)
canonicalize({ a: 1, Z: 2 })
// '{"Z":2,"a":1}'   ✓ correct JCS
// '{"a":1,"Z":2}'   ✗ alphabetical sort (wrong)

// Edge case 2: digits before letters
// "1" (U+0031 = 49) before "A" (U+0041 = 65)
canonicalize({ B: 2, '1': 1, a: 3 })
// '{"1":1,"B":2,"a":3}'   ✓

// Edge case 3: empty string key sorts first
canonicalize({ b: 2, '': 0, a: 1 })
// '{"":0,"a":1,"b":2}'   ✓

// Edge case 4: longer key that shares prefix — shorter sorts first
// "ab" (length 2) before "abc" (length 3)
canonicalize({ abc: 3, ab: 2, a: 1 })
// '{"a":1,"ab":2,"abc":3}'   ✓

// Edge case 5: emoji / non-BMP characters use surrogate pairs in UTF-16
// U+1F600 "😀" encodes as surrogate pair U+D83D U+DE00 in UTF-16
// This affects sort order relative to U+E000–U+FFFF private use characters
// Use the canonicalize package rather than a manual sort for these cases

For production systems handling user-generated JSON keys (which may include emoji, Arabic, CJK, or other non-ASCII characters), always use a validated RFC 8785 implementation rather than a hand-rolled sort. The non-BMP surrogate pair ordering is easy to get wrong and the bugs are hard to detect in tests that only use ASCII keys.

For number precision concerns that affect what gets canonicalized, see JSON number precision. For general JSON serialization, see the JSON.stringify tutorial.

Frequently asked questions

What is JSON canonicalization and why is it needed for signatures?

JSON canonicalization is the process of transforming a JSON value into a single, deterministic byte sequence so that cryptographic operations such as hashing and signing produce identical results regardless of which library or language serialized the original value. Without canonicalization, 2 JSON objects that are semantically identical — {"a":1,"b":2} and {"b":2,"a":1} — produce different byte sequences and therefore different SHA-256 hashes. This makes signatures computed over one serialization invalid when verified against the other. RFC 8785 (JSON Canonicalization Scheme, JCS) published in August 2020 standardizes a canonical form: object keys are sorted by Unicode code point order, floating-point numbers use IEEE 754 double-precision serialization, no whitespace is allowed, and control characters in strings are escaped. Any implementation that follows RFC 8785 will produce the exact same bytes for the same logical JSON value, making cross-language and cross-library signature verification reliable. Use JWTs for token-based auth; use JCS when you need to sign arbitrary JSON documents.

What are the key ordering rules in JCS (RFC 8785)?

JCS (RFC 8785) requires object keys to be sorted by their Unicode code point values after UTF-16 encoding. This is NOT the same as alphabetical (locale-aware) sorting. The sort compares the UTF-16 code unit sequences of the key strings character by character. In practice: uppercase ASCII letters (A–Z, U+0041–U+005A) sort before lowercase letters (a–z, U+0061–U+007A), so "Z" sorts before "a". Digits (0–9, U+0030–U+0039) sort before uppercase letters. ASCII symbols sort by their code point values — "$" (U+0024) sorts before "0" (U+0030). For keys containing only ASCII characters, the sort is identical to a standard byte-comparison sort. For keys with non-ASCII characters (including emoji or CJK characters), UTF-16 encoding is used, which differs from UTF-8 code unit comparison. Duplicate keys are invalid in RFC 8785 — an implementation must reject them. Nested object keys are sorted independently at each level. See the JSON.stringify tutorial for how JavaScript's built-in serialization relates to JCS key order.

How do I canonicalize JSON in JavaScript?

The easiest way to canonicalize JSON in JavaScript is with the canonicalize npm package, which is a drop-in replacement for JSON.stringify that follows RFC 8785. Install it with npm install canonicalize. Then: import canonicalize from 'canonicalize'; const canonical = canonicalize(value); The function returns a UTF-8 string with keys sorted by Unicode code point, numbers serialized as IEEE 754, no whitespace, and control characters escaped. For cryptographic signing, compute the hash over the UTF-8 bytes of the canonical string: const hash = crypto.createHash('sha256').update(canonical, 'utf8').digest('hex'). You can also implement key sorting manually, but the canonicalize package handles all edge cases including number serialization and non-BMP Unicode key ordering. Validate your input JSON first using Jsonic's formatter to catch syntax errors before they surface in the canonicalizer.

How do I canonicalize JSON in Python?

In Python, the json module's json.dumps() function accepts a sort_keys=True parameter that sorts object keys. For basic RFC 8785 compliance with ASCII keys: json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False). The separators=(",",":") argument removes whitespace. For full RFC 8785 compliance including correct number serialization and non-ASCII key ordering, install the canonicaljson package: pip install canonicaljson; then: canonicaljson.encode_canonical_json(value). The canonicaljson package is used by the Matrix protocol for event signing and handles all edge cases of the RFC 8785 number serialization algorithm. It returns bytes (UTF-8 encoded), not a Python string — ready to pass directly to hashlib.sha256(). Python's float('inf') and float('nan') will raise a ValueError by default, which is the correct RFC 8785 behavior.

What is the difference between JCS and sorting JSON keys alphabetically?

JCS (RFC 8785) sorts object keys by Unicode code point order of their UTF-16 encoded byte sequences, while alphabetical sorting typically uses locale-aware comparison that may treat uppercase and lowercase letters as equivalent or sort them differently depending on locale. The critical difference: in Unicode code point order, all uppercase ASCII letters (A=65 through Z=90) come before all lowercase ASCII letters (a=97 through z=122). So "Zoo" sorts before "apple" in JCS order, but "apple" comes before "Zoo" in case-insensitive alphabetical order. Another difference: JCS sorts numeric strings by their code point values ("10" sorts before "9" because "1"=49< "9"=57), while natural sort would place "9" before "10". For ASCII-only keys, JCS key ordering is equivalent to standard lexicographic byte comparison. Locale-aware sorting — such as JavaScript's Array.prototype.sort() without a comparator in some locales — can produce different results and must not be used for RFC 8785 compliance. For JSON key handling generally, see the JSON.stringify tutorial.

When should I use canonical JSON vs regular JSON?

Use canonical JSON (RFC 8785 / JCS) whenever you need byte-for-byte reproducibility of a JSON value across different systems, libraries, or languages. The primary use cases are: (1) cryptographic signatures — JWS detached payloads, JWT claims sets that need signing, CBOR Object Signing and Encryption (COSE); (2) content-addressed storage — computing a stable hash that serves as a unique identifier for a JSON document (e.g., IPFS, blockchain transaction hashes); (3) deterministic testing — comparing expected vs actual JSON output in tests without being sensitive to key ordering; (4) caching keys — using a JSON object as a cache key requires a stable serialization. Use regular JSON for all other cases: REST API responses, configuration files, data storage, and inter-service communication where you are parsing the JSON rather than comparing its bytes. Canonical JSON has a small performance cost (key sorting is O(n log n) per object) and is not human-readable-friendly since it has no whitespace. For signing tokens specifically, see how JWT works and JSON Web Key (JWK).

Ready to work with canonical JSON?

Use Jsonic's JSON Formatter to validate and prettify your JSON before canonicalizing it. You can also diff two JSON documents to verify they are semantically identical before comparing their canonical forms.

Open JSON Formatter