JSON to Protobuf: Convert and Encode with protobufjs and Python
Converting JSON to Protobuf means defining a .proto schema, compiling it (or loading it dynamically), then using the generated code or a runtime library to encode your JSON data as a binary Protocol Buffers message. Protobuf binary encoding is 3–10× smaller than equivalent JSON and 5–10× faster to serialize — field names are replaced by 1-byte field-number tags and integers use compact varint encoding. The trade-off: you need a .proto schema and either generated code or a dynamic loader — Protobuf binary is not self-describing like JSON.
This guide covers writing .proto files from a JSON example, converting JSON with protobufjs in Node.js, encoding and decoding with betterproto and google-protobuf in Python, mapping JSON types to Protobuf field types, handling nested objects and arrays, and understanding the canonical Protobuf JSON format for round-trip testing.
JSON Types to Protobuf Field Type Mapping
Every JSON value type has one or more Protobuf counterparts. Choosing the right field type matters: an int32 field cannot hold a value above 2,147,483,647 (depending on the library, such a value is rejected or silently wrapped), and using float instead of double loses precision because float is only 32 bits wide. Use the table below as a decision guide when writing your first .proto from an existing JSON payload.
| JSON type | Protobuf field type(s) | Notes |
|---|---|---|
| string | string | UTF-8 encoded; use bytes for raw binary data |
| number (integer) | int32, int64, uint32, uint64 | Use int32 when every value fits in 32 bits; use int64 for larger values (encoded as a string in proto3 JSON) |
| number (decimal) | float, double | Prefer double (64-bit) to match JSON precision; float (32-bit) is lossy |
| boolean | bool | Encodes as a single varint byte (0 or 1) |
| array | repeated <type> | e.g. repeated string tags; numeric scalars are packed by default in proto3 |
| object | nested message | Define a separate message type; reference it as a field |
| null | oneof or wrapper types | proto3 fields have defaults (0, "", false); use google.protobuf.StringValue for nullable strings |
| object (string map) | map<string, ValueType> | For dynamic string-keyed objects; key must be an integral or string type |
When you encounter a JSON number that could be either integer or decimal (e.g. a price field that sometimes reads 10 and sometimes 10.99), always use double. A double always occupies 8 bytes on the wire regardless of its value, so you give up the compactness of varint integers, but you avoid a schema change later.
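The precision and range concerns above can be checked with nothing but the standard library. The following is a minimal sketch, not part of any Protobuf library: the helper names (`roundtrip_float32`, `roundtrip_float64`, `suggest_numeric_type`) are hypothetical, and `struct` is used only to mimic the 4-byte and 8-byte storage of `float` and `double`.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def roundtrip_float32(value: float) -> float:
    """Store a Python float in 4 bytes (like proto `float`) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def roundtrip_float64(value: float) -> float:
    """Store a Python float in 8 bytes (like proto `double`) and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


def suggest_numeric_type(samples: list) -> str:
    """Suggest a proto numeric type able to hold every sampled JSON number."""
    if any(isinstance(s, float) for s in samples):
        return "double"
    if all(INT32_MIN <= s <= INT32_MAX for s in samples):
        return "int32"
    if all(INT64_MIN <= s <= INT64_MAX for s in samples):
        return "int64"
    raise ValueError("value does not fit in a 64-bit signed integer")


print(roundtrip_float32(129.99) == 129.99)  # False: 32 bits cannot hold 129.99 exactly
print(roundtrip_float64(129.99) == 129.99)  # True: 64 bits round-trip unchanged
print(suggest_numeric_type([10, 10.99]))
print(suggest_numeric_type([42, 2_147_483_648]))
```

Running a helper like `suggest_numeric_type` over several real payloads, rather than a single example, gives a safer choice because one sample rarely shows the full range of a field.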
Write a .proto File from a JSON Example
Given a JSON payload, writing the corresponding .proto file is a mechanical process: each key becomes a named field with an assigned field number, and the value type determines the Protobuf scalar type. Always start with syntax = "proto3"; — proto3 is the current version and drops required fields, simplifying the type system.
Example JSON payload:
{
  "orderId": "ord_001",
  "customerId": 42,
  "total": 129.99,
  "paid": true,
  "tags": ["express", "gift"],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Springfield",
    "country": "US"
  }
}

Equivalent .proto file:
syntax = "proto3";

package orders;

message Address {
  string street = 1;
  string city = 2;
  string country = 3;
}

message Order {
  string order_id = 1;
  int32 customer_id = 2;
  double total = 3;
  bool paid = 4;
  repeated string tags = 5;
  Address shipping_address = 6;
}

Field numbers must be unique within a message and never reused once data has been serialized with them. Numbers 1–15 use 1 byte for the tag and 16–2047 use 2 bytes, so assign 1–15 to your most frequently occurring fields. The package declaration namespaces the generated code and prevents message name collisions across .proto files.
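Because the key-to-field translation is mechanical, it can be roughed out in a few lines. The sketch below is an illustration only, using hypothetical helpers (`to_snake`, `scalar_type`, `infer_messages`). It assumes arrays are non-empty and homogeneous, names each nested message after its JSON key (so it emits `ShippingAddress` rather than the hand-picked `Address`), and refuses null values, which need a manual decision.

```python
import json
import re


def to_snake(name: str) -> str:
    """orderId -> order_id (proto field names are snake_case by convention)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def scalar_type(value) -> str:
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int32" if -2**31 <= value < 2**31 else "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError(f"no automatic mapping for {value!r}")


def infer_messages(name: str, obj: dict, out: list) -> list:
    """Append proto3 message text for `obj` to `out`, nested messages first."""
    lines = []
    for number, (key, value) in enumerate(obj.items(), start=1):
        is_list = isinstance(value, list)
        element = value[0] if is_list else value  # assumes a non-empty array
        if isinstance(element, dict):
            type_name = key[0].upper() + key[1:]  # shippingAddress -> ShippingAddress
            infer_messages(type_name, element, out)
        else:
            type_name = scalar_type(element)
        label = "repeated " if is_list else ""
        lines.append(f"  {label}{type_name} {to_snake(key)} = {number};")
    out.append("message " + name + " {\n" + "\n".join(lines) + "\n}")
    return out


payload = json.loads(
    '{"orderId": "ord_001", "customerId": 42, "total": 129.99, "paid": true,'
    ' "tags": ["express", "gift"],'
    ' "shippingAddress": {"street": "123 Main St", "city": "Springfield", "country": "US"}}'
)
schema = 'syntax = "proto3";\n\n' + "\n\n".join(infer_messages("Order", payload, []))
print(schema)
```

Treat the output as a first draft: field numbers simply follow key order here, so review them (and the inferred integer widths) before any data is serialized with the schema.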
Encode JSON to Protobuf Binary in Node.js with protobufjs
protobufjs is the most widely used JavaScript Protobuf library. It supports both dynamic loading (no build step) and static code generation. The dynamic approach shown below is the fastest way to get started — you load the .proto file at runtime and encode directly from a plain JavaScript object.
npm install protobufjs

// encode.mjs
import protobuf from 'protobufjs';

async function encodeOrder() {
  // Load .proto schema at runtime (no protoc required)
  const root = await protobuf.load('order.proto');

  // Look up the Order message type
  const Order = root.lookupType('orders.Order');

  const payload = {
    orderId: 'ord_001',
    customerId: 42,
    total: 129.99,
    paid: true,
    tags: ['express', 'gift'],
    shippingAddress: {
      street: '123 Main St',
      city: 'Springfield',
      country: 'US',
    },
  };

  // Verify the payload matches the schema
  const errMsg = Order.verify(payload);
  if (errMsg) throw new Error(errMsg);

  // Create a protobuf message object and encode to binary
  const message = Order.create(payload);
  const buffer = Order.encode(message).finish(); // Uint8Array

  console.log('JSON size :', JSON.stringify(payload).length, 'bytes');
  console.log('Proto size:', buffer.length, 'bytes');
  return buffer;
}

encodeOrder();

Order.verify() catches type mismatches before encoding, for example if customerId is a string instead of a number. Order.create() converts the plain object into a protobufjs Message instance. .finish() flushes the internal writer and returns the binary buffer as a Uint8Array. For the example above, the Protobuf binary comes to roughly 70 bytes versus roughly 170 bytes for the JSON string.
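To see where those bytes go, the same payload can be encoded by hand. This is a minimal sketch of the wire format in standard-library Python, not how protobufjs is implemented; the helper names are hypothetical, the field numbers are taken from the Order and Address messages above, and the encoder handles only non-negative integers and always writes every field (real proto3 encoders skip default values).

```python
import json
import struct


def varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def length_delimited(field_number: int, data: bytes) -> bytes:
    """Wire type 2: strings, bytes and nested messages."""
    return tag(field_number, 2) + varint(len(data)) + data


def encode_address(a: dict) -> bytes:
    return (length_delimited(1, a["street"].encode())
            + length_delimited(2, a["city"].encode())
            + length_delimited(3, a["country"].encode()))


def encode_order(o: dict) -> bytes:
    return (length_delimited(1, o["orderId"].encode())
            + tag(2, 0) + varint(o["customerId"])        # int32 as varint
            + tag(3, 1) + struct.pack("<d", o["total"])  # double: 8 bytes, little-endian
            + tag(4, 0) + varint(int(o["paid"]))         # bool as varint 0 or 1
            + b"".join(length_delimited(5, t.encode()) for t in o["tags"])
            + length_delimited(6, encode_address(o["shippingAddress"])))


payload = {
    "orderId": "ord_001", "customerId": 42, "total": 129.99, "paid": True,
    "tags": ["express", "gift"],
    "shippingAddress": {"street": "123 Main St", "city": "Springfield", "country": "US"},
}
binary = encode_order(payload)
compact_json = json.dumps(payload, separators=(",", ":"))  # same layout as JSON.stringify
print(len(binary), len(compact_json))  # 69 169
```

Most of the saving in this payload comes from dropping the field names; the string contents themselves cost the same in both formats.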
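For production, generate static code and TypeScript typings with the pbjs and pbts tools, which ship in the separate protobufjs-cli package as of protobufjs 7: run npm install --save-dev protobufjs-cli, then npx pbjs -t static-module -w commonjs -o order.js order.proto and npx pbts -o order.d.ts order.js. The generated module replaces the dynamic load and provides full TypeScript autocompletion on message fields.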
Decode Protobuf Binary Back to JSON in Node.js
Decoding reverses the encoding process: you pass the binary buffer to Message.decode() to get a protobufjs message object, then call Message.toObject() or .toJSON() to convert it to a plain JavaScript object. The toObject options control how special types like int64 and bytes are represented.
// decode.mjs
import protobuf from 'protobufjs';

async function decodeOrder(buffer) {
  const root = await protobuf.load('order.proto');
  const Order = root.lookupType('orders.Order');

  // Decode binary buffer to a protobufjs Message object
  const decoded = Order.decode(buffer);

  // Convert to a plain JS object
  const obj = Order.toObject(decoded, {
    longs: String,   // int64 fields -> string (avoids JS precision loss)
    enums: String,   // enum fields -> string name
    bytes: String,   // bytes fields -> base64 string
    defaults: true,  // include fields with default values
    arrays: true,    // always return arrays for repeated fields
    objects: true,   // always return objects for map fields
    oneofs: true,    // include the virtual oneof name field
  });

  console.log(obj);
  // {
  //   orderId: 'ord_001',
  //   customerId: 42,
  //   total: 129.99,
  //   paid: true,
  //   tags: [ 'express', 'gift' ],
  //   shippingAddress: { street: '123 Main St', city: 'Springfield', country: 'US' }
  // }
  return obj;
}

// Shorthand: decoded.toJSON() applies JSON-friendly defaults (longs, enums and bytes as strings)
// const json = decoded.toJSON();

The longs: String option is important when your schema uses int64 or uint64: JavaScript numbers cannot represent integers above 2^53 precisely, so protobufjs represents 64-bit values as Long objects when the long library is available. Converting to string avoids silent precision loss when passing large IDs through JSON. For a round-trip test, encode, immediately decode, and compare JSON.stringify(original) with JSON.stringify(roundTripped); this check assumes both objects list their keys in the same order and that no default-valued fields were added or dropped along the way.
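What a decoder sees before the schema is applied can be shown with a small wire reader. This is a sketch in standard-library Python with hypothetical helper names (`read_varint`, `read_fields`); it ignores the deprecated group wire types and does no bounds checking. The sample bytes are the first fields of the Order example (order_id, customer_id) followed by the paid flag.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            return value, pos
        shift += 7


def read_fields(buf: bytes) -> list:
    """Split a message into (field_number, wire_type, raw_value) triples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit (double, fixed64)
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (string, bytes, message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit (float, fixed32)
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


sample = bytes.fromhex("0a076f72645f303031102a2001")
print(read_fields(sample))  # [(1, 2, b'ord_001'), (2, 0, 42), (4, 0, 1)]
```

The reader recovers field numbers and raw values but not names or exact types: nothing in the bytes says whether field 1 is a string, raw bytes, or a nested message, which is why the message type is needed to turn the buffer back into JSON.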
Encode and Decode JSON↔Protobuf in Python
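Python has two main options for Protobuf: the official protobuf package (imported as google.protobuf), which uses protoc to generate _pb2.py modules, and betterproto, which generates Python dataclasses through its own protoc plugin. In both cases the grpcio-tools package bundles protoc, so no separately installed protoc binary is needed. Both are shown below.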
Option A: google-protobuf with grpcio-tools (official):
pip install grpcio-tools
# Generate order_pb2.py from order.proto
python -m grpc_tools.protoc -I. --python_out=. order.proto

# encode_decode.py
from order_pb2 import Order, Address
from google.protobuf.json_format import MessageToJson, MessageToDict, Parse

# Build message from field values (mirrors JSON object)
order = Order(
    order_id='ord_001',
    customer_id=42,
    total=129.99,
    paid=True,
    tags=['express', 'gift'],
    shipping_address=Address(
        street='123 Main St',
        city='Springfield',
        country='US',
    ),
)

# Encode to binary
binary = order.SerializeToString()
print('Proto size:', len(binary), 'bytes')

# Decode from binary
decoded = Order()
decoded.ParseFromString(binary)
print(decoded.order_id)  # 'ord_001'

# Convert to JSON string (proto3 canonical format)
json_str = MessageToJson(decoded)

# Convert to Python dict. Default-valued fields are omitted unless requested;
# the keyword for that is including_default_value_fields in older protobuf
# releases and always_print_fields_with_no_presence in newer ones.
obj = MessageToDict(decoded)

# Parse from a JSON string back to a proto message
from_json = Parse(json_str, Order())

Option B: betterproto (dataclasses; protoc supplied by grpcio-tools):
pip install "betterproto[compiler]" grpcio-tools
# betterproto generates dataclasses from .proto via its own compiler plugin
python -m grpc_tools.protoc -I. --python_betterproto_out=. order.proto

# encode_decode_betterproto.py
# betterproto names the generated module after the proto package ("orders")
from orders import Order, Address  # betterproto-generated dataclasses

order = Order(
    order_id='ord_001',
    customer_id=42,
    total=129.99,
    paid=True,
    tags=['express', 'gift'],
    shipping_address=Address(street='123 Main St', city='Springfield', country='US'),
)

# Encode to binary bytes
binary: bytes = bytes(order)
print('Proto size:', len(binary), 'bytes')

# Decode from binary
decoded = Order().parse(binary)
print(decoded.order_id)  # 'ord_001'

# Convert to a dict or a JSON string (camelCase keys by default)
obj = decoded.to_dict()
json_str = decoded.to_json()

betterproto generates idiomatic Python dataclasses, which integrate naturally with type checkers like mypy and editors. google-protobuf generates classes with a more Java-like API but is the official implementation and tracks the proto spec most closely. For parsing JSON in Python before encoding, prefer the library's own JSON parser (json_format.Parse or ParseDict in google-protobuf, from_json or from_dict in betterproto), which understands the lowerCamelCase keys of proto3 JSON. If you unpack a json.loads() dict into the message constructor yourself, rename the keys to snake_case first.
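If a JSON payload with camelCase keys has to be fed to a constructor that expects the snake_case field names from the .proto file, the keys need renaming first. A minimal standard-library sketch follows; `to_snake` and `snake_keys` are hypothetical helpers, and the approach assumes every dict key is a field name, so it would wrongly rename the data keys of a map field.

```python
import json
import re


def to_snake(name: str) -> str:
    """shippingAddress -> shipping_address"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_keys(value):
    """Recursively rename dict keys; lists and scalars pass through unchanged."""
    if isinstance(value, dict):
        return {to_snake(k): snake_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_keys(v) for v in value]
    return value


raw = ('{"orderId": "ord_001", "customerId": 42, "tags": ["express"],'
       ' "shippingAddress": {"street": "123 Main St", "city": "Springfield"}}')
kwargs = snake_keys(json.loads(raw))
print(kwargs)
```

The resulting `kwargs` has the field names the generated classes expect; depending on the library, nested dicts such as `shipping_address` may still need to be wrapped in their own message class before being passed to the constructor.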
Handle Nested JSON Objects and Arrays in Protobuf
JSON structures with nested objects and arrays require nested message types and repeated fields respectively. Protobuf does not have an anonymous object type — every nested object must be a named message. This is more verbose but enforces explicit contracts for every sub-structure.
Nested objects: define a message for each level, then reference it as a field type:
// Handles: { "user": { "profile": { "bio": "...", "avatarUrl": "..." } } }
message Profile {
  string bio = 1;
  string avatar_url = 2;
}

message User {
  int32 id = 1;
  string name = 2;
  Profile profile = 3;  // nested message
}

message Response {
  User user = 1;  // another level of nesting
}

Arrays of primitives: use repeated scalar fields:

// Handles: { "tags": ["fast", "reliable"], "scores": [98, 87, 92] }
message Report {
  repeated string tags = 1;   // array of strings
  repeated int32 scores = 2;  // array of integers (packed by default)
}

Arrays of objects: declare a message for the element type, then use it with repeated:

// Handles: { "items": [{ "sku": "A1", "qty": 2 }, { "sku": "B3", "qty": 1 }] }
message LineItem {
  string sku = 1;
  int32 qty = 2;
}

message Cart {
  repeated LineItem items = 1;
}

String-keyed maps: use the map built-in type:

// Handles: { "metadata": { "source": "web", "campaign": "spring25" } }
message Event {
  string event_type = 1;
  map<string, string> metadata = 2;  // dynamic string-keyed object
}

map fields cannot be repeated and cannot use bytes or float/double as keys. For string-to-message maps: map<string, Profile>. In protobufjs, map fields are represented as plain JavaScript objects; in Python they behave like dicts.
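On the wire, a map field is encoded as a repeated entry message whose key is field 1 and whose value is field 2. The sketch below shows this for the `metadata` field of the Event message above (field number 2), using hypothetical helper names and standard-library Python only; it handles string keys and string values and nothing else.

```python
def varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, data: bytes) -> bytes:
    """Tag with wire type 2, then a length prefix, then the payload."""
    return varint((field_number << 3) | 2) + varint(len(data)) + data


def encode_string_map(field_number: int, mapping: dict) -> bytes:
    """Encode map<string, string> as one entry message per key/value pair."""
    out = b""
    for key, value in mapping.items():
        entry = length_delimited(1, key.encode()) + length_delimited(2, value.encode())
        out += length_delimited(field_number, entry)
    return out


encoded = encode_string_map(2, {"source": "web"})
print(encoded.hex(" "), len(encoded))
```

Each entry carries its own tags and length prefixes, so a map costs a few bytes more per pair than the raw key and value text; here the 9 characters of "source" and "web" become 15 bytes.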
Protobuf JSON Format: the Canonical JSON Mapping
The proto3 specification defines a canonical JSON encoding that allows Protobuf messages to be represented as standard JSON text. This is distinct from binary Protobuf: it is human-readable, can be sent over REST APIs, and round-trips correctly through Protobuf type constraints. Most libraries implement it via a dedicated method.
Key proto3 JSON encoding rules:
- Field names: snake_case .proto field names are converted to lowerCamelCase in JSON (order_id becomes "orderId")
- int64/uint64: encoded as JSON strings ("1234567890123") to avoid JavaScript precision loss above 2^53
- bytes: base64-encoded strings
- Enums: use the string name ("STATUS_ACTIVE"), not the integer value
- Default values: fields at their default (0, "", false, []) are omitted by default; use emitDefaultValues / including_default_value_fields to include them
- Timestamps: google.protobuf.Timestamp encodes as an RFC 3339 string ("2025-05-19T00:00:00Z")
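Three of the rules above (field naming, int64 as string, bytes as base64) are simple enough to sketch by hand. The code below is an illustration in standard-library Python, not a replacement for a library's JSON support; `to_lower_camel` and `to_proto_json` are hypothetical helpers, and the `user_id` and `signature` fields are made up for the example.

```python
import base64


def to_lower_camel(name: str) -> str:
    """order_id -> orderId"""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_proto_json(fields: dict, int64_fields: set) -> dict:
    """Apply three proto3 JSON rules to a flat dict keyed by snake_case names."""
    out = {}
    for name, value in fields.items():
        if isinstance(value, bytes):
            value = base64.b64encode(value).decode("ascii")  # bytes -> base64 text
        elif name in int64_fields:
            value = str(value)                               # int64 -> decimal string
        out[to_lower_camel(name)] = value
    return out


message = {"order_id": "ord_001", "user_id": 1234567890123, "signature": b"\x01\x02"}
print(to_proto_json(message, int64_fields={"user_id"}))
# {'orderId': 'ord_001', 'userId': '1234567890123', 'signature': 'AQI='}

# Why int64 travels as a string: a 64-bit float cannot tell these two apart.
print(float(2**53 + 1) == float(2**53))  # True
```

The last line is the reason behind the int64 rule: any JSON consumer that parses numbers into IEEE 754 doubles, as JavaScript does, would silently merge neighbouring IDs above 2^53.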
// protobufjs: produce proto3-style JSON
const decoded = Order.decode(buffer);

// .toJSON() follows the proto3 JSON conventions for names, int64, enums and bytes
const protoJson = decoded.toJSON();
// { orderId: 'ord_001', customerId: 42, total: 129.99, paid: true, ... }

// Round-trip: parse proto JSON back to binary
const repacked = Order.encode(Order.fromObject(protoJson)).finish();

# Python google-protobuf: canonical JSON
from google.protobuf.json_format import MessageToJson, Parse

json_str = MessageToJson(order)  # default-valued fields are omitted by default
# '{"orderId": "ord_001", "customerId": 42, "total": 129.99, "paid": true, ...}'
# (the actual output is pretty-printed across several lines by default)

# Parse JSON string back to proto message
restored = Parse(json_str, Order())

Use Protobuf JSON format for logging (structured, human-readable), REST compatibility layers (accept Protobuf JSON from clients that cannot handle binary), and debugging. Use binary Protobuf for network transport between services where size and speed matter. The two formats are semantically equivalent and interconvertible with no data loss for well-formed messages. For more on structured logging with JSON, see JSON structured logging.
Definitions
- Protocol Buffers (Protobuf)
- A language-neutral, platform-neutral binary serialization format developed by Google. Data is encoded as compact binary using field numbers and varint encoding, making payloads 3–10× smaller and 5–10× faster to serialize than equivalent JSON.
- .proto file
- The schema definition file for Protocol Buffers. It defines message types, field names, field numbers, and scalar types using the proto3 (or proto2) syntax. The .proto file is required to encode or decode any Protobuf binary; unlike JSON, Protobuf binary is not self-describing.
- Message
- The basic structural unit in a .proto schema, analogous to a JSON object or a TypeScript interface. A message block declares an ordered set of typed, numbered fields. Messages can be nested and referenced by other messages.
- Field number
- A unique integer assigned to each field within a message (e.g. = 1, = 2). Field numbers appear in the binary wire format instead of field names, which is the primary reason Protobuf binary is smaller than JSON. Field numbers must never be reused once binary data has been serialized using them; changing the number of an existing field is a breaking change.
- Varint encoding
- A variable-length integer encoding used by Protobuf for int32, int64, uint32, uint64, bool, and enum fields. Small values (0–127) encode in 1 byte; larger values use additional bytes. This makes small integers very compact: the value 42 encodes in 1 byte, compared to 2 ASCII characters in JSON.
- repeated field
- A Protobuf field modifier that allows zero or more values of the declared type, corresponding to a JSON array. Scalar repeated fields use packed encoding by default in proto3, which stores all elements contiguously after a single length prefix and is more efficient than tagging each element individually.
- proto3 JSON format
- A standardized JSON encoding for Protobuf messages defined in the proto3 specification. Uses lowerCamelCase field names, encodes int64 as strings, bytes as base64, and enum values as their string names. Semantically equivalent to binary Protobuf; suitable for REST APIs, logging, and debugging.
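The varint and field-number definitions above can be made concrete with a short sketch. It uses standard-library Python and hypothetical helper names (`varint`, `tag`), and covers non-negative integers only.

```python
def varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set while more follow."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def tag(field_number: int, wire_type: int = 0) -> bytes:
    """A field tag is itself a varint of (field_number << 3) | wire_type."""
    return varint((field_number << 3) | wire_type)


for value in (42, 127, 128, 300, 16_383, 16_384):
    print(value, varint(value).hex(" "))

for number in (1, 15, 16, 2047, 2048):
    print("field", number, "tag bytes:", len(tag(number)))
```

Because the tag spends 3 of its bits on the wire type, only field numbers 1 to 15 fit in a single byte, which is why the most frequent fields should get the lowest numbers.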
Frequently Asked Questions
How do I convert a JSON object to a Protobuf binary message in JavaScript?
Install protobufjs with npm install protobufjs. Load your .proto file with protobuf.load("your.proto"), look up the message type by its fully qualified name with root.lookupType("package.YourMessage"), then call YourMessage.encode(YourMessage.create(jsonObject)).finish() to produce a Uint8Array binary buffer. The full pattern: const root = await protobuf.load("order.proto"); const Order = root.lookupType("orders.Order"); const buffer = Order.encode(Order.create({ customerId: 1, total: 29.99, paid: true })).finish(). You can also use static code generation: install protobufjs-cli, run npx pbjs -t static-module -w commonjs -o order.js order.proto, then import the generated module. Dynamic loading requires no build step and is useful during development; static generation produces smaller bundles for production.
What is the difference between JSON and Protobuf encoding?
JSON encoding produces UTF-8 text in which every field name is spelled out in full and every value is written as characters, along with all delimiters (braces, colons, commas). A field like "userId": 42 takes 12 bytes as JSON text (11 without the space). Protobuf binary encoding uses field numbers instead of names, varint encoding for integers (42 encodes as 2 bytes: a 1-byte tag and 1-byte value), and length-delimited encoding for strings. The result: Protobuf binary payloads are typically 3–10× smaller than equivalent JSON and 5–10× faster to serialize. The key trade-off is that JSON is self-describing (readable without a schema), while Protobuf is not: you must have the .proto file and generated or loaded type information to decode the binary.
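The byte counts in this answer are easy to verify. The snippet below is a small sketch that assumes `userId` is field number 1 in a hypothetical schema; writing the two bytes directly works only because both the tag and the value are below 128 and so fit in a single varint byte each.

```python
json_text = '"userId": 42'

FIELD_NUMBER = 1   # assumed field number for userId
WIRE_VARINT = 0    # wire type for int32/int64/bool/enum
proto_bytes = bytes([(FIELD_NUMBER << 3) | WIRE_VARINT, 42])

print(len(json_text.encode("utf-8")))  # 12
print(len(proto_bytes))                # 2
print(proto_bytes.hex(" "))            # 08 2a
```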
How do I map JSON types to Protobuf field types?
JSON's six value types map to Protobuf as follows. string maps to proto string (or bytes for raw binary). Numbers map to int32, int64, uint32, uint64, float, or double; choose based on range and precision needs. boolean maps to bool. null is not directly representable in proto3; use google.protobuf.StringValue or a oneof wrapper for nullable fields. Arrays map to repeated fields. Objects map to nested message types. When in doubt for numeric values, use int64 for whole numbers and double for decimals to match JavaScript's Number type most closely and avoid precision loss.
Can I use Protobuf without code generation?
Yes. protobufjs in Node.js supports fully dynamic usage: call protobuf.load("schema.proto") at runtime to load the schema, then use root.lookupType("MessageName") to get a type object. No compilation step is required. You encode and decode using the dynamic type object methods directly. This is convenient for development and for tools that need to handle arbitrary schemas at runtime. The trade-off vs static code generation: no TypeScript type safety on message objects, slightly slower first load (schema is parsed at runtime), and a larger bundle size. For production services with a fixed schema, static generation is recommended. For scripting, CLI tools, or schema-agnostic middleware, dynamic loading is perfectly suitable.
How do I handle JSON arrays in a Protobuf schema?
JSON arrays map to repeated fields in Protobuf. A repeated field can appear zero or more times. For an array of strings like ["tag1", "tag2"], declare repeated string tags = 3;. Pass the JavaScript array directly when encoding — protobufjs handles serializing each element. For arrays of objects, create a nested message type and use repeated MyMessage items = 4;. For arrays of arrays (2D), Protobuf does not support repeated repeated natively — wrap the inner array in a message: message Row { repeated double values = 1; } then repeated Row rows = 1;. In proto3, repeated fields of numeric types use packed encoding by default, storing all values contiguously after a single length prefix.
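The difference between packed and per-element encoding can be sketched by hand. The example below uses standard-library Python with hypothetical helper names and encodes the array [98, 87, 92] as field number 2 both ways; it handles non-negative integers only.

```python
def varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field_number: int, values: list) -> bytes:
    """One varint tag (wire type 0) in front of every element."""
    tag = varint((field_number << 3) | 0)
    return b"".join(tag + varint(v) for v in values)


def encode_packed(field_number: int, values: list) -> bytes:
    """One length-delimited tag (wire type 2), a length, then all elements."""
    body = b"".join(varint(v) for v in values)
    return varint((field_number << 3) | 2) + varint(len(body)) + body


scores = [98, 87, 92]
print(encode_unpacked(2, scores).hex(" "))  # 10 62 10 57 10 5c
print(encode_packed(2, scores).hex(" "))    # 12 03 62 57 5c
```

For three elements the saving is a single byte, but it grows with the array: the unpacked form pays one tag per element, while the packed form pays one tag and one length prefix in total.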
How do I decode a Protobuf binary message back to JSON?
In protobufjs: const decoded = Message.decode(buffer) then const obj = Message.toObject(decoded, { longs: String, enums: String, bytes: String }). The longs: String option converts int64 values (represented as Long objects by default) to strings, preventing silent precision loss. Calling decoded.toJSON() is equivalent. In Python with google.protobuf: from google.protobuf.json_format import MessageToJson; json_str = MessageToJson(proto_message). This produces the proto3 JSON format with camelCase field names. With betterproto, dataclasses.asdict(message) returns a Python dict. The binary buffer alone is not self-describing — you must have the message type to perform decoding.
What is Protobuf JSON format vs binary Protobuf?
Protobuf JSON format is a standardized text encoding defined in the proto3 spec. It looks like JSON but follows specific rules: field names are lowerCamelCase, int64 values are strings (to avoid JS precision loss), bytes fields are base64-encoded, and enum values use their string name rather than integer. Example: { "userId": "1234567890123", "createdAt": "2025-05-19T00:00:00Z" }. Binary Protobuf is the wire format: compact, not human-readable, field-number-keyed. Use binary for network transport (smaller, faster). Use Protobuf JSON format for debugging, logging, REST API compatibility layers, or when you need a text encoding that round-trips cleanly through Protobuf type constraints. In protobufjs, .toJSON() produces output that follows these conventions for field names, int64, bytes, and enums.
Why is Protobuf smaller than JSON?
Protobuf achieves its size advantage through three mechanisms. First, field names are stripped: instead of the string "userId" (6 bytes plus quotes and colon), the binary stores a 1-byte field tag derived from the field number. Second, integers use varint encoding — small integers like 0–127 encode in 1 byte; values up to 16,383 take 2 bytes. JSON always encodes integers as ASCII digit sequences. Third, repeated scalar fields use packed encoding: a single length-prefix followed by all values concatenated, rather than a tag per element. For a realistic message with 10 fields and a mix of strings, integers, and booleans, JSON might use 200 bytes while Protobuf uses 50–80 bytes — a 2.5–4× reduction. With many small integers, the ratio can exceed 10×. For a full comparison, see JSON vs Protobuf.
Validate your JSON before writing a .proto schema
Before defining your Protobuf schema, use Jsonic's JSON Formatter to validate and inspect your JSON payload structure — spot type inconsistencies and null values that need special handling in proto3.
Further reading and primary sources
- Protocol Buffers Language Guide (proto3) — Official proto3 syntax reference: scalar types, field numbers, reserved fields, and JSON mapping rules
- protobufjs on npm — JavaScript/TypeScript Protobuf library supporting dynamic .proto loading and static code generation via pbjs/pbts
- betterproto on PyPI — Modern Python Protobuf library generating dataclasses instead of the traditional google-protobuf classes
- proto3 JSON Mapping (protobuf.dev) — Canonical rules for encoding proto3 messages as JSON: field name conventions, int64 as string, bytes as base64
- JSON vs Protobuf (Jsonic) — Full comparison of JSON and Protobuf: size benchmarks, parse speed, schema evolution, and when to use each