JSON Analytics Events: Tracking Schema, Segment, and BigQuery
JSON analytics event tracking defines how user actions are captured: a consistent event schema with event, userId, timestamp, properties, and context fields enables cross-platform analysis. Segment's JSON event format is the de facto standard: {"type":"track","event":"Button Clicked","userId":"user-123","timestamp":"2025-05-19T14:32:00Z","properties":{"button":"signup","page":"/home"}}. Google Analytics 4 uses a JSON payload with a client_id and an events array. Each entry in the array carries a name and a params object of key-value pairs, and the payload is sent via the Measurement Protocol to https://www.google-analytics.com/mp/collect. This guide covers Segment track event JSON schema, GA4 Measurement Protocol JSON, Mixpanel event JSON, BigQuery event table schema, event validation, and analytics data pipeline JSON, so your tracking layer produces clean, queryable data from day one. For the underlying JSON audit logging patterns that complement analytics events with compliance-grade records, see that guide; for JSON event-driven architecture that routes analytics events through a message bus, see that guide.
Need to inspect or validate a raw JSON analytics event? Paste it into Jsonic and format it instantly.
Analytics Event JSON Schema Design
A well-designed analytics event schema is the foundation of a reliable data stack. Define the schema before the first event is tracked — retrofitting schema changes onto historical data in BigQuery is expensive and error-prone. Five fields are required in every analytics event; additional fields should be standardized in a tracking plan and enforced via JSON data validation at track time.
| Field | Type | Example | Required |
|---|---|---|---|
| event | string (Title Case) | "Button Clicked" | Yes |
| userId | string | "user-123" | Yes (or anonymousId) |
| anonymousId | string (typically a UUID) | "anon-456abc" | Yes (pre-auth) |
| timestamp | string (ISO 8601 UTC) | "2025-05-19T14:32:00Z" | Yes |
| properties | object | {"button":"signup","page":"/home"} | Yes |
| context | object | {"ip":"203.0.113.0","userAgent":"Mozilla/5.0"} | Recommended |
| messageId | string (UUID v4) | "550e8400-e29b-41d4-a716-446655440000" | Recommended (dedup) |
| type | string enum | "track", "page", "identify" | Segment-specific |
A complete minimal analytics event looks like this:
{
"type": "track",
"event": "Button Clicked",
"userId": "user-123",
"anonymousId": "anon-456abc",
"timestamp": "2025-05-19T14:32:00Z",
"messageId": "550e8400-e29b-41d4-a716-446655440000",
"properties": {
"button": "signup",
"page": "/home",
"label": "Get Started"
},
"context": {
"ip": "203.0.113.0",
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
"locale": "en-US",
"page": {
"url": "https://example.com/home",
"title": "Home — Example",
"referrer": "https://google.com"
}
}
}
The context object carries environment metadata that applies to every event from a given client session: IP address, user agent, locale, and page context. Populate it once in a shared tracking wrapper rather than repeating it in every properties object. For JSON Schema patterns to enforce required fields and value formats at track time, see that guide.
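Below is a minimal sketch of such a wrapper. It assumes a hypothetical `createTracker` factory that receives the identity and context once, then stamps each event with a fresh `messageId` (UUID v4 via Node's `crypto.randomUUID`) and an ISO 8601 UTC timestamp. The function and type names are illustrative and do not belong to any SDK.

```typescript
import { randomUUID } from 'crypto'

// Illustrative shapes mirroring the field table above
interface EventContext {
  ip?: string
  userAgent?: string
  locale?: string
  page?: { url: string; title?: string; referrer?: string }
}

interface Identity {
  userId?: string
  anonymousId?: string
}

interface TrackEvent extends Identity {
  type: 'track'
  event: string
  timestamp: string
  messageId: string
  properties: Record<string, unknown>
  context: EventContext
}

// Hypothetical wrapper: identity and context are captured once, not repeated per call
function createTracker(identity: Identity, context: EventContext) {
  if (!identity.userId && !identity.anonymousId) {
    throw new Error('Either userId or anonymousId is required')
  }
  return function buildTrackEvent(
    event: string,
    properties: Record<string, unknown> = {},
  ): TrackEvent {
    return {
      type: 'track',
      event,
      ...identity,
      timestamp: new Date().toISOString(), // ISO 8601 in UTC, e.g. "2025-05-19T14:32:00.000Z"
      messageId: randomUUID(), // UUID v4 for downstream deduplication
      properties,
      context,
    }
  }
}

const track = createTracker(
  { userId: 'user-123', anonymousId: 'anon-456abc' },
  { locale: 'en-US', page: { url: 'https://example.com/home' } },
)
const evt = track('Button Clicked', { button: 'signup', page: '/home' })
console.log(JSON.stringify(evt, null, 2))
```

Keeping the envelope logic in one function also gives the validation layer a single place to hook in later.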
Segment Track Event JSON Format
Segment's spec defines six call types (identify, track, page, screen, group, and alias), each with a distinct JSON structure. The track call is the workhorse for behavioral analytics: it records that a user performed a named action with associated properties.
// Segment track call — JavaScript SDK
analytics.track('Purchase Completed', {
orderId: 'ord_789',
revenue: 49.99,
currency: 'USD',
products: [
{ productId: 'prod_001', name: 'Pro Plan', price: 49.99, quantity: 1 }
]
})
// Produces this JSON event sent to Segment's API:
{
"type": "track",
"event": "Purchase Completed",
"userId": "user-123",
"anonymousId": "anon-456abc",
"timestamp": "2025-05-19T14:32:00.000Z",
"messageId": "550e8400-e29b-41d4-a716-446655440000",
"properties": {
"orderId": "ord_789",
"revenue": 49.99,
"currency": "USD",
"products": [
{ "productId": "prod_001", "name": "Pro Plan", "price": 49.99, "quantity": 1 }
]
},
"context": {
"ip": "203.0.113.0",
"userAgent": "Mozilla/5.0",
"locale": "en-US",
"library": { "name": "analytics.js", "version": "4.0.0" }
},
"integrations": {
"All": true,
"Amplitude": false
}
}
The integrations object controls which downstream destinations receive the event. To route selectively, set {"All": false} and then enable specific destinations. The identify call links an anonymousId to a userId and sets user traits (email, name, plan). Send it once after login so all subsequent events carry the resolved userId. Segment Protocols validates incoming events against the JSON Schema rules in your tracking plan. Violations are reported, and you can configure Protocols to block non-conforming events instead of forwarding them to destinations. For the broader JSON audit logging pattern that complements Segment's track events with a tamper-evident record of sensitive actions, see that guide.
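The sketch below builds the two JSON messages by hand to show how `integrations` and `identify` fit together. The helper names `onlyDestinations` and `buildIdentify` are hypothetical. The message fields (`type`, `userId`, `anonymousId`, `traits`, `integrations`) follow the Segment spec. Destination keys are assumed to match the names Segment uses for your configured destinations.

```typescript
import { randomUUID } from 'crypto'

type Integrations = Record<string, boolean>

// Route to the listed destinations only: disable "All", then opt specific ones back in
function onlyDestinations(names: string[]): Integrations {
  const integrations: Integrations = { All: false }
  for (const name of names) integrations[name] = true
  return integrations
}

// Hypothetical builder for an identify message sent once after login
function buildIdentify(
  userId: string,
  anonymousId: string,
  traits: Record<string, unknown>,
) {
  return {
    type: 'identify' as const,
    userId, // stable database ID, not an email address
    anonymousId, // pre-login ID being linked to userId
    traits, // user-level attributes such as email, name, plan
    timestamp: new Date().toISOString(),
    messageId: randomUUID(),
  }
}

const identify = buildIdentify('user-123', 'anon-456abc', {
  email: 'ada@example.com',
  plan: 'pro',
})
const routed = { ...identify, integrations: onlyDestinations(['Mixpanel', 'Amplitude']) }
console.log(JSON.stringify(routed.integrations))
// {"All":false,"Mixpanel":true,"Amplitude":true}
```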
GA4 Measurement Protocol JSON
The GA4 Measurement Protocol lets you send events server-side — useful for purchase confirmations, email open tracking, or any event that occurs outside the browser. The endpoint and payload format differ significantly from the older Universal Analytics Measurement Protocol.
// GA4 Measurement Protocol: Node.js 18+ (global fetch)
const GA4_ENDPOINT = 'https://www.google-analytics.com/mp/collect'
const MEASUREMENT_ID = 'G-XXXXXXXXXX'
const API_SECRET = process.env.GA4_API_SECRET

// Each event is a name plus a params object
interface GA4Event {
  name: string
  params?: Record<string, unknown>
}

async function sendGA4Event(clientId: string, events: GA4Event[]) {
  const payload = {
    client_id: clientId, // stable browser/device identifier
    timestamp_micros: Date.now() * 1000, // optional: event timestamp in microseconds
    user_properties: { // optional: user-scoped dimensions
      plan: { value: 'pro' },
      industry: { value: 'saas' },
    },
    events, // array of up to 25 event objects
  }
  const url = `${GA4_ENDPOINT}?measurement_id=${MEASUREMENT_ID}&api_secret=${API_SECRET}`
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  })
  // Use the validation server during development:
  // https://www.google-analytics.com/debug/mp/collect
  return response.status
}

// Event structure: snake_case name; params are flat key-value pairs
// (the reserved e-commerce items array is the exception)
const purchaseEvent: GA4Event = {
  name: 'purchase', // snake_case event name
  params: {
    transaction_id: 'ord_789',
    value: 49.99,
    currency: 'USD',
    items: [ // reserved e-commerce items array
      {
        item_id: 'prod_001',
        item_name: 'Pro Plan',
        price: 49.99,
        quantity: 1,
      },
    ],
    engagement_time_msec: 100, // with session_id, needed for the event to show as user activity (e.g. Realtime)
  },
}

await sendGA4Event('1234567890.1234567890', [purchaseEvent])
GA4 event names may contain only letters, numbers, and underscores. They must start with a letter and be 40 characters or fewer; snake_case is the convention. Parameter names must be 40 characters or fewer. Parameter values are strings or numbers, and string values are capped at 100 characters for standard properties (500 for GA4 360 properties). The client_id must match the value in the _ga cookie for server-side events to be stitched into the correct user session. During development, always validate payloads against the debug endpoint first. It returns a JSON response body detailing any validation errors, while the production endpoint returns HTTP 204 with no body regardless of errors. For JSON data validation patterns to catch GA4 payload errors before they reach the API, see that guide.
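A lightweight pre-flight check can catch the most common limit violations before a request leaves your server. This sketch assumes the limits Google documents for standard GA4 properties:

- the event-name pattern and 40-character caps on event and parameter names
- up to 25 events per request
- up to 25 parameters per event
- 100-character string values

Treat these numbers as assumptions to re-check against the current Measurement Protocol documentation. `checkGA4Payload` is a hypothetical helper, and the debug endpoint remains the authoritative validator.

```typescript
// Minimal shapes for a Measurement Protocol request body
interface GA4Event {
  name: string
  params?: Record<string, unknown>
}

interface GA4Payload {
  client_id: string
  events: GA4Event[]
}

// Assumed limits for standard GA4 properties; verify against current docs
const MAX_EVENTS = 25
const MAX_PARAMS = 25
const MAX_NAME_LENGTH = 40
const MAX_STRING_VALUE_LENGTH = 100
const NAME_PATTERN = /^[A-Za-z][A-Za-z0-9_]*$/

// Returns a list of human-readable problems; an empty list means "looks sendable"
function checkGA4Payload(payload: GA4Payload): string[] {
  const errors: string[] = []
  if (!payload.client_id) errors.push('client_id is required')
  if (payload.events.length === 0 || payload.events.length > MAX_EVENTS) {
    errors.push(`events must contain 1-${MAX_EVENTS} items`)
  }
  payload.events.forEach((event, i) => {
    if (!NAME_PATTERN.test(event.name) || event.name.length > MAX_NAME_LENGTH) {
      errors.push(`events[${i}].name "${event.name}" is invalid`)
    }
    const params = Object.entries(event.params ?? {})
    if (params.length > MAX_PARAMS) {
      errors.push(`events[${i}] has more than ${MAX_PARAMS} params`)
    }
    for (const [key, value] of params) {
      if (key.length > MAX_NAME_LENGTH) {
        errors.push(`events[${i}].params.${key}: name too long`)
      }
      if (typeof value === 'string' && value.length > MAX_STRING_VALUE_LENGTH) {
        errors.push(`events[${i}].params.${key}: value too long`)
      }
    }
  })
  return errors
}

// A Segment-style Title Case name fails GA4's naming rules
const errors = checkGA4Payload({
  client_id: '1234567890.1234567890',
  events: [{ name: 'Purchase Completed', params: { value: 49.99 } }],
})
console.log(errors.length, errors[0]) // 1 events[0].name "Purchase Completed" is invalid
```

This check is most useful when the same events are fanned out to both Segment-style and GA4 destinations, because the two naming conventions differ.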
Mixpanel Event JSON Format
Mixpanel uses a flat properties object: all fields, including user identification and deduplication keys, live inside properties rather than at the top level. This differs from Segment's structure and is a common source of confusion when migrating between platforms.
// Mixpanel HTTP API: send events server-side (Node.js 18+, global fetch)
import { createHash } from 'crypto'

const MIXPANEL_ENDPOINT = 'https://api.mixpanel.com/track'
const PROJECT_TOKEN = process.env.MIXPANEL_TOKEN

interface MixpanelEvent {
  event: string
  properties: Record<string, unknown>
}

function createInsertId(event: MixpanelEvent): string {
  // Deterministic MD5 of the event: the same event object always yields the same ID
  return createHash('md5').update(JSON.stringify(event)).digest('hex')
}

function buildMixpanelEvent(
  distinctId: string,
  eventName: string,
  properties: Record<string, unknown>,
  time = Math.floor(Date.now() / 1000), // Unix seconds, not milliseconds
): MixpanelEvent {
  const event: MixpanelEvent = {
    event: eventName,
    properties: {
      ...properties, // event-specific properties first, so the system fields below win
      distinct_id: distinctId, // equivalent to userId
      token: PROJECT_TOKEN,
      time,
    },
  }
  // Add $insert_id after constructing the event (so it doesn't self-reference)
  event.properties.$insert_id = createInsertId(event)
  return event
}

async function sendMixpanelEvents(events: MixpanelEvent[]) {
  // Build each event once and resend the same objects on retry,
  // so time and $insert_id stay identical across attempts.
  // The batch endpoint accepts an array of up to 2000 events.
  const response = await fetch(`${MIXPANEL_ENDPOINT}?ip=1`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(events),
  })
  return response.status
}

const clicked = buildMixpanelEvent('user-123', 'Button Clicked', {
  button: 'signup',
  page: '/home',
  // Super properties (plan, industry) are merged by the Mixpanel SDK automatically
  // on the client; set them manually here for server-side calls
  plan: 'pro',
  industry: 'saas',
})
await sendMixpanelEvents([clicked])
Super properties are key-value pairs set once (via mixpanel.register() on the client SDK) and automatically merged into every subsequent event's properties object: plan, cohort, A/B variant. For server-side calls, merge super properties manually into the properties object before sending. Mixpanel deduplicates events with the same $insert_id within a 7-day rolling window. Generate the insert ID deterministically (MD5 of event data), and reuse the same event object, including its time, on every retry so retries produce the same ID. For pipeline ingestion of large event volumes, use the Mixpanel Import API, which accepts JSONL files, rather than the track endpoint.
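One caveat with hashing `JSON.stringify(event)`: its output depends on key insertion order. Two logically identical events built in different code paths can therefore hash differently. The sketch below is an order-insensitive variant. It assumes you hash only the identity-defining fields (`event`, `distinct_id`, `time`, and the event properties). `canonicalJSON` and `stableInsertId` are illustrative names. The 32-character MD5 hex digest should fit within the 36-character limit Mixpanel documents for `$insert_id`, which is worth re-checking for your setup.

```typescript
import { createHash } from 'crypto'

// Serialize with object keys sorted at every level, so key order never affects the hash
function canonicalJSON(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJSON).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .sort()
      .filter((key) => obj[key] !== undefined) // match JSON.stringify, which drops undefined
      .map((key) => `${JSON.stringify(key)}:${canonicalJSON(obj[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value) ?? 'null'
}

// Hypothetical helper: same event, user, time, and properties always yield the same ID
function stableInsertId(
  event: string,
  distinctId: string,
  time: number,
  properties: Record<string, unknown>,
): string {
  return createHash('md5')
    .update(canonicalJSON({ event, distinct_id: distinctId, time, properties }))
    .digest('hex') // 32 lowercase hex characters
}

const a = stableInsertId('Button Clicked', 'user-123', 1747665120, { button: 'signup', page: '/home' })
const b = stableInsertId('Button Clicked', 'user-123', 1747665120, { page: '/home', button: 'signup' })
console.log(a === b, a.length) // true 32
```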
BigQuery Analytics Schema
BigQuery is the destination for high-volume analytics events. Schema design choices — STRUCT vs. JSON columns, partitioning strategy, clustering keys — determine whether queries cost dollars or cents at scale.
-- BigQuery analytics events table with STRUCT properties
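-- STRUCT subfields are stored column by column, so queries read only the fields they reference (typically faster and cheaper than parsing a JSON STRING column)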
CREATE TABLE `project.analytics.events` (
event_date DATE NOT NULL, -- partition key
event_timestamp TIMESTAMP NOT NULL,
event_name STRING NOT NULL, -- cluster key 1
user_id STRING, -- cluster key 2
anonymous_id STRING NOT NULL,
message_id STRING NOT NULL, -- deduplication key
insert_id STRING, -- Mixpanel $insert_id
session_id STRING,
-- Typed properties struct — add columns per event type
properties STRUCT<
button STRING,
page STRING,
revenue FLOAT64,
currency STRING,
order_id STRING,
product_id STRING,
quantity INT64
>,
context STRUCT<
ip STRING,
user_agent STRING,
locale STRING,
page_url STRING,
page_title STRING,
referrer STRING
>
)
PARTITION BY event_date
CLUSTER BY event_name, user_id
OPTIONS (
partition_expiration_days = 730 -- 2-year retention
);
-- Query: daily active users per event type (uses partition + cluster)
SELECT
event_date,
event_name,
COUNT(DISTINCT user_id) AS dau
FROM `project.analytics.events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
AND event_name IN ('Button Clicked', 'Purchase Completed', 'Page Viewed')
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;
-- Load JSONL batch file into BigQuery
-- bq load --source_format=NEWLINE_DELIMITED_JSON \
-- project:analytics.events_staging \
-- gs://your-bucket/events/2025-05-19/*.jsonl \
-- schema.json
Partition by event_date (rather than event_timestamp) so date-range queries filter directly on the partition column and benefit from partition pruning: BigQuery scans only the partitions touched by the WHERE event_date BETWEEN clause. Cluster by event_name first and user_id second, since most analytics queries filter by event type before user. For variable properties that differ significantly across event types, use a JSON STRING column alongside the STRUCT. Query the STRUCT columns in aggregations, and use JSON_VALUE() only for exploratory analysis. For JSON data validation before loading into BigQuery, validate against a JSON Schema in your pipeline to prevent bad rows from corrupting the table.
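The `schema.json` passed to `bq load` is a JSON array of column definitions (`name`, `type`, `mode`, and nested `fields` for records). The sketch below emits one that mirrors the CREATE TABLE statement above. Generating it from a single TypeScript definition is an assumption about your workflow, not a BigQuery requirement. It uses the type names `FLOAT`, `INTEGER`, and `RECORD`, which the schema file format accepts.

```typescript
type Mode = 'REQUIRED' | 'NULLABLE' | 'REPEATED'

interface FieldSchema {
  name: string
  type: 'STRING' | 'DATE' | 'TIMESTAMP' | 'FLOAT' | 'INTEGER' | 'RECORD'
  mode: Mode
  fields?: FieldSchema[] // only for RECORD columns
}

const nullableString = (name: string): FieldSchema => ({ name, type: 'STRING', mode: 'NULLABLE' })

// Mirrors the CREATE TABLE statement above (NOT NULL columns become REQUIRED)
const eventsSchema: FieldSchema[] = [
  { name: 'event_date', type: 'DATE', mode: 'REQUIRED' },
  { name: 'event_timestamp', type: 'TIMESTAMP', mode: 'REQUIRED' },
  { name: 'event_name', type: 'STRING', mode: 'REQUIRED' },
  nullableString('user_id'),
  { name: 'anonymous_id', type: 'STRING', mode: 'REQUIRED' },
  { name: 'message_id', type: 'STRING', mode: 'REQUIRED' },
  nullableString('insert_id'),
  nullableString('session_id'),
  {
    name: 'properties',
    type: 'RECORD',
    mode: 'NULLABLE',
    fields: [
      nullableString('button'),
      nullableString('page'),
      { name: 'revenue', type: 'FLOAT', mode: 'NULLABLE' },
      nullableString('currency'),
      nullableString('order_id'),
      nullableString('product_id'),
      { name: 'quantity', type: 'INTEGER', mode: 'NULLABLE' },
    ],
  },
  {
    name: 'context',
    type: 'RECORD',
    mode: 'NULLABLE',
    fields: ['ip', 'user_agent', 'locale', 'page_url', 'page_title', 'referrer'].map((n) => nullableString(n)),
  },
]

// Redirect stdout into schema.json before running bq load
const schemaJson = JSON.stringify(eventsSchema, null, 2)
console.log(schemaJson)
```

Keeping the table DDL and `schema.json` generated from one definition avoids the two drifting apart as properties are added.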
Event Validation and Linting
Bad analytics events are far cheaper to catch at tracking time than to clean up in the warehouse. A validation layer between your application code and the analytics destination blocks malformed events before they pollute historical data.
// Validate with Ajv (ajv and ajv-formats packages)
import Ajv, { type ValidateFunction } from 'ajv'
import addFormats from 'ajv-formats'

// JSON Schema for a "Button Clicked" event
const buttonClickedSchema = {
  type: 'object',
  required: ['event', 'timestamp', 'properties'],
  // userId for authenticated users, anonymousId before login: at least one must be present
  anyOf: [{ required: ['userId'] }, { required: ['anonymousId'] }],
  additionalProperties: true,
  properties: {
    event: {
      type: 'string',
      const: 'Button Clicked',
    },
    userId: {
      type: 'string',
      minLength: 1,
    },
    anonymousId: {
      type: 'string',
    },
    timestamp: {
      type: 'string',
      format: 'date-time', // ISO 8601 / RFC 3339, provided by ajv-formats
    },
    properties: {
      type: 'object',
      required: ['button', 'page'],
      properties: {
        button: { type: 'string', minLength: 1 },
        page: { type: 'string', pattern: '^/' },
        label: { type: 'string' },
      },
      additionalProperties: true,
    },
  },
}

const ajv = new Ajv({ allErrors: true })
addFormats(ajv)

// One compiled validator per event name in the tracking plan
const validators: Record<string, ValidateFunction> = {
  'Button Clicked': ajv.compile(buttonClickedSchema),
}

// Provided globally by the Segment analytics.js snippet
declare const analytics: { track(event: string, properties?: object): void }
// Hypothetical helper: wraps properties in the full envelope (userId, anonymousId, timestamp, ...)
declare function buildEvent(eventName: string, properties: object): { event: string; properties: object }

function trackEvent(eventName: string, properties: object) {
  const event = buildEvent(eventName, properties)
  const validate = validators[eventName]
  if (!validate || !validate(event)) {
    // Log validation errors to your observability platform
    // but do NOT throw: never break the user experience for analytics
    console.error('Analytics event validation failed', {
      event: eventName,
      errors: validate ? validate.errors : 'no schema registered for this event',
    })
    return // drop invalid or unplanned events rather than forward bad data
  }
  analytics.track(event.event, event.properties)
}
Segment Protocols provides a managed validation layer: you register your event schemas in the Segment app, and events that don't match the spec are flagged as violations and can be blocked from reaching destinations. Avo.app generates type-safe tracking functions from a schema definition, so compile-time errors for missing required properties replace runtime validation checks. Run validation in CI by maintaining a fixture set of known-good and known-bad events and asserting that the validator accepts or rejects each one correctly. For the full JSON Schema patterns guide covering advanced keywords like oneOf, discriminator, and $ref, see that guide.
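The CI check can be as small as a table of fixtures and a loop. This sketch takes any `validate` function and reports every fixture whose outcome differs from what was expected. In practice that function would be an Ajv-compiled validator; here it is a hand-written stand-in so the block runs without dependencies. The `Fixture` type and `runFixtures` name are illustrative.

```typescript
interface Fixture {
  name: string
  event: unknown
  shouldPass: boolean
}

type Validator = (event: unknown) => boolean

// Returns the names of fixtures the validator misclassified
function runFixtures(validate: Validator, fixtures: Fixture[]): string[] {
  return fixtures
    .filter((fixture) => validate(fixture.event) !== fixture.shouldPass)
    .map((fixture) => fixture.name)
}

// Stand-in validator; in practice, pass ajv.compile(buttonClickedSchema)
const isButtonClicked: Validator = (event) => {
  if (typeof event !== 'object' || event === null) return false
  const e = event as Record<string, any>
  return (
    e.event === 'Button Clicked' &&
    (typeof e.userId === 'string' || typeof e.anonymousId === 'string') &&
    typeof e.timestamp === 'string' &&
    typeof e.properties?.button === 'string' &&
    typeof e.properties?.page === 'string' &&
    e.properties.page.startsWith('/')
  )
}

const fixtures: Fixture[] = [
  {
    name: 'valid click',
    shouldPass: true,
    event: {
      event: 'Button Clicked',
      userId: 'user-123',
      timestamp: '2025-05-19T14:32:00Z',
      properties: { button: 'signup', page: '/home' },
    },
  },
  {
    name: 'missing page',
    shouldPass: false,
    event: {
      event: 'Button Clicked',
      userId: 'user-123',
      timestamp: '2025-05-19T14:32:00Z',
      properties: { button: 'signup' },
    },
  },
  {
    name: 'no identifier',
    shouldPass: false,
    event: {
      event: 'Button Clicked',
      timestamp: '2025-05-19T14:32:00Z',
      properties: { button: 'signup', page: '/home' },
    },
  },
]

const failures = runFixtures(isButtonClicked, fixtures)
if (failures.length > 0) {
  // An uncaught error exits non-zero, which fails the CI job
  throw new Error(`Validator misclassified fixtures: ${failures.join(', ')}`)
}
console.log('All fixtures classified correctly')
```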
Analytics Data Pipeline JSON
A production analytics pipeline ingests events at high volume, transforms them into a consistent schema, deduplicates, and loads into the warehouse. JSONL (newline-delimited JSON) is the standard wire format for batch pipeline stages.
# JSONL format — one JSON object per line, no commas between objects
# Each line is a complete, valid JSON event
{"event":"Button Clicked","userId":"user-123","timestamp":"2025-05-19T14:32:00Z","properties":{"button":"signup","page":"/home"}}
{"event":"Page Viewed","userId":"user-124","timestamp":"2025-05-19T14:32:01Z","properties":{"page":"/pricing","title":"Pricing"}}
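{"event":"Purchase Completed","userId":"user-123","timestamp":"2025-05-19T14:32:05Z","properties":{"orderId":"ord_789","revenue":49.99}}
// Pipeline: read JSONL, transform, deduplicate, load to BigQuery
import { createReadStream } from 'fs'
import { createInterface } from 'readline'
import { randomUUID } from 'crypto'
import { BigQuery } from '@google-cloud/bigquery'

const bq = new BigQuery()
const table = bq.dataset('analytics').table('events')
const BATCH_SIZE = 500

type RawEvent = Record<string, any>

// Map camelCase tracking fields onto the snake_case STRUCT columns of the events table
function toRow(event: RawEvent, ts: Date, messageId: string) {
  const p = event.properties ?? {}
  const c = event.context ?? {}
  return {
    event_date: ts.toISOString().slice(0, 10), // UTC calendar date (partition column)
    event_timestamp: ts.toISOString(),
    event_name: event.event,
    user_id: event.userId ?? null,
    anonymous_id: event.anonymousId ?? 'unknown',
    message_id: messageId,
    properties: {
      button: p.button ?? null,
      page: p.page ?? null,
      revenue: p.revenue ?? null,
      currency: p.currency ?? null,
      order_id: p.orderId ?? null,
      product_id: p.productId ?? null,
      quantity: p.quantity ?? null,
    },
    context: {
      ip: c.ip ?? null,
      user_agent: c.userAgent ?? null,
      locale: c.locale ?? null,
      page_url: c.page?.url ?? null,
      page_title: c.page?.title ?? null,
      referrer: c.page?.referrer ?? null,
    },
  }
}

async function processJSONLFile(filePath: string) {
  const seenMessageIds = new Set<string>() // deduplicates within this file only
  const rows: ReturnType<typeof toRow>[] = []

  const flush = async () => {
    const batch = rows.splice(0)
    if (batch.length === 0) return
    // raw: true sends message_id as the streaming insertId (best-effort dedup on retries)
    await table.insert(
      batch.map((row) => ({ insertId: row.message_id, json: row })),
      { raw: true },
    )
  }

  const rl = createInterface({ input: createReadStream(filePath), crlfDelay: Infinity })
  for await (const line of rl) {
    if (!line.trim()) continue
    let event: RawEvent
    try {
      event = JSON.parse(line)
    } catch {
      console.error('Invalid JSON line, skipping:', line.slice(0, 100))
      continue
    }
    // Deduplication: skip events with a seen messageId
    const messageId = typeof event.messageId === 'string' ? event.messageId : undefined
    if (messageId) {
      if (seenMessageIds.has(messageId)) continue
      seenMessageIds.add(messageId)
    }
    // Transform: normalize the timestamp to UTC; skip events without a parseable one
    const ts = new Date(event.timestamp)
    if (Number.isNaN(ts.getTime())) {
      console.error('Missing or invalid timestamp, skipping:', line.slice(0, 100))
      continue
    }
    // Events without a messageId get a random one, so their retries cannot be deduplicated
    rows.push(toRow(event, ts, messageId ?? randomUUID()))
    if (rows.length >= BATCH_SIZE) await flush() // batch insert every 500 rows
  }
  await flush() // remaining rows
}

await processJSONLFile('./events-2025-05-19.jsonl')
For real-time pipelines, use a message queue (Kafka, Pub/Sub, Kinesis) between the tracking SDK and the warehouse. Events are published as JSON messages, a consumer validates and transforms them, and a batch loader flushes to BigQuery every minute or every 10,000 events. Dead-letter queues capture events that fail validation or transformation so they can be replayed after schema fixes. For the JSON event-driven architecture that provides the message bus layer, see that guide. For deduplication at the BigQuery layer, use a MERGE statement on the message_id column when loading from a staging table into the production events table.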
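The consumer side of that design can be sketched without committing to a specific queue client. Below, a hypothetical `MicroBatcher` flushes to a caller-supplied sink when it holds `maxSize` rows, or when `maxWaitMs` has elapsed since the first buffered row. `handleMessage` routes anything that fails parsing or minimal validation to a dead-letter array. Both stand-ins are assumptions: in a real deployment the sink would call the BigQuery client, and the dead-letter target would be a DLQ topic.

```typescript
type Sink<T> = (batch: T[]) => void | Promise<void>

// Buffers rows and flushes at maxSize rows, or maxWaitMs after the first buffered row
class MicroBatcher<T> {
  private buffer: T[] = []
  private timer: ReturnType<typeof setTimeout> | undefined
  private readonly sink: Sink<T>
  private readonly maxSize: number
  private readonly maxWaitMs: number

  constructor(sink: Sink<T>, maxSize = 10_000, maxWaitMs = 60_000) {
    this.sink = sink
    this.maxSize = maxSize
    this.maxWaitMs = maxWaitMs
  }

  add(item: T): void {
    this.buffer.push(item)
    if (this.buffer.length >= this.maxSize) {
      this.flushInBackground()
    } else if (this.timer === undefined) {
      this.timer = setTimeout(() => this.flushInBackground(), this.maxWaitMs)
    }
  }

  async flush(): Promise<void> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer)
      this.timer = undefined
    }
    const batch = this.buffer.splice(0)
    if (batch.length > 0) await this.sink(batch)
  }

  private flushInBackground(): void {
    // A production loader would retry or dead-letter the failed batch here
    this.flush().catch((err) => console.error('Batch load failed', err))
  }
}

type Row = Record<string, unknown>

interface DeadLetter {
  raw: string
  reason: string
}

// Parse and minimally validate one queue message; failures go to the DLQ for later replay
function handleMessage(raw: string, batcher: MicroBatcher<Row>, deadLetters: DeadLetter[]): void {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch (err) {
    deadLetters.push({ raw, reason: `invalid JSON: ${(err as Error).message}` })
    return
  }
  if (typeof parsed !== 'object' || parsed === null || typeof (parsed as Row).event !== 'string') {
    deadLetters.push({ raw, reason: 'missing event name' })
    return
  }
  batcher.add(parsed as Row)
}

const loaded: Row[][] = []
const deadLetters: DeadLetter[] = []
const batcher = new MicroBatcher<Row>((batch) => {
  loaded.push(batch) // stand-in for a BigQuery insert
}, 2)
handleMessage('{"event":"Button Clicked","userId":"user-123"}', batcher, deadLetters)
handleMessage('{"event":"Page Viewed","userId":"user-124"}', batcher, deadLetters) // fills the batch
handleMessage('not json', batcher, deadLetters)
console.log(loaded.length, deadLetters.length) // 1 1
```

Storing the raw message alongside a reason keeps dead letters replayable once the schema or transform is fixed.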
Definitions
- Track event: A JSON analytics event that records a specific user action; it is the core call type in Segment and similar CDPs. Every track event has at minimum an event name, a user or anonymous identifier, a timestamp, and a properties object containing event-specific dimensions and metrics.
- Anonymous ID: A stable UUID generated client-side (stored in a cookie or local storage) that identifies a device or browser session before the user authenticates. Used as the primary identifier for pre-login events; linked to a userId via an identify call after login so the full user journey can be stitched together in the warehouse.
- User ID: A stable, immutable identifier for an authenticated user, typically the primary key from your users table. Never use email addresses or usernames as the userId; they change over the user's lifetime. The userId is undefined before authentication and set via the identify call after login.
- Measurement Protocol: Google Analytics 4's server-to-server API for sending events that occur outside the browser: purchase confirmations, server-side conversion attribution, email open tracking. Events are POSTed as JSON to https://www.google-analytics.com/mp/collect with a measurement_id and api_secret in the query string. The payload must include a client_id matching the browser's GA4 cookie value for server-side events to be attributed to the correct session.
- Insert ID: A deduplication key used by Mixpanel ($insert_id) and BigQuery (streaming insert insertId). Generate it as a deterministic MD5 hash of the event data: the same event retried multiple times produces the same insert ID, and the analytics platform deduplicates within a rolling window (7 days for Mixpanel, ~1 minute for BigQuery streaming).
- Super properties: Key-value pairs registered once (via mixpanel.register() or equivalent) that are automatically merged into every subsequent event's properties: plan tier, A/B variant, cohort, industry. Super properties eliminate the need to pass the same contextual dimensions in every individual track() call.
- Event schema: A JSON Schema definition that specifies the required and optional fields, their types, formats, and allowed values for a given event type. Registered in Segment Protocols, Avo.app, or a custom schema registry; enforced at track time to block malformed events before they reach the analytics destination or data warehouse.
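The anonymous-ID lifecycle described in the definitions can be sketched as a get-or-create function. Storage is passed in through a minimal key-value interface that browser `localStorage` satisfies. The key name `analytics_anonymous_id` and the helper names are hypothetical; SDKs such as analytics.js manage their own storage keys.

```typescript
import { randomUUID } from 'crypto'

// Minimal subset of the Web Storage API; window.localStorage satisfies it
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const ANON_ID_KEY = 'analytics_anonymous_id' // hypothetical key name

// Return the persisted anonymous ID, creating and saving one on first use
function getOrCreateAnonymousId(store: KeyValueStore): string {
  const existing = store.getItem(ANON_ID_KEY)
  if (existing) return existing
  const id = randomUUID() // in a browser, crypto.randomUUID() is the equivalent
  store.setItem(ANON_ID_KEY, id)
  return id
}

// In-memory store standing in for localStorage outside the browser
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>()
  getItem(key: string): string | null {
    return this.data.get(key) ?? null
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value)
  }
}

const store = new MemoryStore()
const first = getOrCreateAnonymousId(store)
const second = getOrCreateAnonymousId(store)
console.log(first === second) // true: the same ID is reused on later page views
```

After login, the identify call sends this ID together with the userId, which is what lets the warehouse join pre-login and post-login events.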
Frequently asked questions
What JSON format should I use for analytics events?
The Segment track event JSON format is the closest thing to an industry standard: {"type":"track","event":"Button Clicked","userId":"user-123","anonymousId":"anon-456","timestamp":"2025-05-19T14:32:00Z","properties":{"button":"signup","page":"/home"},"context":{"ip":"203.0.113.0","userAgent":"Mozilla/5.0"}}. Every event needs at minimum an event name, a user or anonymous identifier, a timestamp (ISO 8601 UTC), and a properties object for event-specific data. Using this schema ensures compatibility with CDP platforms, data warehouses, and analytics pipelines without custom transformation. For JSON Schema patterns to enforce the format, see that guide.
What is the Segment track event JSON schema?
A Segment track event has six top-level fields: type ("track"), event (noun + past-tense verb, e.g. "Button Clicked"), userId (authenticated user identifier), anonymousId (pre-authentication cookie/device ID), timestamp (ISO 8601 UTC), and properties (event-specific key-value pairs). At least one of userId or anonymousId is required. Optional top-level fields include context (environment metadata: ip, userAgent, locale, page), integrations (per-destination enable/disable flags), and messageId (UUID for deduplication). The event name should follow the noun + past-tense verb (Object Action) convention and stay consistent across all calls.
How do I use the GA4 Measurement Protocol with JSON?
Send a POST request to https://www.google-analytics.com/mp/collect?measurement_id=G-XXXX&api_secret=YOUR_SECRET with Content-Type: application/json. The JSON body has two required top-level fields: client_id (a stable string identifying the browser/device, matching the GA4 cookie value) and events (an array of up to 25 event objects). Fields such as user_id, timestamp_micros, and user_properties are optional. Each event object has name (snake_case by convention, e.g. "button_click") and params (a flat key-value object with string, number, or boolean values). Apart from the reserved e-commerce items array, params should not contain nested objects. Use the Measurement Protocol Validation Server during development to catch payload errors before sending to production.
What is the Mixpanel JSON event format?
Mixpanel events use a flat properties object rather than nested top-level fields. Required fields inside properties: distinct_id (equivalent to userId), $insert_id (deduplication key, an MD5 hash of event data), time (Unix timestamp in seconds, not milliseconds), and token (your Mixpanel project token). Event-specific dimensions are merged at the same level as these system fields. On the client, super properties set via mixpanel.register() are merged into every event's properties object automatically by the SDK.
How do I store analytics JSON events in BigQuery?
Use STRUCT columns for typed properties. BigQuery stores STRUCT subfields in columnar form and reads only the subfields a query references, which is typically much faster and cheaper than parsing a JSON STRING column. Partition the table by event_date (DATE) to reduce query costs with date-range filters. Cluster by event_name first and user_id second, since most analytics queries filter by event type before user. Load events via streaming inserts for real-time or JSONL batch files via bq load for pipeline ingestion. Add a message_id column and use MERGE for deduplication when loading from a staging table.
How do I validate analytics JSON events?
Define a JSON Schema per event type and validate at tracking time using Ajv (JavaScript) or jsonschema (Python). Never throw an exception on validation failure. Instead, drop the invalid event and log the error to your observability platform so analytics problems never break the user experience. For managed validation, use Segment Protocols to register event schemas; non-conforming events are quarantined in a violations stream. Run validation in CI with a fixture set of known-good and known-bad events. For more detail, see the JSON data validation guide.
What is the best analytics event naming convention?
Use noun + past-tense verb (Object Action) in Title Case: "Button Clicked", "Page Viewed", "Form Submitted", "Purchase Completed". Avoid present tense, snake_case, camelCase, and abbreviations. Enforce the casing with a JSON Schema pattern: {"pattern":"^[A-Z][a-zA-Z]+ [A-Z][a-zA-Z]+"} rejects events that don't start with a capitalized word followed by a space and another capitalized word. Document the full event taxonomy in a tracking plan before implementation. Avo.app generates type-safe tracking functions from the schema definition, replacing runtime validation with compile-time errors.
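The sketch below applies the same regular expression to a few candidate names to show what it accepts and rejects. Without a trailing `$`, the pattern checks only the first two words. It also cannot judge tense, so "Click Button" passes and the verb form still needs review in the tracking plan. The candidate list is illustrative.

```typescript
// Same pattern as the JSON Schema above: a capitalized word, a space, another capitalized word
const EVENT_NAME_PATTERN = /^[A-Z][a-zA-Z]+ [A-Z][a-zA-Z]+/

const candidates = [
  'Button Clicked',
  'Purchase Completed',
  'button_clicked',
  'buttonClicked',
  'Click Button',
  'Button Clicked Twice',
]

for (const name of candidates) {
  console.log(`${name}: ${EVENT_NAME_PATTERN.test(name) ? 'accepted' : 'rejected'}`)
}
// Button Clicked: accepted
// Purchase Completed: accepted
// button_clicked: rejected
// buttonClicked: rejected
// Click Button: accepted
// Button Clicked Twice: accepted
```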
How do I prevent duplicate analytics JSON events?
Three defenses:
(1) Client-side: generate a messageId (UUID v4) once per event and reuse it on every retry. Record sent IDs in localStorage, and skip any event whose ID is already recorded.
(2) Platform deduplication: in Mixpanel, set $insert_id to a deterministic MD5 hash of event data (deduplicated within 7 days); in Segment, set messageId at track time (deduplicated within 24 hours).
(3) BigQuery: use a MERGE statement on message_id when loading from a staging table, or rely on BigQuery streaming insert best-effort deduplication within a 1-minute window via insertId.
For JSON audit logging where every event must be captured without any deduplication, see that guide.
Further reading and primary sources
- Segment Spec — Track — Canonical reference for the Segment track event JSON schema including all top-level fields and the context object
- GA4 Measurement Protocol — GA4 server-side event ingestion: endpoint, payload structure, event_params, and the validation server
- Mixpanel HTTP API — Mixpanel track endpoint, JSON event format, $insert_id deduplication, and batch ingestion via the import API
- BigQuery Partitioned Tables — Partition and cluster analytics event tables to reduce query costs and improve performance
- Avo.app — Tracking Plan — Generate type-safe analytics tracking functions from a schema definition — enforces event naming and required properties at compile time
Ready to validate your JSON analytics events?
Paste any JSON analytics event — Segment, GA4, or Mixpanel — into Jsonic to format, validate, and explore the structure before shipping to your analytics destination.