JSON Observability: Structured Logs, Distributed Traces & Metrics


Overview

Modern observability is built on three pillars: structured JSON logs (queryable events with standard fields), distributed traces (JSON spans linked by trace ID across services), and metrics (numeric aggregates with trace exemplars). This guide covers the full integration: designing a structured log schema for JSON APIs, propagating trace context with W3C traceparent headers, exporting OpenTelemetry OTLP JSON, correlating logs to traces in Grafana and Datadog, comparing OTLP JSON with the Prometheus text format for metrics, and implementing end-to-end observability in Node.js.

Key Patterns

  • Structured Logging: Every log line is JSON with mandatory fields (timestamp, level, service, traceId, spanId) and optional context fields
  • W3C Traceparent: HTTP header format for propagating trace context: version-traceId-parentId-flags
  • OTLP JSON Export: OpenTelemetry Protocol with a hierarchical structure per signal: resourceSpans → scopeSpans → spans (metrics and logs mirror this under resourceMetrics and resourceLogs)
  • AsyncLocalStorage: Node.js mechanism for propagating trace context across async boundaries without passing it through function parameters
  • Log-to-Trace Correlation: Inject traceId into every log line to enable jumping from a log line to its distributed trace in one click
  • Metrics Exemplars: Sampled metric values with traceId/spanId that link slow metric buckets to the exact trace that produced them

Best Practices

  1. Define mandatory log fields (timestamp ISO format, level, service name, environment, traceId, spanId, requestId) in a shared logger package
  2. Use W3C traceparent header for cross-service trace propagation — supported natively by OpenTelemetry and all major observability platforms
  3. Inject trace context via middleware using OpenTelemetry SDK, store in AsyncLocalStorage, and automatically include in all log lines via logger mixin
  4. Never pass trace context as function arguments — use AsyncLocalStorage to avoid polluting function signatures
  5. For log-to-trace linking: configure derived fields in Grafana Loki or field mappings in Datadog to extract traceId and generate trace links
  6. Export traces, metrics, and logs in OpenTelemetry OTLP JSON format for platform-agnostic observability
  7. Encode OTLP JSON nanosecond timestamps as strings, not numbers, to preserve 64-bit precision (JavaScript numbers are exact only up to Number.MAX_SAFE_INTEGER, which is 2^53 − 1)
  8. Use OTLP JSON metrics over Prometheus text format when exemplar support is needed to link metrics to traces
  9. Correlate metrics to traces using exemplars — when Grafana shows a spike, click an exemplar to jump to the distributed trace
  10. Enforce structured log schema via compile-time type checking — not documentation that teams may ignore

Structured Logging Schema

Every log line should include mandatory fields: timestamp (ISO 8601), level (info, warn, error), service (which backend service emitted this), environment (production, staging, development), and crucially traceId and spanId. Optional fields carry request-specific context (orderId, userId, endpoint) or operation-specific data (latencyMs, resultCount, cacheMiss). Use a shared logger package that enforces the schema at the type level, preventing incorrect field names at compile time.
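One way to enforce this schema at the type level is sketched below. The field names come from the paragraph above; the type names and the formatLogLine helper are hypothetical, not part of any particular logger package.

```typescript
// Mandatory fields every log line must carry (names taken from the text above).
type LogLevel = "info" | "warn" | "error";
type Environment = "production" | "staging" | "development";

interface BaseLogFields {
  timestamp: string; // ISO 8601
  level: LogLevel;
  service: string;
  environment: Environment;
  traceId: string;
  spanId: string;
}

// Optional context fields. New fields are added here, not invented at call sites.
interface LogContext {
  orderId?: string;
  userId?: string;
  endpoint?: string;
  latencyMs?: number;
  resultCount?: number;
  cacheMiss?: boolean;
}

type LogLine = BaseLogFields & LogContext & { message: string };

// Hypothetical helper: builds one JSON log line from typed inputs.
// The timestamp is generated here so callers cannot supply a non-ISO value.
function formatLogLine(
  base: Omit<BaseLogFields, "timestamp">,
  message: string,
  context: LogContext = {},
  now: Date = new Date(),
): string {
  const entry: LogLine = { timestamp: now.toISOString(), ...base, ...context, message };
  return JSON.stringify(entry);
}
```

With these types, passing an object literal containing a misspelled key such as latncyMs, or a level such as "warning", is rejected by the TypeScript compiler instead of reaching production logs.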

W3C Trace Context Propagation

The W3C Trace Context standard defines the traceparent HTTP header for propagating trace context across service boundaries. The format is version-traceId-parentId-flags, all in lowercase hex: version is 2 characters (currently 00), traceId is 32 characters (16 bytes, 128 bits), parentId is 16 characters (8 bytes) identifying the caller's span, and flags is 2 characters (8 bits) whose lowest bit marks the trace as sampled. OpenTelemetry handles traceparent serialization and parsing automatically. Every service that receives a request with a traceparent header should extract it, create a new span as a child of the parent span, and propagate the same traceId downstream with its own span ID as the new parentId.
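To make the header layout concrete, here is a minimal parsing sketch. In practice the OpenTelemetry SDK does this for you; the function names below are hypothetical, and the sketch only accepts the version 00 layout of four fields.

```typescript
interface TraceParent {
  version: string;  // 2 hex chars
  traceId: string;  // 32 hex chars (16 bytes)
  parentId: string; // 16 hex chars (8 bytes)
  flags: string;    // 2 hex chars (8 bits)
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a well-formed traceparent value.
function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (m === null) return null;
  const [, version, traceId, parentId, flags] = m;
  // Version ff and all-zero IDs are invalid according to the specification.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// The lowest bit of the flags byte is the "sampled" flag.
function isSampled(tp: TraceParent): boolean {
  return (parseInt(tp.flags, 16) & 1) === 1;
}

// Header to send downstream: same traceId and flags, with this service's
// own span ID taking the parentId position.
function childTraceparent(incoming: TraceParent, ownSpanId: string): string {
  return `00-${incoming.traceId}-${ownSpanId}-${incoming.flags}`;
}
```

For the example value from the W3C specification, 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, parseTraceparent yields traceId 4bf92f3577b34da6a3ce929d0e0e4736 and isSampled returns true.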

Injecting Trace Context into Logs

The critical step for log-to-trace correlation is injecting traceId and spanId into every log line. The challenge: trace context is created by the OpenTelemetry SDK in middleware, but log calls happen throughout the request handler. Passing context manually as function arguments pollutes every signature. The solution is AsyncLocalStorage — Node.js's built-in mechanism for propagating context across async boundaries without passing it explicitly. Create a tracing-context module that exports AsyncLocalStorage, populate it in middleware, and read it in a logger mixin to automatically inject trace context.
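The pattern can be sketched with Node.js built-ins alone. The names traceStore, runWithTrace, logLine, and loadOrder are hypothetical; in a real service the middleware would take traceId and spanId from the active OpenTelemetry span, and logLine would be a mixin on your logger rather than a standalone function.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface TraceContext {
  traceId: string;
  spanId: string;
}

// tracing-context module: one store shared by the middleware and the logger.
const traceStore = new AsyncLocalStorage<TraceContext>();

// Called by middleware. Everything executed inside fn, including work that
// resumes after an await, sees ctx through traceStore.getStore().
function runWithTrace<T>(ctx: TraceContext, fn: () => T): T {
  return traceStore.run(ctx, fn);
}

// Stand-in for a logger mixin: merges the current trace context into each line.
function logLine(level: string, message: string, fields: Record<string, unknown> = {}): string {
  const ctx = traceStore.getStore(); // undefined when called outside a request
  return JSON.stringify({ timestamp: new Date().toISOString(), level, ...ctx, ...fields, message });
}

// Handler code never receives the trace context as a parameter.
async function loadOrder(orderId: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated I/O
  return logLine("info", "order loaded", { orderId });
}
```

Calling runWithTrace({ traceId, spanId }, () => loadOrder("o-1")) produces a log line that includes both IDs even though loadOrder never saw them, and concurrent requests each see only their own context.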

Log-to-Trace Linking in Grafana

In Grafana Loki, configure a "Derived Fields" rule that extracts the traceId JSON field and generates a link to Grafana Tempo. When viewing logs in Explore, every log line shows a clickable button to jump to the distributed trace. Similarly, in Datadog, use the dd-trace library with specific field names (dd.trace_id and dd.span_id) for automatic trace correlation. In Elastic, use trace.id and span.id field names for ECS (Elastic Common Schema) compatibility.
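A derived field is essentially a regular expression applied to the raw log line, with the captured value substituted into a link (Grafana exposes it to the link template as ${__value.raw}). The sketch below mimics that extraction step so a pattern can be checked against real log output before it is configured. The pattern itself is an assumption: it expects compact JSON with a traceId key holding a 32-character hex value.

```typescript
// Assumed matcher for compact JSON log lines such as {"level":"info","traceId":"..."}.
// Pretty-printed JSON with a space after the colon would need a looser pattern.
const TRACE_ID_MATCHER = /"traceId":"([0-9a-f]{32})"/;

// Returns the captured trace ID, or null when the line carries none.
function extractTraceId(rawLogLine: string): string | null {
  const m = TRACE_ID_MATCHER.exec(rawLogLine);
  return m === null ? null : m[1];
}
```

Lines for which extractTraceId returns null get no trace link, which is a quick way to spot services that still log without trace context.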

OpenTelemetry OTLP JSON Export Format

OpenTelemetry Protocol (OTLP) defines the wire format for exporting traces, metrics, and logs. OTLP runs over two transports, gRPC and HTTP: gRPC always carries binary Protobuf, while OTLP/HTTP accepts either binary Protobuf (most efficient for production) or JSON (human-readable, ideal for debugging). The OTLP JSON format is hierarchical, with one top-level key per signal: resourceSpans contains scopeSpans, which contain spans; metrics and logs follow the same shape under resourceMetrics → scopeMetrics → metrics and resourceLogs → scopeLogs → logRecords. Each span has startTimeUnixNano and endTimeUnixNano in nanoseconds, encoded as strings (not numbers) to preserve 64-bit precision.
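The following sketch builds a minimal OTLP JSON trace payload by hand to show the nesting and the string-encoded timestamps. An OpenTelemetry exporter normally produces this for you. The interface and function names are hypothetical, the interfaces cover only a subset of the span fields, and the scope name is a placeholder.

```typescript
interface OtlpSpan {
  traceId: string;           // 32 hex chars
  spanId: string;            // 16 hex chars
  parentSpanId?: string;
  name: string;
  kind: number;              // enum as integer, e.g. 2 = SPAN_KIND_SERVER
  startTimeUnixNano: string; // string, not number
  endTimeUnixNano: string;
}

interface OtlpTracePayload {
  resourceSpans: Array<{
    resource: { attributes: Array<{ key: string; value: { stringValue: string } }> };
    scopeSpans: Array<{ scope: { name: string }; spans: OtlpSpan[] }>;
  }>;
}

// Converts positive integer epoch milliseconds to a nanosecond string by
// appending six zeros, so no intermediate Number ever exceeds 2^53 - 1.
function msToNanoString(epochMs: number): string {
  return `${Math.trunc(epochMs)}000000`;
}

// Wraps spans in the resourceSpans -> scopeSpans -> spans hierarchy.
function buildTracePayload(serviceName: string, spans: OtlpSpan[]): OtlpTracePayload {
  return {
    resourceSpans: [
      {
        resource: { attributes: [{ key: "service.name", value: { stringValue: serviceName } }] },
        scopeSpans: [{ scope: { name: "example-instrumentation" }, spans }],
      },
    ],
  };
}
```

The string encoding matters because a value such as 1700000000123456789 cannot survive a round trip through a JavaScript number: Number("1700000000123456789") is rounded to a nearby representable value, silently corrupting the low digits.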

Metrics Exemplars

An exemplar is a sampled metric data point that carries traceId and spanId alongside the metric value. When Grafana displays a histogram with a spike in the 500ms bucket, clicking an exemplar jumps directly to the distributed trace that produced that slow request, with no searching by time range or service name. Both OTLP JSON metrics and the OpenMetrics exposition format support exemplars; the classic Prometheus text format does not, which is the key difference between them.
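As an illustration, the sketch below models a histogram data point with exemplars, using the field names from OTLP JSON (a subset only), and picks out the trace IDs behind slow observations. The slowTraceIds helper and the sample values are hypothetical.

```typescript
interface Exemplar {
  timeUnixNano: string;
  asDouble: number; // the observed value, here a latency in seconds
  traceId: string;
  spanId: string;
}

interface HistogramDataPoint {
  timeUnixNano: string;
  count: string;            // 64-bit integers are strings in OTLP JSON
  sum: number;
  explicitBounds: number[]; // upper bucket bounds in seconds
  bucketCounts: string[];   // one more entry than explicitBounds
  exemplars: Exemplar[];
}

// Returns the trace IDs of exemplars slower than the given threshold.
function slowTraceIds(point: HistogramDataPoint, thresholdSeconds: number): string[] {
  return point.exemplars
    .filter((e) => e.asDouble > thresholdSeconds)
    .map((e) => e.traceId);
}

// Made-up sample: three requests, two of which were sampled as exemplars.
const examplePoint: HistogramDataPoint = {
  timeUnixNano: "1700000000123000000",
  count: "3",
  sum: 0.93,
  explicitBounds: [0.1, 0.5],
  bucketCounts: ["1", "1", "1"],
  exemplars: [
    { timeUnixNano: "1700000000100000000", asDouble: 0.08, traceId: "0af7651916cd43dd8448eb211c80319c", spanId: "b7ad6b7169203331" },
    { timeUnixNano: "1700000000120000000", asDouble: 0.62, traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7" },
  ],
};
```

Here slowTraceIds(examplePoint, 0.5) selects only the 0.62 s observation, which is the same lookup Grafana performs when you click an exemplar in a slow bucket.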

OTLP JSON vs Prometheus Text Format

The classic Prometheus text format is human-readable and lightweight but does not support exemplars (those require the OpenMetrics format) or structured metadata. OTLP JSON is more verbose but supports exemplars, custom attributes, and resource metadata. For observability pipelines where linking metrics to traces is important, choose OTLP JSON. For simple metric scraping, Prometheus text format is sufficient.

Complete Implementation Stack

A complete Node.js observability implementation combines pino for structured JSON logging, @opentelemetry/sdk-node for distributed tracing and metrics, AsyncLocalStorage for trace context propagation, and a log shipper (pino-loki or Fluent Bit) for aggregation. The result: every log line carries trace context, every trace has correlated logs, and every slow metric has an exemplar linking to the trace.

Observable vs Non-Observable APIs

An observable API emits structured logs with traceId correlation, exports OpenTelemetry traces with full span timings, and exposes metrics with exemplars. A non-observable API might log unstructured text to stdout with no trace correlation, making it impossible to answer "which database query made this slow?" during incident response. Observability is hard to retrofit, so design it in from the start.
