JSON Monitoring: Prometheus Alerts, Grafana Dashboards, and Health Checks

Q: What JSON format should a health check endpoint return?

A health check endpoint should return a JSON object with a "status" field set to "healthy", "degraded", or "unhealthy". The nested "checks" object contains individual component statuses. Example: {"status":"healthy","version":"1.2.3","uptime_seconds":86400,"checks":{"database":{"status":"up","latency_ms":5},"cache":{"status":"up","latency_ms":1}}}. Use "degraded" when non-critical checks fail but the service can still serve traffic. Return HTTP 200 for healthy/degraded and HTTP 503 for unhealthy. The /health endpoint checks liveness; /ready checks readiness (returns 503 until the service is ready to accept traffic).

Q: What JSON does Prometheus Alertmanager send to webhooks?

Prometheus Alertmanager sends a POST request to your webhook URL with a JSON body containing: {"version":"4","groupKey":"...","status":"firing","receiver":"webhook","groupLabels":{"alertname":"HighErrorRate"},"commonLabels":{"alertname":"HighErrorRate","severity":"critical"},"commonAnnotations":{"description":"Error rate above 5%"},"externalURL":"http://alertmanager:9093","alerts":[{"labels":{"alertname":"HighErrorRate","severity":"critical","job":"api-server"},"annotations":{"description":"Error rate is 8.3%","runbook_url":"https://runbooks.example.com/high-error-rate"},"startsAt":"2026-05-19T10:00:00Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus:9090/...","fingerprint":"abc123"}]}. The "startsAt" and "endsAt" fields are RFC 3339 timestamps — parse with new Date(alert.startsAt). An "endsAt" of "0001-01-01T00:00:00Z" means the alert is still firing.

Q: How do I import/export Grafana dashboards as JSON?

To export a Grafana dashboard as JSON: open the dashboard, click the Share icon (or the dashboard settings gear), select Export, and click Save to file. The exported JSON contains the full dashboard definition including panels, datasource references, templating variables, and time range. To import: click the + (Create) button in the sidebar, select Import, and either paste the JSON or upload the file. The "uid" field in the JSON must be unique across your Grafana instance; set it manually for reproducible imports (e.g. in GitOps workflows). Remove the "id" field before importing to a new instance to avoid conflicts. To automate imports, use the Grafana HTTP API: POST /api/dashboards/import with {"dashboard": <json>, "overwrite": true, "folderId": 0}.

Q: What is the PagerDuty Events API v2 JSON format?

The PagerDuty Events API v2 JSON format requires: {"routing_key":"<integration-key>","event_action":"trigger","dedup_key":"optional-unique-key","payload":{"summary":"High error rate on api-server","severity":"critical","source":"api-server-prod","timestamp":"2026-05-19T10:00:00Z","component":"api","group":"production","class":"error_rate","custom_details":{"error_rate":"8.3%","threshold":"5%"}}}. The "severity" field must be one of "critical", "error", "warning", or "info" (lowercase only). The "dedup_key" prevents duplicate incidents; use a consistent key (e.g. alert fingerprint) to deduplicate. Send "event_action":"resolve" with the same "dedup_key" to auto-resolve the incident. POST to https://events.pagerduty.com/v2/enqueue.

Q: How do I submit metrics to Datadog using JSON?

Submit metrics to the Datadog API using POST https://api.datadoghq.com/api/v1/series with Content-Type: application/json and a DD-API-KEY header. The JSON body uses a "series" array: {"series":[{"metric":"app.error.rate","type":"gauge","points":[[1716124800,8.3]],"tags":["env:production","service:api-server"],"host":"api-server-01"},{"metric":"app.request.count","type":"count","points":[[1716124800,1523]],"tags":["env:production"]}]}. Each point is a two-element array [unix_timestamp_seconds, value]. The "type" field is "gauge" (instantaneous value), "count" (delta count), or "rate" (per-second rate). The Datadog Agent also accepts metrics locally through DogStatsD on UDP port 8125, but that interface uses a plain-text StatsD-style protocol rather than JSON; for JSON, post to the /api/v1/series endpoint directly.

Q: How do I implement a health check endpoint in Node.js?

Implement a health check endpoint in Node.js (Express): app.get("/health", async (req, res) => { const dbLatency = await checkDatabase(); const cacheLatency = await checkCache(); const status = (dbLatency === null) ? "unhealthy" : (cacheLatency === null) ? "degraded" : "healthy"; const code = status === "unhealthy" ? 503 : 200; res.status(code).json({ status, version: process.env.npm_package_version, uptime_seconds: Math.floor(process.uptime()), checks: { database: { status: dbLatency !== null ? "up" : "down", latency_ms: dbLatency }, cache: { status: cacheLatency !== null ? "up" : "down", latency_ms: cacheLatency } } }); }). Add a /ready endpoint that returns 503 until database migrations and cache warmup complete. Configure Kubernetes liveness probes to hit /health and readiness probes to hit /ready.

Q: What JSON fields are required for a Prometheus webhook receiver?

A Prometheus Alertmanager webhook receiver requires your endpoint to accept POST requests with Content-Type: application/json. The required top-level fields in the incoming JSON are: "version" (always "4"), "groupKey" (string identifying the alert group), "status" ("firing" or "resolved"), "receiver" (the receiver name from your Alertmanager config), "groupLabels" (labels used for grouping), "commonLabels" (labels shared by all alerts), "commonAnnotations" (annotations shared by all alerts), "externalURL" (Alertmanager URL), and "alerts" (array of alert objects). Each alert object contains "labels", "annotations", "startsAt" (RFC 3339), "endsAt" (RFC 3339, "0001-01-01T00:00:00Z" if still firing), "generatorURL", and "fingerprint". Your webhook must return HTTP 200 to acknowledge receipt.

Q: How do I query logs with JSON in AWS CloudWatch?

CloudWatch Logs Insights supports JSON log querying with dot notation. To query JSON fields: fields @timestamp, level, message, requestId | filter level = "ERROR" | sort @timestamp desc | limit 100. For nested JSON fields: filter http.statusCode >= 500 | stats count() by http.method. To create a CloudWatch Metric Filter for JSON logs, use a filter pattern like { $.level = "ERROR" } — the $.fieldName syntax accesses JSON log fields. CloudWatch Alarms trigger on metric filter data; the alarm JSON config (from CloudFormation or the CLI) includes: {"AlarmName":"HighErrorRate","MetricName":"ErrorCount","Namespace":"LogMetrics","Period":300,"EvaluationPeriods":2,"Threshold":10,"ComparisonOperator":"GreaterThanThreshold","TreatMissingData":"notBreaching"}.

Last updated: May 19, 2026

JSON is the common configuration and data format across modern monitoring stacks: Prometheus serves alert and rule state as JSON through its HTTP API, Grafana dashboards are JSON documents, and health check endpoints return JSON status objects. A standard health check JSON response looks like: {"status":"healthy","version":"1.2.3","checks":{"database":{"status":"up","latency_ms":5},"cache":{"status":"up"}}}. Use "degraded", not "healthy", when non-critical checks fail. A Prometheus Alertmanager webhook receiver is sent JSON such as: {"alerts":[{"labels":{"alertname":"HighErrorRate","severity":"critical"},"annotations":{"description":"..."}}]}. This guide covers health check JSON schema, Prometheus alert webhook JSON, Grafana dashboard JSON import/export, PagerDuty Events API v2 JSON, Datadog JSON metric submission, structured alerting with JSON runbooks, and JSON log alerting with CloudWatch. Use Jsonic's JSON formatter to validate and pretty-print monitoring payloads. For related observability patterns, see OpenTelemetry JSON and JSON structured logging.

Health Check JSON Schema

A health check endpoint is the simplest monitoring primitive: a GET request that returns a JSON object describing service status. The convention is /health for liveness (is the process running?) and /ready for readiness (can the service serve traffic?). The response JSON should include a top-level status field with one of three values: "healthy", "degraded", or "unhealthy". Use "degraded" when non-critical dependencies (e.g. a cache) are down but the service can still function; this lets load balancers keep the instance in rotation while triggering a lower-severity alert. Include version, uptime_seconds, and a nested checks object with per-component status and latency.

// Standard health check JSON response — GET /health
{
  "status": "degraded",         // "healthy" | "degraded" | "unhealthy"; degraded here because the queue check below is degraded
  "version": "1.2.3",
  "uptime_seconds": 86400,
  "timestamp": "2026-05-19T10:00:00Z",
  "checks": {
    "database": {
      "status": "up",           // "up" | "down" | "degraded"
      "latency_ms": 5,
      "message": null
    },
    "cache": {
      "status": "up",
      "latency_ms": 1,
      "message": null
    },
    "queue": {
      "status": "degraded",
      "latency_ms": 450,
      "message": "High consumer lag: 12000 messages"
    }
  }
}

// Unhealthy — primary database unreachable, return HTTP 503
{
  "status": "unhealthy",
  "version": "1.2.3",
  "uptime_seconds": 3600,
  "timestamp": "2026-05-19T10:05:00Z",
  "checks": {
    "database": {
      "status": "down",
      "latency_ms": null,
      "message": "Connection refused: ECONNREFUSED 5432"
    },
    "cache": {
      "status": "up",
      "latency_ms": 1,
      "message": null
    }
  }
}

Return HTTP 200 for both "healthy" and "degraded": some load balancers treat any non-2xx response as a failed health check and remove the instance from rotation, which is too aggressive for a degraded state. Return HTTP 503 only for "unhealthy". Kubernetes liveness probes should hit /health; readiness probes should hit /ready, which returns 503 until startup tasks (database migrations, cache warmup) complete. Include latency_ms on each check, since spikes in this value reveal slow dependencies before they cause failures. For error handling patterns in JSON APIs, see JSON error handling.
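The status rules above can be sketched as a small pure function. This is only an illustration: the name summarizeHealth and the criticalChecks parameter are assumptions made for this example, and which components count as critical depends on your service.

```javascript
// Derive the overall status and HTTP code from per-component check results.
// `checks` maps a component name to { status: "up" | "down" | "degraded", ... }.
// `criticalChecks` (hypothetical parameter) lists components the service cannot run without.
function summarizeHealth(checks, criticalChecks = ['database']) {
  const entries = Object.entries(checks)
  const criticalDown = entries.some(
    ([name, check]) => criticalChecks.includes(name) && check.status === 'down'
  )
  const anyImpaired = entries.some(([, check]) => check.status !== 'up')

  let status = 'healthy'
  if (criticalDown) status = 'unhealthy'
  else if (anyImpaired) status = 'degraded'

  // 200 for healthy and degraded, 503 only for unhealthy
  return { status, httpCode: status === 'unhealthy' ? 503 : 200 }
}
```

An Express handler could call summarizeHealth(checks) and pass the result to res.status(httpCode).json(...), which keeps the decision logic testable without a running server.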

Prometheus Alertmanager Webhook JSON

Prometheus Alertmanager sends alert notifications to webhook receivers as HTTP POST requests with a JSON body. The payload contains an alerts array — each alert includes labels (identifying metadata), annotations (human-readable context), and RFC 3339 timestamps. Understanding this format is essential for building custom notification handlers, Slack bots, auto-remediation systems, or incident management integrations.

// Alertmanager webhook JSON — version 4 payload
{
  "version": "4",
  "groupKey": "{}:{alertname=\"HighErrorRate\"}",
  "status": "firing",           // "firing" | "resolved"
  "receiver": "webhook-handler",
  "groupLabels": {
    "alertname": "HighErrorRate"
  },
  "commonLabels": {
    "alertname": "HighErrorRate",
    "env":       "production",
    "severity":  "critical"
  },
  "commonAnnotations": {
    "description": "Error rate above threshold for 5 minutes",
    "runbook_url": "https://runbooks.example.com/high-error-rate"
  },
  "externalURL": "http://alertmanager.internal:9093",
  "alerts": [
    {
      "status": "firing",       // per-alert status: "firing" | "resolved"
      "labels": {
        "alertname": "HighErrorRate",
        "env":       "production",
        "job":       "api-server",
        "severity":  "critical"
      },
      "annotations": {
        "description":  "Error rate is 8.3% (threshold: 5%)",
        "runbook_url":  "https://runbooks.example.com/high-error-rate",
        "dashboard_url": "https://grafana.example.com/d/abc123"
      },
      "startsAt":    "2026-05-19T10:00:00Z",
      "endsAt":      "0001-01-01T00:00:00Z",   // zero time = still firing
      "generatorURL": "http://prometheus:9090/graph?...",
      "fingerprint": "abc123def456"
    }
  ]
}

// Node.js webhook receiver — Express handler
// triggerPagerDuty and sendSlackAlert are placeholders for your own notification helpers.
import express from 'express'

const app = express()
app.use(express.json())

app.post('/webhook/alerts', (req, res) => {
  const { status, alerts = [] } = req.body

  for (const alert of alerts) {
    // Each alert carries its own status; the group-level status stays "firing"
    // while any alert in the group is still firing, so prefer the per-alert value.
    const isFiring  = (alert.status ?? status) === 'firing'
    const severity  = alert.labels.severity        // e.g. "critical" | "warning" | "info"
    const startTime = new Date(alert.startsAt)     // RFC 3339 → Date
    const endTime   = new Date(alert.endsAt)
    // Zero time (year 1) = still firing; read the UTC year so the local time zone cannot shift it
    const stillFiring = endTime.getUTCFullYear() === 1

    console.log(JSON.stringify({
      level:    isFiring ? 'alert' : 'resolved',
      name:     alert.labels.alertname,
      severity,
      job:      alert.labels.job,
      description: alert.annotations.description,
      runbook:  alert.annotations.runbook_url,
      startedAt: startTime.toISOString(),
      endedAt:  stillFiring ? null : endTime.toISOString(),
      fingerprint: alert.fingerprint,
    }))

    // Route to appropriate handler by severity
    if (isFiring && severity === 'critical') {
      triggerPagerDuty(alert)
    } else if (isFiring && severity === 'warning') {
      sendSlackAlert(alert)
    }
  }

  res.status(200).json({ status: 'ok' })
})

app.listen(8080)  // matches the port used in the alertmanager.yml receiver URL

Configure the receiver in alertmanager.yml: receivers: [{name: "webhook-handler", webhook_configs: [{url: "http://handler:8080/webhook/alerts", send_resolved: true}]}]. Set send_resolved: true to receive a second notification when the alert resolves — the status field changes to "resolved" and endsAt is set to the resolution time. For structured log output from your webhook handler, see JSON structured logging.
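Before routing alerts, a handler may want to reject bodies that do not look like a version 4 payload. The following is a minimal sketch: the function name validateWebhookPayload and the exact set of checks are assumptions for illustration, not an official schema.

```javascript
// Fields this sketch expects in every Alertmanager webhook payload.
const REQUIRED_TOP_LEVEL = [
  'version', 'groupKey', 'status', 'receiver', 'groupLabels',
  'commonLabels', 'commonAnnotations', 'externalURL', 'alerts',
]
const REQUIRED_ALERT = ['labels', 'annotations', 'startsAt', 'endsAt']

// Returns a list of human-readable problems; an empty list means the shape looks valid.
function validateWebhookPayload(body) {
  if (body === null || typeof body !== 'object') return ['payload is not a JSON object']
  const errors = []
  for (const field of REQUIRED_TOP_LEVEL) {
    if (!(field in body)) errors.push(`missing top-level field: ${field}`)
  }
  if ('status' in body && !['firing', 'resolved'].includes(body.status)) {
    errors.push(`unexpected status: ${body.status}`)
  }
  if (Array.isArray(body.alerts)) {
    body.alerts.forEach((alert, i) => {
      const isObject = alert !== null && typeof alert === 'object'
      for (const field of REQUIRED_ALERT) {
        if (!isObject || !(field in alert)) errors.push(`alerts[${i}] missing field: ${field}`)
      }
    })
  } else if ('alerts' in body) {
    errors.push('alerts is not an array')
  }
  return errors
}
```

A handler could respond with HTTP 400 and the returned list when it is non-empty, so that misconfigured senders are easy to spot in logs.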

Grafana Dashboard JSON Structure

Grafana stores every dashboard as a JSON document. Exporting and importing dashboards as JSON enables version control, GitOps workflows, and reproducible deployments across environments. The dashboard JSON has a fixed top-level structure: metadata fields, a panels array, templating (variables), time range defaults, and a unique uid that identifies the dashboard across Grafana instances.

// Grafana dashboard JSON — simplified structure
{
  "uid":         "api-server-overview",   // Must be unique; set manually for GitOps
  "title":       "API Server Overview",
  "description": "HTTP request rates, error rates, and latency",
  "tags":        ["api", "production"],
  "timezone":    "browser",
  "schemaVersion": 38,
  "version":     1,
  "refresh":     "30s",

  // Default time range shown when dashboard opens
  "time": { "from": "now-1h", "to": "now" },

  // Template variables — appear as dropdowns at the top of the dashboard
  "templating": {
    "list": [
      {
        "name":       "env",
        "type":       "custom",
        "label":      "Environment",
        "query":      "production,staging,development",
        "current":    { "value": "production", "text": "production" },
        "options":    [
          { "value": "production", "text": "production", "selected": true },
          { "value": "staging",    "text": "staging",    "selected": false }
        ]
      }
    ]
  },

  // Panels array — each panel is a chart, stat, table, or text block
  "panels": [
    {
      "id":    1,
      "type":  "timeseries",
      "title": "HTTP Request Rate",
      "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
      "datasource": { "type": "prometheus", "uid": "prometheus-ds" },
      "targets": [
        {
          "expr": "rate(http_requests_total{env=\"$env\"}[5m])",
          "legendFormat": "{{job}} {{method}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "red",   "value": 1000 }
            ]
          }
        }
      }
    }
  ]
}

# Import dashboard via Grafana HTTP API
curl -X POST https://grafana.example.com/api/dashboards/import \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAFANA_API_KEY" \
  -d '{
    "dashboard": '"$(cat dashboard.json)"',
    "overwrite": true,
    "folderId": 0
  }'

# Export dashboard by UID
curl -H "Authorization: Bearer $GRAFANA_API_KEY" \
  https://grafana.example.com/api/dashboards/uid/api-server-overview

The "uid" field is critical for reproducible GitOps workflows — without a stable UID, every import creates a new dashboard instead of updating the existing one. Remove the "id" field (the database auto-increment integer) before importing to a new Grafana instance to avoid primary key conflicts. Dashboard JSON can reference datasources by uid — provision datasource UIDs in Grafana to ensure consistency across environments. For patterns on managing configuration as JSON, see JSON audit logging.
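The two preparation steps above (drop "id", pin "uid") are easy to script before a dashboard file is sent to the import call. A minimal sketch follows; the helper name prepareDashboardImport is an assumption for this example, and the body shape simply mirrors the curl request shown earlier.

```javascript
// Build a request body for the dashboard import call described above.
// The input object is not mutated; `uid` is the stable identifier you choose.
function prepareDashboardImport(dashboard, uid, folderId = 0) {
  // Drop the instance-specific database id and keep everything else.
  const { id, ...withoutId } = dashboard
  return {
    dashboard: { ...withoutId, uid },
    overwrite: true,
    folderId,
  }
}
```

When the source is a dashboard fetched by UID from the HTTP API, note that the response wraps the definition in a "dashboard" property next to "meta", so pass response.dashboard rather than the whole response.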

PagerDuty Events API v2 JSON

PagerDuty Events API v2 accepts JSON alert events and creates, acknowledges, or resolves incidents. Every event requires a routing_key (the integration key for the target service) and an event_action; trigger events also require a payload object, while acknowledge and resolve events require the dedup_key of the incident they target. The dedup_key prevents duplicate incidents: PagerDuty groups events with the same dedup_key into a single incident.

// PagerDuty Events API v2 — trigger an incident
// POST https://events.pagerduty.com/v2/enqueue
{
  "routing_key":  "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",  // 32-char service integration key
  "event_action": "trigger",   // "trigger" | "acknowledge" | "resolve"
  "dedup_key":    "api-server-high-error-rate-prod",     // prevents duplicate incidents

  "payload": {
    "summary":   "High error rate on api-server (production)",
    "severity":  "critical",    // "critical" | "error" | "warning" | "info" — lowercase only
    "source":    "api-server-prod-01",
    "timestamp": "2026-05-19T10:00:00Z",
    "component": "api-server",
    "group":     "production",
    "class":     "error_rate",

    // Arbitrary structured data — appears in the incident details
    "custom_details": {
      "error_rate":   "8.3%",
      "threshold":    "5%",
      "duration_min": 5,
      "runbook_url":  "https://runbooks.example.com/high-error-rate",
      "grafana_url":  "https://grafana.example.com/d/api-server-overview"
    }
  },

  // Optional: attach images or links to the incident
  "images": [
    {
      "src":  "https://grafana.example.com/render/d/api-server-overview?...",
      "href": "https://grafana.example.com/d/api-server-overview",
      "alt":  "Error rate graph"
    }
  ],
  "links": [
    {
      "href": "https://runbooks.example.com/high-error-rate",
      "text": "Runbook: High Error Rate"
    }
  ]
}

// Resolve the incident when alert clears
// POST https://events.pagerduty.com/v2/enqueue
{
  "routing_key":  "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
  "event_action": "resolve",
  "dedup_key":    "api-server-high-error-rate-prod",
  "payload": {
    "summary":  "High error rate resolved on api-server (production)",
    "severity": "info",
    "source":   "api-server-prod-01"
  }
}

// Node.js helper to send PagerDuty event
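async function sendPagerDutyEvent(event) {
  // Uses the global fetch available in Node.js 18+
  const response = await fetch('https://events.pagerduty.com/v2/enqueue', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  })
  if (!response.ok) {
    // Surface rejected events (e.g. invalid routing key, rate limiting) instead of failing silently
    throw new Error(`PagerDuty rejected event: HTTP ${response.status} ${await response.text()}`)
  }
  // Accepted events return a body like:
  // { "status": "success", "message": "Event processed", "dedup_key": "..." }
  return response.json()
}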

The dedup_key is the key to idempotent alerting: use a stable identifier derived from the alert fingerprint (e.g. Alertmanager's fingerprint field) so that repeated firings of the same alert do not create multiple incidents. The custom_details object is free-form JSON and appears in the incident timeline; include runbook URLs, metric values, and relevant context to reduce time-to-resolution. For event-driven patterns, see OpenTelemetry JSON.
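To make the fingerprint-to-dedup_key idea concrete, here is a minimal sketch that maps one Alertmanager alert to an Events API v2 body. The function name toPagerDutyEvent, the choice of label and annotation fields, and the fallback to "error" for unrecognized severities are all assumptions made for this example.

```javascript
const PAGERDUTY_SEVERITIES = ['critical', 'error', 'warning', 'info']

// Map one Alertmanager alert object to a PagerDuty Events API v2 event.
function toPagerDutyEvent(alert, routingKey, resolved = false) {
  const label = String(alert.labels?.severity ?? '').toLowerCase()
  // Unknown severity labels fall back to "error" (an arbitrary choice for this sketch).
  const severity = PAGERDUTY_SEVERITIES.includes(label) ? label : 'error'
  return {
    routing_key: routingKey,
    event_action: resolved ? 'resolve' : 'trigger',
    // Same fingerprint means same incident, so repeats and the later resolve line up.
    dedup_key: alert.fingerprint,
    payload: {
      summary: alert.annotations?.description ?? alert.labels?.alertname ?? 'Alert',
      severity,
      source: alert.labels?.job ?? 'unknown',
      timestamp: alert.startsAt,
      custom_details: { ...alert.labels, ...alert.annotations },
    },
  }
}
```

Because the trigger and resolve events are built from the same alert, both carry the same dedup_key, which is what lets PagerDuty close the incident it opened.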

Datadog JSON Metric Submission

The Datadog metrics API accepts JSON payloads via POST to submit custom metrics from any application or script. The payload uses a series array where each element defines a metric name, type, data points (timestamp + value pairs), tags, and host. This is the direct alternative to the DogStatsD UDP protocol — useful for batch metric submission, serverless environments, or languages without a DogStatsD client library.

// Datadog metrics API — POST https://api.datadoghq.com/api/v1/series
// Header: DD-API-KEY: <your-api-key>
// Header: Content-Type: application/json
{
  "series": [

    // GAUGE — point-in-time value (e.g. queue depth, active connections)
    {
      "metric": "app.queue.depth",
      "type":   "gauge",
      "points": [
        [1716124800, 243],    // [unix_timestamp_seconds, value]
        [1716124860, 198]     // submit multiple points per series call
      ],
      "tags":   ["env:production", "service:worker", "queue:orders"],
      "host":   "worker-01.prod.example.com",
      "unit":   "message"
    },

    // COUNT — delta count since last submission (e.g. requests handled)
    {
      "metric": "app.requests.count",
      "type":   "count",
      "points": [[1716124800, 1523]],
      "tags":   ["env:production", "service:api-server", "method:POST", "status:200"]
    },

    // RATE — per-second rate (Datadog computes rate from count / interval)
    {
      "metric": "app.error.rate",
      "type":   "rate",
      "points": [[1716124800, 8.3]],
      "tags":   ["env:production", "service:api-server"],
      "interval": 60    // required for rate type: submission interval in seconds
    }

  ]
}

// Node.js — submit metrics to Datadog
async function submitMetrics(metrics) {
  const now = Math.floor(Date.now() / 1000)  // Unix timestamp in seconds
  // tags defaults to [] so callers may omit it
  const series = metrics.map(({ metric, type, value, tags = [] }) => ({
    metric,
    type,
    points: [[now, value]],
    tags: ['env:production', ...tags],
    host: process.env.HOSTNAME,
  }))

  const response = await fetch('https://api.datadoghq.com/api/v1/series', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'DD-API-KEY': process.env.DD_API_KEY,
    },
    body: JSON.stringify({ series }),
  })
  if (!response.ok) {
    throw new Error(`Datadog rejected metrics: HTTP ${response.status}`)
  }
  // Success: HTTP 202 Accepted
  // {"status":"ok"}
  return response.json()
}

// Usage
submitMetrics([
  { metric: 'app.queue.depth',    type: 'gauge', value: 243,  tags: ['queue:orders'] },
  { metric: 'app.requests.count', type: 'count', value: 1523, tags: ['method:POST'] },
])

Each data point is a two-element array [timestamp, value] where timestamp is Unix epoch in seconds (not milliseconds; divide Date.now() by 1000). Tags follow the key:value format and enable filtering and grouping in Datadog dashboards and monitors. Use "gauge" for values that can go up or down, "count" for deltas (Datadog accumulates counts server-side), and "rate" when you want Datadog to normalize a count by the submission interval. For structured event data alongside metrics, see JSON structured logging.
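When samples are buffered with JavaScript millisecond timestamps, the seconds conversion has to happen once per point. A minimal sketch, assuming a hypothetical buildSeries helper and a sample shape of { timeMs, value } chosen for this example:

```javascript
// Turn buffered samples into one series object for the "series" array.
// `samples` is a list of { timeMs, value }, where timeMs comes from Date.now().
function buildSeries(metric, type, samples, tags = []) {
  return {
    metric,
    type,
    // Datadog expects whole Unix seconds, so truncate the millisecond part.
    points: samples.map(({ timeMs, value }) => [Math.floor(timeMs / 1000), value]),
    tags,
  }
}
```

Several buffered readings can then be sent as one series entry with multiple points, rather than one request per reading.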

Structured Alerting with JSON Runbooks

A JSON runbook links an alert to machine-readable remediation steps. Instead of a plain URL in the alert annotation, a JSON runbook describes diagnostic commands, escalation policies, and auto-remediation triggers in a structured format that both humans and automation can consume. This enables on-call engineers to access context-specific guidance and lets auto-remediation systems execute predefined actions when specific alert conditions fire.

// JSON runbook — stored as a file or served from an API endpoint
// Referenced in a Prometheus alert rule annotation: runbook_url: https://runbooks.example.com/high-error-rate.json
{
  "id":          "high-error-rate",
  "title":       "High Error Rate on API Server",
  "version":     "2.0",
  "updated_at":  "2026-05-19T00:00:00Z",
  "severity":    "critical",
  "team":        "platform",

  "triage": {
    "description": "Error rate exceeds 5% for 5+ minutes on the API server",
    "likely_causes": [
      "Upstream database latency spike",
      "Bad deployment — rollback candidate",
      "Downstream dependency timeout"
    ],
    "diagnostic_commands": [
      {
        "step":        1,
        "description": "Check recent error logs",
        "command":     "kubectl logs -n production -l app=api-server --since=10m | grep ERROR | tail -50",
        "expected":    "Repeated error pattern (e.g. ECONNREFUSED, timeout)"
      },
      {
        "step":        2,
        "description": "Check database latency",
        "command":     "kubectl exec -n production deploy/api-server -- curl -s http://localhost:8080/health | jq .checks.database",
        "expected":    "latency_ms < 50; if null, database is unreachable"
      }
    ]
  },

  "remediation": {
    "auto_remediation": {
      "enabled":     true,
      "conditions":  ["error_rate > 20%", "duration_min > 2"],
      "action":      "rollback_deployment",
      "parameters":  { "namespace": "production", "deployment": "api-server", "revisions": 1 }
    },
    "manual_steps": [
      "If database unreachable: page database on-call via PagerDuty escalation policy db-team",
      "If bad deployment: kubectl rollout undo deployment/api-server -n production"
    ]
  },

  "escalation": {
    "after_minutes": 15,
    "policy":        "platform-oncall",
    "notify":        ["#incidents-prod"]
  }
}

// Auto-remediation webhook handler — Node.js
app.post('/webhook/alerts', async (req, res) => {
  // triggerRemediation and auditLog are placeholders for your own helpers.
  for (const alert of req.body.alerts ?? []) {
    const runbookUrl = alert.annotations.runbook_url
    if (!runbookUrl) continue

    // Fetch the structured JSON runbook
    const runbook = await fetch(runbookUrl).then(r => r.json())
    const auto = runbook.remediation?.auto_remediation

    if (auto?.enabled && req.body.status === 'firing') {
      // Assumes the alert rule publishes the numeric rate in an "error_rate"
      // annotation (e.g. "8.3%"); parseFloat ignores the trailing "%".
      const errorRate = parseFloat(alert.annotations.error_rate)
      // The hard-coded 20 mirrors the runbook's "error_rate > 20%" condition.
      // A missing annotation yields NaN, which never passes this comparison.
      if (errorRate > 20) {
        await triggerRemediation(auto.action, auto.parameters)
        await auditLog({ action: auto.action, alert: alert.labels.alertname, errorRate })
      }
    }
  }
  res.status(200).json({ status: 'ok' })
})

Store JSON runbooks in a Git repository and reference them by URL in Prometheus alert rule annotations. Version the runbook JSON with a "version" field and track changes via Git history — this provides an audit trail of remediation procedure changes correlated with incident history. The auto_remediation.conditions array can be evaluated by the webhook handler to make automatic rollback decisions safe and auditable. For audit trail patterns, see JSON audit logging.
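Evaluating the conditions array instead of hard-coding a threshold could look like the sketch below. The condition grammar (metric name, comparison operator, number, optional percent sign) is inferred from the two example strings in the runbook above; parseCondition and conditionsMet are hypothetical helper names.

```javascript
// Parse a condition such as "error_rate > 20%" into { metric, op, threshold }.
function parseCondition(text) {
  const match = /^\s*(\w+)\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*%?\s*$/.exec(text)
  if (!match) throw new Error(`unsupported condition: ${text}`)
  return { metric: match[1], op: match[2], threshold: Number(match[3]) }
}

const OPERATORS = {
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
}

// True only when every condition holds; a missing or non-numeric metric counts as not met.
function conditionsMet(conditions, values) {
  return conditions.every((text) => {
    const { metric, op, threshold } = parseCondition(text)
    const actual = values[metric]
    return typeof actual === 'number' && OPERATORS[op](actual, threshold)
  })
}
```

Treating a missing metric as "not met" keeps the default safe: the handler only rolls back when it has positive evidence for every listed condition.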

JSON Log Alerting with CloudWatch

AWS CloudWatch Logs natively parses JSON log entries, enabling metric filters and Logs Insights queries that reference JSON fields directly. When your application emits structured JSON logs, CloudWatch can extract numeric fields into custom metrics and trigger alarms without any log parsing Lambda or agent-side processing. CloudWatch Logs Insights supports dot-notation queries on nested JSON fields.

// Structured JSON log line — emitted by your application
{
  "level":      "ERROR",
  "message":    "Database query timeout",
  "requestId":  "req-abc123",
  "timestamp":  "2026-05-19T10:00:00Z",
  "http": {
    "method":     "POST",
    "path":       "/api/orders",
    "statusCode": 500,
    "duration_ms": 5001
  },
  "error": {
    "type":    "QueryTimeout",
    "message": "Query exceeded 5000ms timeout",
    "query":   "SELECT * FROM orders WHERE..."
  }
}

# CloudWatch Logs Insights query — JSON field access with dot notation
fields @timestamp, level, message, requestId, http.statusCode, http.duration_ms
| filter level = "ERROR"
| filter http.statusCode >= 500
| sort @timestamp desc
| limit 100

# Count errors by HTTP path (last 1 hour)
fields http.path, http.statusCode
| filter level = "ERROR"
| stats count() as error_count by http.path
| sort error_count desc

# P99 latency by path
fields http.path, http.duration_ms
| filter ispresent(http.duration_ms)
| stats pct(http.duration_ms, 99) as p99_ms by http.path
| sort p99_ms desc

# CloudWatch Metric Filter: extracts error count from JSON logs
# AWS CLI: create metric filter for JSON field $.level = "ERROR"
aws logs put-metric-filter \
  --log-group-name "/production/api-server" \
  --filter-name "ErrorCount" \
  --filter-pattern '{ $.level = "ERROR" }' \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=ApiServer,metricValue=1,defaultValue=0

// CloudFormation alarm JSON (equivalent structure for Terraform/CDK)
{
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmName":          "ApiServer-HighErrorRate",
    "AlarmDescription":   "API server error rate exceeds threshold",
    "MetricName":         "ErrorCount",
    "Namespace":          "ApiServer",
    "Statistic":          "Sum",
    "Period":             300,
    "EvaluationPeriods":  2,
    "Threshold":          10,
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData":   "notBreaching",
    "AlarmActions":       ["arn:aws:sns:us-east-1:123456789:alerts-topic"],
    "OKActions":          ["arn:aws:sns:us-east-1:123456789:alerts-topic"]
  }
}

CloudWatch metric filter patterns use the $.fieldName syntax to access top-level JSON fields and $.nested.field for nested fields. Numeric comparisons work with { $.http.statusCode >= 500 }. The metric filter creates a CloudWatch metric from matching log lines — combine it with a CloudWatch Alarm to trigger SNS notifications, which can fan out to email, Slack (via Lambda), or PagerDuty. For structured logging patterns that work well with CloudWatch, see JSON structured logging.
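These filter patterns only match when each log event is a single JSON object with the fields at the expected paths. As an illustration, a tiny formatter that emits such lines might look like this; formatLogLine is a hypothetical helper, and the field names follow the sample log line shown earlier.

```javascript
// Produce one single-line JSON log entry with the fields the filters above rely on.
// `fields` holds extra structured data such as { requestId, http: { statusCode } }.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  // Core keys are written last so extra fields cannot overwrite them.
  return JSON.stringify({
    ...fields,
    level,
    message,
    timestamp: now.toISOString(),
  })
}

// Usage: write the line to stdout, where the log agent or runtime forwards it to CloudWatch.
console.log(formatLogLine('ERROR', 'Database query timeout', {
  requestId: 'req-abc123',
  http: { method: 'POST', path: '/api/orders', statusCode: 500 },
}))
```

JSON.stringify without an indent argument never emits raw newlines, so each entry stays on one line.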

Definitions

Health check: An HTTP endpoint (/health or /ready) that returns a JSON object describing service status. Used by load balancers, orchestrators (Kubernetes), and uptime monitors to determine whether a service instance should receive traffic. Returns HTTP 200 for healthy/degraded states and HTTP 503 for unhealthy.
Alert webhook: An HTTP POST endpoint that receives JSON alert notifications from a monitoring system such as Prometheus Alertmanager. The webhook handler processes the alert JSON and routes notifications to PagerDuty, Slack, email, or custom automation. Alertmanager sends webhook version 4 JSON with an alerts array.
Dashboard JSON: The complete JSON representation of a Grafana dashboard, including panels, queries, datasource references, templating variables, time range, and visual configuration. Grafana dashboards can be exported as JSON for version control and imported via the UI or HTTP API. The "uid" field uniquely identifies the dashboard.
Routing key: A 32-character integration key used by the PagerDuty Events API v2 to route incoming events to the correct PagerDuty service. Each PagerDuty service has one or more integration keys — the routing_key field in the JSON payload determines which service receives the event and which escalation policy is applied.
Dedup key: A string field in PagerDuty Events API v2 that prevents duplicate incidents. When multiple events share the same dedup_key, PagerDuty groups them into a single incident. Sending a "resolve" event with the same dedup_key closes the incident. Use a stable identifier derived from the alert fingerprint or alert name + environment.
Metric series: A named time series submitted to a metrics backend (Datadog, Prometheus, CloudWatch). In the Datadog JSON API, a series object contains a metric name, type (gauge, count, rate), an array of [timestamp, value] data points, tags, and host. Multiple series can be submitted in a single API call using the series array.
Log metric filter: A CloudWatch Logs feature that scans log lines for a JSON field pattern and increments a custom CloudWatch metric when a match is found. Metric filters use the { $.fieldName = "value" } syntax for JSON logs. The resulting metric can trigger CloudWatch Alarms for automated alerting without custom parsing infrastructure.

Frequently asked questions

What JSON format should a health check endpoint return?

A health check endpoint should return a JSON object with a status field set to "healthy", "degraded", or "unhealthy". Include a nested checks object with per-component status and latency_ms values: {"status":"healthy","version":"1.2.3","checks":{"database":{"status":"up","latency_ms":5},"cache":{"status":"up"}}}. Use "degraded" when non-critical checks fail but the service can still serve traffic — return HTTP 200 for both healthy and degraded, and HTTP 503 only for unhealthy. The /health endpoint checks liveness; /ready checks readiness and returns 503 until the service is ready to accept traffic. Always include version and uptime_seconds to aid debugging.

What JSON does Prometheus Alertmanager send to webhooks?

Alertmanager sends a POST request with Content-Type: application/json. The body includes version (always "4"), groupKey, status ("firing" or "resolved"), receiver, groupLabels, commonLabels, commonAnnotations, externalURL, and an alerts array. Each alert has labels, annotations, startsAt and endsAt (both RFC 3339), generatorURL, and fingerprint. An endsAt of "0001-01-01T00:00:00Z" (the zero time) means the alert is still firing. In JavaScript, parse timestamps with new Date(alert.startsAt). Configure the receiver in Alertmanager with webhook_configs and set send_resolved: true to receive resolution notifications.
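As a rough sketch of the timestamp handling described above, the following function turns each alert into a small summary object. The function name summarizeAlerts and the shape of the returned objects are illustrative assumptions; only the payload field names come from the text above.

```javascript
// The zero time that, per the description above, marks a still-firing alert.
const ALERT_ZERO_TIME = "0001-01-01T00:00:00Z";

// Hypothetical helper: summarize the alerts array of a webhook payload.
function summarizeAlerts(payload) {
  return payload.alerts.map((alert) => {
    const stillFiring = alert.endsAt === ALERT_ZERO_TIME;
    return {
      name: alert.labels.alertname,
      fingerprint: alert.fingerprint,
      startedAt: new Date(alert.startsAt),
      // Avoid turning the zero time into a misleading year-1 Date.
      endedAt: stillFiring ? null : new Date(alert.endsAt),
      stillFiring,
    };
  });
}

// Usage with a trimmed-down payload.
const summaries = summarizeAlerts({
  status: "firing",
  alerts: [
    {
      labels: { alertname: "HighErrorRate", severity: "critical" },
      startsAt: "2026-05-19T10:00:00Z",
      endsAt: "0001-01-01T00:00:00Z",
      fingerprint: "abc123",
    },
  ],
});
console.log(summaries[0].name, summaries[0].stillFiring);
```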

How do I import/export Grafana dashboards as JSON?

To export: open the dashboard, click the Share icon or dashboard settings gear, select Export, then Save to file. The JSON contains panels, datasource refs, templating variables, and time range settings. To import: click the + button in the sidebar, select Import, and paste or upload the JSON. Via the HTTP API: POST /api/dashboards/import with body {"dashboard": <json>, "overwrite": true, "folderId": 0}. Set the "uid" field manually for reproducible GitOps imports — without a stable UID, every import creates a new dashboard. Remove the "id" field before importing to a new instance to avoid auto-increment conflicts.
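The "remove id, pin uid" advice can be sketched as a small preparation step before calling the API. The function name buildImportBody is hypothetical; the body shape is the one quoted above, and the sketch does not make the HTTP request itself.

```javascript
// Hypothetical helper: turn an exported dashboard into an import request body.
// It drops the instance-specific numeric "id" and sets a stable "uid".
function buildImportBody(exportedDashboard, uid, folderId = 0) {
  const { id, ...dashboard } = exportedDashboard; // `id` is intentionally discarded
  return {
    dashboard: { ...dashboard, uid },
    overwrite: true,
    folderId,
  };
}

// Usage: the uid "api-overview" is an example value.
const importBody = buildImportBody(
  { id: 42, uid: "old-uid", title: "API Overview", panels: [] },
  "api-overview"
);
console.log(JSON.stringify(importBody));
```

Because the function copies rather than mutates the exported object, the same file can be prepared for several Grafana instances in one script.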

What is the PagerDuty Events API v2 JSON format?

POST to https://events.pagerduty.com/v2/enqueue with: routing_key (32-char service integration key), event_action ("trigger", "acknowledge", or "resolve"), dedup_key (deduplication string), and payload containing summary, severity ("critical", "error", "warning", or "info" — lowercase only), source, and optional custom_details object. Send a "resolve" event with the same dedup_key to auto-close the incident. Use images and links arrays to attach Grafana screenshots and runbook URLs.
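A minimal sketch of building such an event is shown below. The function name buildPagerDutyEvent and its camelCase options are assumptions for illustration. The sketch attaches payload only to "trigger" events, on the assumption that acknowledge and resolve events need just the routing and dedup keys, and it uses { href, text } objects for links.

```javascript
const PD_ACTIONS = ["trigger", "acknowledge", "resolve"];
const PD_SEVERITIES = ["critical", "error", "warning", "info"];

// Hypothetical helper: build an Events API v2 request body.
function buildPagerDutyEvent({ routingKey, action, dedupKey, summary, severity, source, customDetails, links }) {
  if (!PD_ACTIONS.includes(action)) {
    throw new Error(`unknown event_action: ${action}`);
  }
  const event = { routing_key: routingKey, event_action: action, dedup_key: dedupKey };
  if (action === "trigger") {
    // Severity is matched case-sensitively, so "Critical" is rejected.
    if (!PD_SEVERITIES.includes(severity)) {
      throw new Error(`invalid severity: ${severity}`);
    }
    event.payload = { summary, severity, source };
    if (customDetails) event.payload.custom_details = customDetails;
    if (links) event.links = links;
  }
  return event;
}

// Usage: the routing key below is a placeholder, not a real key.
const trigger = buildPagerDutyEvent({
  routingKey: "0123456789abcdef0123456789abcdef",
  action: "trigger",
  dedupKey: "HighErrorRate-production",
  summary: "Error rate above 5%",
  severity: "critical",
  source: "api-server",
  links: [{ href: "https://runbooks.example.com/high-error-rate", text: "Runbook" }],
});
console.log(JSON.stringify(trigger));
```

Sending a second event with action "resolve" and the same dedupKey would then close the incident opened by the trigger.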

How do I submit metrics to Datadog using JSON?

POST to https://api.datadoghq.com/api/v1/series with headers DD-API-KEY: <key> and Content-Type: application/json. The body uses a series array — each item has metric (name), type ("gauge", "count", or "rate"), points (array of [unix_seconds, value] pairs), tags (["env:production"]), and host. Use Math.floor(Date.now() / 1000) for the timestamp — Datadog uses Unix seconds, not milliseconds. A successful response returns HTTP 202 with {"status":"ok"}.
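The request described above might be assembled as follows. The function names buildSeriesBody and submitSeries, the metric name, and the tag values are illustrative assumptions; submitSeries relies on the global fetch available in Node 18 and later and is not called in the sketch.

```javascript
// Hypothetical helper: build the body for a single data point.
function buildSeriesBody(metric, value, { type = "gauge", tags = [], host, nowMs = Date.now() } = {}) {
  // Datadog expects Unix seconds, so convert from milliseconds.
  const point = [Math.floor(nowMs / 1000), value];
  const series = { metric, type, points: [point], tags };
  if (host) series.host = host;
  return { series: [series] };
}

// Hypothetical sender using the endpoint and headers described above.
async function submitSeries(body, apiKey) {
  const response = await fetch("https://api.datadoghq.com/api/v1/series", {
    method: "POST",
    headers: { "DD-API-KEY": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (response.status !== 202) {
    throw new Error(`Datadog returned HTTP ${response.status}`);
  }
  return response.json();
}

// Usage: build a body without sending it.
const seriesBody = buildSeriesBody("app.requests.errors", 3, {
  type: "count",
  tags: ["env:production"],
  host: "web-1",
});
console.log(JSON.stringify(seriesBody));
```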

How do I implement a health check endpoint in Node.js?

Use Express to add a /health endpoint that checks each dependency and returns the appropriate JSON status. Check database connectivity with a lightweight query (e.g. SELECT 1) and measure its latency. Set status: "degraded" if non-critical checks (cache, queue) fail, and status: "unhealthy" if critical checks (database) fail. Return res.status(503) only for unhealthy. Add a separate /ready endpoint that returns 503 until startup tasks complete. Configure Kubernetes liveness probes on /health and readiness probes on /ready. Set a short timeout (500ms) on each dependency check to prevent health checks from hanging indefinitely.
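The sketch below illustrates the timeout and latency handling. To stay dependency-free it uses Node's built-in http module rather than Express, so treat it as a stand-in for an Express route. The names withTimeout, runCheck, handleHealth, and createHealthServer, and the deps object with its check and critical properties, are all assumptions of the sketch.

```javascript
const http = require("node:http");

// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run one dependency check and record its status and latency.
async function runCheck(checkFn, timeoutMs) {
  const started = Date.now();
  try {
    await withTimeout(checkFn(), timeoutMs);
    return { status: "up", latency_ms: Date.now() - started };
  } catch (err) {
    return { status: "down", error: err.message };
  }
}

// `deps` maps a name to { check, critical }; `check` would wrap e.g. SELECT 1.
async function handleHealth(deps, timeoutMs = 500) {
  const names = Object.keys(deps);
  const results = await Promise.all(names.map((n) => runCheck(deps[n].check, timeoutMs)));
  let status = "healthy";
  const checks = {};
  names.forEach((name, i) => {
    checks[name] = results[i];
    if (results[i].status !== "up") {
      if (deps[name].critical) status = "unhealthy";
      else if (status === "healthy") status = "degraded";
    }
  });
  return { code: status === "unhealthy" ? 503 : 200, body: { status, checks } };
}

// Serve /health; call .listen(port) on the returned server to start it.
function createHealthServer(deps) {
  return http.createServer(async (req, res) => {
    if (req.url !== "/health") {
      res.writeHead(404);
      res.end();
      return;
    }
    const { code, body } = await handleHealth(deps);
    res.writeHead(code, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}
```

Running the checks with Promise.all keeps the worst-case response time close to a single timeout rather than the sum of all timeouts.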

What JSON fields are required for a Prometheus webhook receiver?

Your webhook endpoint must accept POST with Content-Type: application/json. The incoming JSON always includes: version ("4"), groupKey, status ("firing" or "resolved"), receiver, groupLabels, commonLabels, commonAnnotations, externalURL, and an alerts array. Each alert object contains labels, annotations, startsAt, endsAt, generatorURL, and fingerprint. Your handler should return a 2xx status (typically HTTP 200) to acknowledge receipt. Alertmanager treats 5xx responses as retryable, so reserve them for transient failures. Use the fingerprint field as a stable dedup_key when routing to PagerDuty.
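A receiver can check the field list above before acting on a payload. In the sketch below the function names findMissingFields and handleWebhookBody are hypothetical, and answering malformed bodies with HTTP 400 is a design choice of the sketch rather than something Alertmanager requires.

```javascript
const WEBHOOK_FIELDS = [
  "version", "groupKey", "status", "receiver", "groupLabels",
  "commonLabels", "commonAnnotations", "externalURL", "alerts",
];
const WEBHOOK_ALERT_FIELDS = [
  "labels", "annotations", "startsAt", "endsAt", "generatorURL", "fingerprint",
];

// Return the names of expected fields that are absent from the payload.
function findMissingFields(payload) {
  const missing = WEBHOOK_FIELDS.filter((field) => !(field in payload));
  if (!Array.isArray(payload.alerts)) {
    if (!missing.includes("alerts")) missing.push("alerts");
    return missing;
  }
  payload.alerts.forEach((alert, i) => {
    for (const field of WEBHOOK_ALERT_FIELDS) {
      const present = alert !== null && typeof alert === "object" && field in alert;
      if (!present) missing.push(`alerts[${i}].${field}`);
    }
  });
  return missing;
}

// Decide the HTTP status for a raw request body and extract dedup keys.
function handleWebhookBody(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { code: 400, error: "invalid JSON" };
  }
  if (payload === null || typeof payload !== "object") {
    return { code: 400, error: "payload is not an object" };
  }
  const missing = findMissingFields(payload);
  if (missing.length > 0) {
    return { code: 400, error: `missing: ${missing.join(", ")}` };
  }
  // Each fingerprint can serve as a stable dedup_key downstream.
  return { code: 200, dedupKeys: payload.alerts.map((alert) => alert.fingerprint) };
}
```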

How do I query logs with JSON in AWS CloudWatch?

CloudWatch Logs Insights supports dot-notation for JSON field access. To filter on a JSON field: filter level = "ERROR" or filter http.statusCode >= 500. To aggregate: stats count() by http.path. For metric filters, use the CloudWatch filter pattern syntax: { $.level = "ERROR" }. The $. selector addresses JSON fields by path, so nested fields work as well, for example { $.http.statusCode >= 500 }. Create a metric filter with aws logs put-metric-filter and set a CloudWatch Alarm on the resulting metric. Use TreatMissingData: notBreaching to avoid false alarms during low-traffic periods.
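To make the pieces concrete, the sketch below builds the query string and the two parameter objects as plain data. The field names are intended to mirror the PutMetricFilter and PutMetricAlarm request shapes, but verify them against the AWS documentation before use. The log group, filter, metric, namespace, and alarm names, and the threshold values, are placeholders.

```javascript
// A Logs Insights query combining the filter and stats commands described above.
const insightsQuery = [
  "fields @timestamp, @message",
  'filter level = "ERROR"',
  "stats count() by http.path",
].join(" | ");

// Hypothetical helper: parameters for a metric filter counting ERROR log lines.
function buildErrorMetricFilter(logGroupName) {
  return {
    logGroupName,
    filterName: "error-count", // placeholder name
    filterPattern: '{ $.level = "ERROR" }',
    metricTransformations: [
      { metricName: "ErrorCount", metricNamespace: "MyApp", metricValue: "1" },
    ],
  };
}

// Hypothetical helper: parameters for an alarm on the metric created above.
function buildErrorAlarm() {
  return {
    AlarmName: "myapp-error-count", // placeholder name
    Namespace: "MyApp",
    MetricName: "ErrorCount",
    Statistic: "Sum",
    Period: 60,
    EvaluationPeriods: 1,
    Threshold: 0,
    ComparisonOperator: "GreaterThanThreshold",
    // Periods without any log lines are treated as healthy.
    TreatMissingData: "notBreaching",
  };
}

console.log(insightsQuery);
console.log(JSON.stringify(buildErrorMetricFilter("/myapp/production")));
```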
