JSON A/B Testing: Experiment Config, Assignment API & Event Tracking

Q: What should a JSON experiment configuration schema include?

A well-designed JSON experiment configuration schema should include: experimentId (a globally unique identifier, typically a UUID or slug like "checkout-cta-v2"), name and description for human readability, a variants array where each variant has id, name, weight (integer 0–100, all weights must sum to 100), and a config object containing the feature-specific parameters that differ between variants. It should also include a metrics array listing the primary and secondary metrics to track (e.g., "conversion_rate", "revenue_per_user"), a targeting object with audience conditions (e.g., userCountry, deviceType, percentTraffic to limit exposure), and startDate/endDate in ISO 8601 format. Optional but recommended: status ("draft" | "running" | "paused" | "completed"), salt (a random string used in consistent hashing to ensure experiment independence from other concurrent experiments), minimumSampleSize per variant before results are trusted, and createdBy/createdAt for audit purposes. The config object within each variant is intentionally open-ended and holds whatever parameters the experiment is testing — button text, price, algorithm flag, etc. This separation of experiment metadata from variant config is the key design pattern: the assignment service reads the experiment metadata; the application reads variant.config to know what to render.

Q: How does consistent hashing ensure users always see the same A/B test variant?

Consistent hashing works by combining the userId and experimentId (plus a salt) into a deterministic hash, then mapping that hash to a bucket from 0–99 that corresponds to a variant. The formula is: bucket = hash(userId + experimentId + salt) % 100. Because the hash function is deterministic, the same userId always produces the same bucket for a given experiment. Variants are assigned contiguous bucket ranges according to their weights: for a 50/30/20 split, control gets buckets 0–49, variant-A gets buckets 50–79, and variant-B gets buckets 80–99. The experiment-specific part of the input is critical: if the hash depended on userId alone, a user assigned to "control" in experiment-1 would land in the same bucket in every other experiment. Including the experimentId already separates experiments when the hash mixes well; a unique, randomly generated salt per experiment adds a second guard, so assignments stay statistically independent even when experiment IDs are similar or reused, and changing the salt re-randomizes an experiment without renaming it. This is especially important when running multiple overlapping experiments. The hash function itself should be fast (MurmurHash3, xxHash) and have good avalanche properties, meaning small changes to input produce completely different outputs. The assignment result is then cached in a cookie or user profile so the system does not re-compute it on every request, ensuring users never switch variants mid-session.

Q: What JSON fields should an A/B test event tracking payload include?

An A/B test event tracking payload must include the fields needed to join event data back to experiment results. Required fields: type (the event name, e.g., "conversion", "page_view", "add_to_cart"), experimentId (which experiment generated the exposure), variantId (which variant the user was assigned to), userId (stable identifier — anonymous ID for logged-out users, account ID for logged-in), and timestamp in ISO 8601 format with millisecond precision. Recommended fields: sessionId (to group events within a session), a properties object (open-ended key-value store for event-specific data — e.g., revenue amount for a purchase event, product ID for an add-to-cart event, page URL for a page view), and clientTimestamp vs serverTimestamp (the client timestamp can drift, so the server should record its own timestamp on receipt while preserving the client-reported time for latency analysis). The event type "exposure" is special — it should be tracked the moment the user is shown the experiment variant, not just on conversion. Without exposure events, you cannot compute the denominator for conversion rate. Batch events client-side for 5 seconds or up to 100 events before sending a POST array to the tracking API, to avoid a network request per event. Each event in the batch should be a complete, self-contained object so the server can process events individually even if the batch arrives out of order.

Q: How do I structure A/B test results in JSON for statistical significance?

A/B test results JSON should be structured at two levels: an experiment-level summary and a per-variant breakdown. At the experiment level: experimentId, status, startDate, endDate, primaryMetric, totalUsers, and a winner field (null while running, set to variantId when the experiment concludes). For each variant in the results array: variantId, sampleSize (number of unique users exposed), conversions (users who completed the primary metric), conversionRate (conversions / sampleSize as a decimal), and a statistics object containing confidenceInterval (an array [lower, upper] for the conversion rate at 95% confidence), pValue (the probability of observing a difference at least this large if the null hypothesis of no effect were true; below 0.05 is conventionally significant), lift (the relative improvement over control as a decimal, e.g., 0.12 for 12% uplift), and isSignificant (boolean, derived from pValue threshold). The control variant should also include these fields for reference, with the comparison fields (lift, pValue, isSignificant) set to null. Confidence intervals use the Wilson score interval for proportions (more accurate than normal approximation at small sample sizes). Include a notes field on the experiment for analyst comments about anomalies. Avoid reporting pValue alone without confidence intervals: the interval tells you the practical magnitude of the effect, not just its existence. A statistically significant but tiny lift (e.g., 0.1%) may not be worth shipping.

Q: Should I do A/B test assignment server-side or client-side in a Next.js app?

Server-side assignment is strongly preferred in a Next.js App Router application for three reasons: it eliminates flicker, it works with SSR and streaming, and it avoids hydration mismatches. In the App Router, perform assignment in Next.js middleware (middleware.ts) where you have access to the request before the page renders. Read the userId from the session cookie, call your assignment service (or compute the hash locally), and set a cookie with the assigned variantId. The cookie is then available both server-side (in Server Components via cookies()) and client-side (for analytics events). To avoid slow assignment APIs blocking the middleware response, pre-compute assignments at the edge using a local consistent hash — this runs in under 1ms with no network hop. The assignment API response time budget is under 5ms at p99 (Redis O(1) lookup), because slow assignment directly delays the first byte of the HTML response. For hydration consistency, pass the variantId from the server as a prop or via a React context provider initialized in a Server Component — never read the cookie client-side to determine which variant to render, as this causes a hydration mismatch. Client-side assignment is appropriate only for experiments that apply purely to client-rendered interactions (e.g., a tooltip timing experiment) where SSR rendering is not involved. Even then, using a cookie set by middleware is cleaner than computing the assignment in useEffect.

Q: What is a multi-arm bandit and how does it differ from a standard A/B test in JSON configuration?

A standard A/B test uses fixed variant weights throughout the experiment (e.g., 50/50) and waits until the end to declare a winner — this is statistically rigorous but wastes traffic on losing variants during the experiment. A multi-arm bandit dynamically adjusts variant weights based on observed performance, routing more traffic toward better-performing variants as data accumulates. The most common algorithm is Thompson Sampling, which maintains a Beta distribution for each variant parameterized by its observed successes (alpha) and failures (beta). The variant selected for each user is the one that produces the highest sample from its Beta distribution — variants with more successes get higher samples on average and thus more traffic, but there is always some probability of sampling a lesser variant for continued exploration. In the JSON configuration, the key difference is the algorithm field ("fixed" vs "thompson_sampling") and the addition of a betaParams object per variant containing alpha and beta (both start at 1, increment by 1 on each success or failure respectively). The weights in a bandit experiment are dynamic — the assignment service computes them fresh from the current beta parameters rather than reading static weights. Bandits reduce regret by 20–40% compared to pure A/B for short experiments where one variant is clearly better early on, at the cost of lower statistical precision for variants that receive little traffic. They are best suited for experiments with fast, binary feedback (click/no-click) rather than slow metrics (revenue, 30-day retention).
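The selection step described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not a production sampler: the names BanditVariant, chooseVariant, and recordOutcome are hypothetical, and the Gamma draw relies on alpha and beta being positive integers (which holds when both start at 1 and increment by 1). Its cost grows with alpha + beta, so a general-purpose Gamma sampler would be a better fit once counts get large.

```typescript
type Rng = () => number                       // uniform in [0, 1)

// Gamma(k, 1) for a positive integer k, drawn as the sum of k Exp(1) variables.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0
  for (let i = 0; i < k; i++) sum += -Math.log(1 - rng())   // 1 - u avoids log(0)
  return sum
}

// Beta(alpha, beta) = X / (X + Y) with X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGammaInt(alpha, rng)
  const y = sampleGammaInt(beta, rng)
  return x / (x + y)
}

// Hypothetical shape: one entry per variant, carrying its betaParams.
interface BanditVariant {
  id: string
  betaParams: { alpha: number; beta: number }
}

// Thompson Sampling: draw once from each variant's Beta and pick the largest draw.
function chooseVariant(variants: BanditVariant[], rng: Rng = Math.random): string {
  let bestId = variants[0].id
  let bestSample = -1
  for (const v of variants) {
    const sample = sampleBeta(v.betaParams.alpha, v.betaParams.beta, rng)
    if (sample > bestSample) {
      bestSample = sample
      bestId = v.id
    }
  }
  return bestId
}

// Update rule from the text: success increments alpha, failure increments beta.
function recordOutcome(variant: BanditVariant, success: boolean): void {
  if (success) variant.betaParams.alpha += 1
  else variant.betaParams.beta += 1
}
```

Because each request draws fresh samples, assignment under this scheme is not deterministic per user; a service that needs sticky assignments would typically persist the chosen variant after the first draw.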

Q: How do I avoid experiment collision when running multiple A/B tests simultaneously?

Experiment collision occurs when two experiments influence the same metric or UI surface simultaneously, making it impossible to attribute observed changes to either experiment individually. The primary defense is namespace isolation through consistent hashing with per-experiment salts: hash(userId + experimentId + salt) % 100 ensures that a user's bucket in experiment A is statistically independent from their bucket in experiment B, even if both experiments run at the same time. This means that knowing a user is in the "control" group of experiment A tells you nothing about which variant of experiment B they are in: their assignments are uncorrelated. Beyond hashing, maintain an experiment registry in your JSON configuration system that tracks which surfaces and metrics each experiment affects. Before launching a new experiment, query the registry to find experiments with overlapping surfaces or metrics. For experiments that affect the same UI component, consider mutual exclusion: partition users into non-overlapping segments using a layer hash. hash(userId + "layer-checkout") % 100 gives each user a fixed slot (0–99) within the layer, each experiment in the layer claims its own range of slots, and so a user can be in at most one experiment per layer. Include a mutuallyExclusiveGroup field in your experiment config JSON to declare which layer the experiment belongs to. The assignment service then checks whether the user's slot belongs to the experiment before assigning them. Log all active experiment assignments in every analytics event so your data pipeline can filter for users who were exposed to only one experiment when computing clean metrics.
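As a rough sketch of the layer idea, the snippet below gives each user a slot in a layer and resolves which experiment, if any, owns that slot. The Layer and LayerExperiment shapes and the slots field are assumptions made for illustration (the config example in this guide only defines mutuallyExclusiveGroup), and the hash is passed in so that whichever 32-bit string hash the assignment service already uses can be plugged in.

```typescript
type Hash32 = (input: string) => number      // any deterministic 32-bit string hash

// Hypothetical registry shapes; "slots" is an inclusive range within 0-99.
interface LayerExperiment {
  experimentId: string
  slots: [number, number]
}

interface Layer {
  layerId: string                            // e.g. the mutuallyExclusiveGroup value
  experiments: LayerExperiment[]
}

// The slot depends only on userId and layerId, so it is identical for every
// experiment in the layer. ">>> 0" guards against hashes that return negatives.
function layerSlot(userId: string, layerId: string, hash: Hash32): number {
  return (hash(`${userId}:${layerId}`) >>> 0) % 100
}

// Returns the one experiment whose slot range contains the user, or null when
// the user's slot is unclaimed (that traffic sees the default experience).
function experimentForUser(layer: Layer, userId: string, hash: Hash32): string | null {
  const slot = layerSlot(userId, layer.layerId, hash)
  const hit = layer.experiments.find(e => slot >= e.slots[0] && slot <= e.slots[1])
  return hit ? hit.experimentId : null
}

// Registry guard: true when two experiments in the layer claim the same slot.
function hasOverlappingSlots(layer: Layer): boolean {
  const sorted = [...layer.experiments].sort((a, b) => a.slots[0] - b.slots[0])
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].slots[0] <= sorted[i - 1].slots[1]) return true
  }
  return false
}
```

Variant assignment inside the chosen experiment would still use the per-experiment hash, so the layer slot and the variant bucket stay independent of each other.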

Written and reviewed by the Jsonic editorial team — every guide is verified against the official spec or runtime before publication.

Last updated: May 20, 2026

Most A/B testing articles focus on UI tooling and dashboards. This guide covers the JSON API layer: how to design the experiment configuration schema, the assignment API response format, consistent hashing for reproducible bucketing, the event tracking payload, statistical results in JSON, server-side vs. client-side assignment in Next.js, and multi-arm bandit configuration with Thompson Sampling. Every section includes a concrete JSON example and the reasoning behind each field choice.

JSON Experiment Configuration Schema

The experiment configuration is the source of truth for the assignment service, the analytics pipeline, and the feature flag system. It must encode enough information for any service to independently compute a deterministic assignment — including variant weights, targeting conditions, and the per-variant config object that the application reads to know what to render.

// Experiment configuration JSON schema
{
  "experimentId": "checkout-cta-v2",          // unique slug or UUID
  "name": "Checkout CTA Button Copy Test",
  "description": "Tests 'Buy Now' vs 'Complete Purchase' copy on checkout",
  "status": "running",                         // draft | running | paused | completed
  "startDate": "2026-02-15T00:00:00Z",        // ISO 8601
  "endDate": "2026-03-15T00:00:00Z",
  "salt": "a3f9b2c7",                         // random salt for hashing independence
  "minimumSampleSize": 2000,                  // per variant before trusting results

  "variants": [
    {
      "id": "control",
      "name": "Control — Buy Now",
      "weight": 50,                           // integer, all weights must sum to 100
      "config": {
        "buttonText": "Buy Now",
        "buttonColor": "#0070f3"
      }
    },
    {
      "id": "variant-a",
      "name": "Variant A — Complete Purchase",
      "weight": 50,
      "config": {
        "buttonText": "Complete Purchase",
        "buttonColor": "#0070f3"
      }
    }
  ],

  "metrics": [
    {
      "id": "checkout_conversion",
      "name": "Checkout Conversion Rate",
      "type": "binomial",                     // binomial | continuous | ratio
      "isPrimary": true
    },
    {
      "id": "revenue_per_user",
      "name": "Revenue Per User",
      "type": "continuous",
      "isPrimary": false
    }
  ],

  "targeting": {
    "percentTraffic": 100,                    // expose 100% of eligible users
    "conditions": [
      { "attribute": "userCountry", "operator": "in", "values": ["US", "CA", "GB"] },
      { "attribute": "deviceType",  "operator": "eq", "value": "desktop" },
      { "attribute": "accountAgeDays", "operator": "gte", "value": 7 }
    ]
  },

  "mutuallyExclusiveGroup": "checkout-layer", // prevents collision with other checkout experiments
  "createdBy": "alice@example.com",
  "createdAt": "2026-02-10T09:00:00Z"
}

The config object within each variant is intentionally open-ended — it holds whatever parameters differ between variants. The assignment service reads the experiment-level metadata (weights, targeting, salt) and returns the assigned variant's config to the application. The application never needs to know about other variants. The salt field is critical for experiment independence: without a unique salt per experiment, assignments across experiments are correlated. See JSON Schema validation for how to validate this config at write time.
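The invariants stated in the schema comments (integer weights that sum to 100, a non-empty salt, a sensible date range) are easy to check before a config is saved. The following is a minimal sketch of such a check; validateExperimentConfig and the trimmed-down ExperimentConfig type are hypothetical names introduced here, and a real system would likely pair this with full JSON Schema validation.

```typescript
// Only the fields this sketch validates; the full config has more.
interface VariantConfig {
  id: string
  weight: number
  config: Record<string, unknown>
}

interface ExperimentConfig {
  experimentId: string
  salt: string
  startDate: string
  endDate: string
  variants: VariantConfig[]
}

// Returns human-readable problems; an empty array means the config passed.
function validateExperimentConfig(exp: ExperimentConfig): string[] {
  const errors: string[] = []

  if (exp.variants.length < 2) errors.push('at least two variants are required')

  const ids = new Set(exp.variants.map(v => v.id))
  if (ids.size !== exp.variants.length) errors.push('variant ids must be unique')

  for (const v of exp.variants) {
    if (!Number.isInteger(v.weight) || v.weight < 0 || v.weight > 100) {
      errors.push(`variant "${v.id}" weight must be an integer from 0 to 100`)
    }
  }

  const total = exp.variants.reduce((sum, v) => sum + v.weight, 0)
  if (total !== 100) errors.push(`weights sum to ${total}, expected 100`)

  if (exp.salt.length === 0) errors.push('salt must be a non-empty string')

  const start = Date.parse(exp.startDate)
  const end = Date.parse(exp.endDate)
  if (Number.isNaN(start) || Number.isNaN(end)) {
    errors.push('startDate and endDate must be parseable ISO 8601 timestamps')
  } else if (end <= start) {
    errors.push('endDate must be after startDate')
  }

  return errors
}
```

Returning a list rather than throwing on the first problem lets an admin UI show every issue with a draft config at once.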

Assignment API Response Format

The assignment API is the runtime interface between the experiment system and the application. A GET request to /experiment/assign returns a deterministic variant assignment for a given user and experiment. The response must be fast (<5ms p99 via Redis O(1) lookup), cacheable, and include enough context for the application to render the correct experience and for the analytics pipeline to attribute subsequent events.

// GET /experiment/assign?experimentId=checkout-cta-v2&userId=usr_7f3a9c
// Response: 200 OK
{
  "experimentId": "checkout-cta-v2",
  "variantId": "variant-a",
  "userId": "usr_7f3a9c",
  "assignedAt": "2026-02-20T14:32:00.000Z",   // ISO 8601, millisecond precision
  "config": {
    "buttonText": "Complete Purchase",
    "buttonColor": "#0070f3"
  },
  "isControl": false,
  "debug": {                                    // omit in production; include in staging
    "bucket": 73,                              // computed hash bucket 0-99
    "variantRanges": {
      "control": [0, 49],
      "variant-a": [50, 99]
    }
  }
}

// Bulk assignment: POST /experiment/assign (batch for multiple experiments)
// Request body:
{
  "userId": "usr_7f3a9c",
  "experimentIds": ["checkout-cta-v2", "pricing-page-v3", "nav-redesign-v1"]
}
// Response:
{
  "userId": "usr_7f3a9c",
  "assignments": {
    "checkout-cta-v2":  { "variantId": "variant-a", "config": { "buttonText": "Complete Purchase" } },
    "pricing-page-v3":  { "variantId": "control",   "config": { "plan": "standard" } },
    "nav-redesign-v1":  { "variantId": "variant-b",  "config": { "layout": "horizontal" } }
  }
}

// Excluded response: user does not meet targeting conditions
// HTTP 200 (not 4xx — the request succeeded; the user is simply not in the experiment)
{
  "experimentId": "checkout-cta-v2",
  "variantId": null,
  "userId": "usr_7f3a9c",
  "assignedAt": "2026-02-20T14:32:00.000Z",
  "config": null,
  "excluded": true,
  "excludedReason": "targeting_mismatch"      // targeting_mismatch | experiment_not_running | holdout
}

// Caching strategy:
// - Cache assignment in Redis with key "assign:{experimentId}:{userId}" TTL 24h
// - Set a cookie "exp_checkout-cta-v2=variant-a; Max-Age=2592000; SameSite=Lax"
// - On subsequent requests, read cookie directly — no Redis call needed

Return HTTP 200 even when the user is excluded from the experiment — a 4xx response would cause clients to error-handle rather than render the default experience. The config field is the application's single source of truth for what to render; the variantId is for analytics. Cache assignments aggressively: compute the hash once, store in Redis, and persist in a cookie so the middleware can serve assignments without any downstream call.
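To make the excluded case concrete, here is one way the service might decide between a normal assignment and an excludedReason. It is a sketch under assumptions: the Condition union mirrors the three operators shown in the targeting example ("in", "eq", "gte"), exclusionReason is a hypothetical helper, and percentTraffic and holdout handling are left out.

```typescript
// Mirrors the condition objects in the targeting example.
type Condition =
  | { attribute: string; operator: 'in'; values: Array<string | number> }
  | { attribute: string; operator: 'eq'; value: string | number }
  | { attribute: string; operator: 'gte'; value: number }

type UserAttributes = Record<string, string | number | undefined>

function matchesCondition(user: UserAttributes, c: Condition): boolean {
  const actual = user[c.attribute]
  if (actual === undefined) return false      // a missing attribute never matches
  switch (c.operator) {
    case 'in':
      return c.values.includes(actual)
    case 'eq':
      return actual === c.value
    case 'gte':
      return typeof actual === 'number' && actual >= c.value
  }
}

type ExcludedReason = 'targeting_mismatch' | 'experiment_not_running'

// Returns null when the user is eligible; otherwise the value for excludedReason.
function exclusionReason(
  status: string,
  conditions: Condition[],
  user: UserAttributes
): ExcludedReason | null {
  if (status !== 'running') return 'experiment_not_running'
  const eligible = conditions.every(c => matchesCondition(user, c))
  return eligible ? null : 'targeting_mismatch'
}
```

A handler built on this would respond with HTTP 200 in both cases, setting variantId and config to null whenever a reason is returned.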

Consistent User Assignment for JSON Experiments

Consistent hashing ensures a user always sees the same variant for a given experiment, without requiring a database lookup for every request. The algorithm maps hash(userId + experimentId + salt) % 100 to a bucket (0–99), then assigns the bucket to a variant based on cumulative weight ranges. The salt ensures assignments are statistically independent across experiments — a user in the "control" group for experiment A has an equal probability of being in any variant of experiment B.

// TypeScript: consistent hash assignment (pure computation, <1ms; createHash needs a Node.js runtime)
import { createHash } from 'crypto'

interface Variant {
  id: string
  weight: number    // integer, weights must sum to 100
  config: Record<string, unknown>
}

interface Experiment {
  experimentId: string
  salt: string
  variants: Variant[]
}

function assignVariant(
  experiment: Experiment,
  userId: string
): { variantId: string; bucket: number; config: Record<string, unknown> } {
  // 1. Hash: combine userId + experimentId + salt for independence
  const input = userId + experiment.experimentId + experiment.salt
  const hash  = createHash('sha256').update(input).digest('hex')

  // 2. Map hash to bucket 0–99
  // Use first 8 hex chars (32 bits) for sufficient entropy
  const bucket = parseInt(hash.slice(0, 8), 16) % 100

  // 3. Map bucket to variant by cumulative weight
  let cumulative = 0
  for (const variant of experiment.variants) {
    cumulative += variant.weight
    if (bucket < cumulative) {
      return { variantId: variant.id, bucket, config: variant.config }
    }
  }

  // Fallback (should never reach here if weights sum to 100)
  const last = experiment.variants[experiment.variants.length - 1]
  return { variantId: last.id, bucket, config: last.config }
}

// Example usage:
const experiment: Experiment = {
  experimentId: 'checkout-cta-v2',
  salt: 'a3f9b2c7',
  variants: [
    { id: 'control',   weight: 50, config: { buttonText: 'Buy Now' } },
    { id: 'variant-a', weight: 50, config: { buttonText: 'Complete Purchase' } },
  ],
}

// Same userId always produces the same bucket (bucket values shown are illustrative):
assignVariant(experiment, 'usr_7f3a9c')
// => { variantId: 'variant-a', bucket: 73, config: { buttonText: 'Complete Purchase' } }

assignVariant(experiment, 'usr_7f3a9c')
// => { variantId: 'variant-a', bucket: 73, config: { buttonText: 'Complete Purchase' } }

// A different userId hashes to its own bucket:
assignVariant(experiment, 'usr_2b8d1e')
// => { variantId: 'control', bucket: 21, config: { buttonText: 'Buy Now' } }

// Verify independence: same user in different experiment (different salt)
const otherExperiment = { ...experiment, experimentId: 'pricing-page-v3', salt: 'd7e4f1a2' }
assignVariant(otherExperiment, 'usr_7f3a9c')
// => Bucket is independent of the one above (it can still coincide by chance, about 1 time in 100)

// Sample size calculation for 95% confidence, 80% power:
// n = 2 × (1.96 + 0.84)² × p(1-p) / δ²
// For p=0.05 (5% baseline), δ=0.01 (detect 1pp lift):
// n = 2 × 7.84 × 0.0475 / 0.0001 = 7,448 users per variant
function sampleSizePerVariant(
  baselineRate: number,    // e.g., 0.05 for 5%
  mde: number,             // minimum detectable effect (absolute), e.g., 0.01 for 1pp
  zAlpha = 1.96,           // z-score for alpha = 0.05, two-tailed
  zBeta  = 0.84            // z-score for power = 0.80
): number {
  // The z-scores are passed in directly so that non-default alpha/power values are actually used
  const p = baselineRate
  return Math.ceil(2 * Math.pow(zAlpha + zBeta, 2) * p * (1 - p) / Math.pow(mde, 2))
}

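The salt is not a secret. It is a random string stored in the experiment config that keeps bucket assignments in one experiment independent of those in another. Generate a fresh 8-character random hex string for each experiment at creation time. With a well-mixed hash such as SHA-256, including the experimentId already gives each experiment its own buckets; the salt additionally protects against reused or near-identical experiment IDs and weaker hash functions, and changing it re-randomizes an experiment without renaming it. MurmurHash3 or xxHash are faster alternatives to SHA-256 for high-throughput edge environments. Note that createHash comes from Node's crypto module, which the Next.js Edge Runtime does not provide, so code that runs in edge middleware needs Web Crypto (crypto.subtle.digest, which is asynchronous) or a pure-JavaScript hash instead.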
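For runtimes without Node's crypto module, a synchronous pure-JavaScript hash is one option. The sketch below is a stand-in chosen for brevity, not MurmurHash3 or xxHash themselves: it runs 32-bit FNV-1a over the string's UTF-16 code units and then applies the MurmurHash3 finalizer to improve bit mixing. The names hash32 and bucketFor are hypothetical, and the bucket values it produces will differ from the SHA-256 version above, so the two must not be mixed within one experiment.

```typescript
// 32-bit FNV-1a followed by the MurmurHash3 finalizer (fmix32).
function hash32(input: string): number {
  let h = 0x811c9dc5                         // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i)
    h = Math.imul(h, 0x01000193)             // FNV prime, 32-bit multiply
  }
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0                             // force an unsigned 32-bit result
}

// Bucket in 0-99. The ":" separator keeps ("ab", "c") and ("a", "bc")
// from collapsing into the same hash input.
function bucketFor(userId: string, experimentId: string, salt: string): number {
  return hash32(`${userId}:${experimentId}:${salt}`) % 100
}
```

Whatever hash is chosen, it is worth running a quick distribution check over a few thousand synthetic user IDs before trusting it for traffic splits.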

Event Tracking JSON Schema for A/B Tests

Event tracking is the data pipeline that connects user behavior back to experiment variants. Every event must carry its experiment context — experimentId and variantId — so your analytics system can compute per-variant metrics. Two event types are special: "exposure" (tracked when the user first sees the variant) and "conversion" (tracked when the primary metric is triggered).

// Single event payload (self-contained — no foreign key lookups needed)
{
  "type": "conversion",                       // exposure | conversion | click | page_view | custom
  "experimentId": "checkout-cta-v2",
  "variantId": "variant-a",
  "userId": "usr_7f3a9c",
  "sessionId": "sess_4a8b2d",               // groups events within a session
  "clientTimestamp": "2026-02-20T14:32:00.123Z",  // client-reported (may drift)
  "serverTimestamp": "2026-02-20T14:32:00.145Z",  // server-recorded on receipt
  "properties": {                            // open-ended event-specific data
    "revenue": 149.99,
    "currency": "USD",
    "productId": "prod_9c3f2a",
    "checkoutStep": "confirmation"
  }
}

// Exposure event — MUST be tracked when user first sees the variant
// (without exposure events, conversion rate denominator is wrong)
{
  "type": "exposure",
  "experimentId": "checkout-cta-v2",
  "variantId": "variant-a",
  "userId": "usr_7f3a9c",
  "sessionId": "sess_4a8b2d",
  "clientTimestamp": "2026-02-20T14:31:55.000Z",
  "serverTimestamp": "2026-02-20T14:31:55.022Z",
  "properties": {
    "page": "/checkout",
    "renderMethod": "ssr"                   // ssr | csr — for debugging hydration issues
  }
}

// Batched event payload (collect 5s or 100 events, then POST array)
// POST /events/batch
{
  "batchId": "bat_f2a9c3",               // UUID for deduplication
  "sentAt": "2026-02-20T14:32:05.000Z",  // when the batch was sent
  "events": [
    {
      "type": "exposure",
      "experimentId": "checkout-cta-v2",
      "variantId": "variant-a",
      "userId": "usr_7f3a9c",
      "sessionId": "sess_4a8b2d",
      "clientTimestamp": "2026-02-20T14:31:55.000Z",
      "properties": { "page": "/checkout" }
    },
    {
      "type": "click",
      "experimentId": "checkout-cta-v2",
      "variantId": "variant-a",
      "userId": "usr_7f3a9c",
      "sessionId": "sess_4a8b2d",
      "clientTimestamp": "2026-02-20T14:32:00.000Z",
      "properties": { "element": "cta-button", "page": "/checkout" }
    },
    {
      "type": "conversion",
      "experimentId": "checkout-cta-v2",
      "variantId": "variant-a",
      "userId": "usr_7f3a9c",
      "sessionId": "sess_4a8b2d",
      "clientTimestamp": "2026-02-20T14:32:00.123Z",
      "properties": { "revenue": 149.99, "currency": "USD" }
    }
  ]
}

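// Client-side batching logic (TypeScript):
// Named TrackedEvent so it does not shadow the DOM's global Event type
interface TrackedEvent {
  type: string
  experimentId: string
  variantId: string
  userId: string
  sessionId?: string
  clientTimestamp: string
  properties?: Record<string, unknown>
}

class EventBatcher {
  private buffer: TrackedEvent[] = []
  private timer: ReturnType<typeof setTimeout> | null = null
  private readonly maxSize = 100
  private readonly maxDelayMs = 5000

  track(event: TrackedEvent): void {
    this.buffer.push(event)
    if (this.buffer.length >= this.maxSize) {
      void this.flush()
    } else if (!this.timer) {
      this.timer = setTimeout(() => void this.flush(), this.maxDelayMs)
    }
  }

  private async flush(): Promise<void> {
    // Clear the timer first so a later track() can always schedule a new one
    if (this.timer) { clearTimeout(this.timer); this.timer = null }
    if (this.buffer.length === 0) return
    const batch = this.buffer.splice(0)        // drain buffer atomically
    try {
      await fetch('/events/batch', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ batchId: crypto.randomUUID(), sentAt: new Date().toISOString(), events: batch }),
        keepalive: true,                       // lets the request outlive page unload
      })
    } catch (err) {
      // Network failure: log instead of leaving an unhandled promise rejection
      console.warn('event batch failed to send', err)
    }
  }
}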

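The keepalive: true flag on the fetch call is essential: without it, a request that is still in flight when the page navigates away (e.g., after a click that triggers a redirect) is cancelled and its events are lost. Browsers cap keepalive request bodies at roughly 64 KiB in total, so keep batches small. Events still sitting in the buffer are not sent by keepalive alone; flush the buffer when the page is hidden (for example, on the visibilitychange event). The batchId enables server-side deduplication: if the client retries a failed POST with the same batchId, the server can discard the duplicate batch. Track exposure events as early as possible, before any API call or async rendering, to maximize denominator accuracy.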
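On the receiving side, the reason exposure events matter becomes clear once the events are aggregated. The following sketch shows one plausible aggregation, assuming events have already been deduplicated at the batch level; summarizeVariants and the trimmed ExposureOrConversion type are hypothetical names. It counts unique exposed users per variant as the denominator and only credits conversions from users who have a recorded exposure.

```typescript
// Only the fields this aggregation needs.
interface ExposureOrConversion {
  type: string
  experimentId: string
  variantId: string
  userId: string
}

interface VariantCounts {
  sampleSize: number        // unique users with an exposure event
  conversions: number       // exposed users with at least one conversion event
  conversionRate: number
}

function summarizeVariants(
  events: ExposureOrConversion[],
  experimentId: string
): Map<string, VariantCounts> {
  const exposed = new Map<string, Set<string>>()
  const converted = new Map<string, Set<string>>()

  for (const e of events) {
    if (e.experimentId !== experimentId) continue
    const target = e.type === 'exposure' ? exposed : e.type === 'conversion' ? converted : null
    if (target === null) continue
    let users = target.get(e.variantId)
    if (!users) {
      users = new Set<string>()
      target.set(e.variantId, users)
    }
    users.add(e.userId)     // a Set makes repeated events from one user count once
  }

  const summary = new Map<string, VariantCounts>()
  for (const [variantId, users] of exposed) {
    const converters = converted.get(variantId) ?? new Set<string>()
    let conversions = 0
    for (const userId of converters) {
      if (users.has(userId)) conversions++   // ignore conversions with no exposure
    }
    summary.set(variantId, {
      sampleSize: users.size,
      conversions,
      conversionRate: conversions / users.size,
    })
  }
  return summary
}
```

A variant with conversions but no exposures never appears in the output, which is usually a sign that exposure tracking is broken rather than that the variant had no traffic.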

Statistical Results in JSON Format

The results API returns the computed statistics for each variant so dashboards and alerting systems can consume experiment outcomes programmatically. The JSON structure separates experiment metadata, per-variant metrics, and the statistical comparison between control and treatment.

// GET /experiment/results?experimentId=checkout-cta-v2
{
  "experimentId": "checkout-cta-v2",
  "name": "Checkout CTA Button Copy Test",
  "status": "running",
  "startDate": "2026-02-15T00:00:00Z",
  "primaryMetric": "checkout_conversion",
  "computedAt": "2026-02-20T00:00:00Z",     // when statistics were last computed
  "totalUsers": 8432,                        // unique users exposed across all variants
  "winner": null,                            // null while running; variantId when concluded

  "variants": [
    {
      "variantId": "control",
      "isControl": true,
      "sampleSize": 4218,                    // unique users exposed to this variant
      "conversions": 211,                    // users who completed the primary metric
      "conversionRate": 0.0500,              // conversions / sampleSize
      "statistics": {
        "confidenceInterval": [0.0438, 0.0570],  // 95% Wilson score interval [lower, upper]
        "standardError": 0.00335,
        "lift": null,                             // undefined for control
        "pValue": null,                           // undefined for control
        "isSignificant": null
      }
    },
    {
      "variantId": "variant-a",
      "isControl": false,
      "sampleSize": 4214,
      "conversions": 253,
      "conversionRate": 0.0600,
      "statistics": {
        "confidenceInterval": [0.0533, 0.0676],  // 95% Wilson score interval
        "standardError": 0.00366,
        "lift": 0.200,                            // (0.06 - 0.05) / 0.05 = 20% relative lift
        "liftAbsolute": 0.010,                    // 0.06 - 0.05 = 1pp absolute lift
        "pValue": 0.044,                          // two-tailed, pooled two-proportion z-test
        "zScore": 2.02,                           // (0.0600 - 0.0500) / 0.00497 pooled standard error
        "isSignificant": true                     // pValue < 0.05
      }
    }
  ],

  "sampleSizeRequired": 7448,               // per variant for 80% power at 1pp MDE
  "sampleSizeReached": true,               // all variants have >= minimumSampleSize (2000); still below sampleSizeRequired

  "notes": "Variant A shows 20% relative lift in conversion rate with p=0.044. Recommend shipping after 7 more days to validate stability."
}

// Computing Wilson score confidence interval (more accurate than normal approx at low n):
function wilsonInterval(
  successes: number,
  n: number,
  z = 1.96       // 1.96 for 95% confidence
): [number, number] {
  const p     = successes / n
  const denom = 1 + z * z / n
  const center = (p + z * z / (2 * n)) / denom
  const margin = (z * Math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))) / denom
  return [center - margin, center + margin]
}

// Two-proportion z-test p-value:
function twoProportionPValue(
  n1: number, c1: number,   // control: n, conversions
  n2: number, c2: number    // treatment: n, conversions
): number {
  const p1     = c1 / n1
  const p2     = c2 / n2
  const pPool  = (c1 + c2) / (n1 + n2)
  const se     = Math.sqrt(pPool * (1 - pPool) * (1 / n1 + 1 / n2))
  const z      = (p2 - p1) / se
  // Two-tailed p-value (approximation via standard normal CDF):
  return 2 * (1 - standardNormalCdf(Math.abs(z)))
}

Use Wilson score confidence intervals rather than the normal approximation, which can produce intervals that extend outside [0, 1] at low sample sizes or extreme conversion rates. Report both absolute lift (liftAbsolute) and relative lift (lift): a 20% relative lift sounds impressive but is only 1 percentage point if the baseline is 5%. A statistically significant result with a tiny absolute effect may not be worth the engineering cost to ship. See JSON API design for response envelope patterns.
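The p-value function above calls standardNormalCdf without defining it. One common way to fill that gap, shown here as a sketch rather than the guide's own implementation, is the Abramowitz and Stegun polynomial approximation of the error function (formula 7.1.26, absolute error around 1.5e-7). The twoProportionZTest wrapper is a hypothetical helper that returns both z and the p-value so they can be stored side by side in the results JSON.

```typescript
// Error function via Abramowitz & Stegun 7.1.26.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1
  const ax = Math.abs(x)
  const t = 1 / (1 + 0.3275911 * ax)
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t
  return sign * (1 - poly * Math.exp(-ax * ax))
}

// Standard normal CDF expressed through erf.
function standardNormalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2))
}

// Pooled two-proportion z-test, two-tailed.
function twoProportionZTest(
  n1: number, c1: number,   // control: users, conversions
  n2: number, c2: number    // treatment: users, conversions
): { zScore: number; pValue: number } {
  const p1 = c1 / n1
  const p2 = c2 / n2
  const pPool = (c1 + c2) / (n1 + n2)
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / n1 + 1 / n2))
  const zScore = (p2 - p1) / se
  const pValue = 2 * (1 - standardNormalCdf(Math.abs(zScore)))
  return { zScore, pValue }
}
```

For the counts in the results example (211 of 4218 for control, 253 of 4214 for variant-a), this evaluates to roughly z = 2.02 and p = 0.044.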

Server-Side vs Client-Side JSON Assignment in Next.js

In a Next.js App Router application, assignment should happen server-side in middleware before the page renders. This eliminates variant flicker (the flash of the control variant before JavaScript loads), works with SSR and streaming, and avoids React hydration mismatches from rendering different content on the server vs. client.

// middleware.ts — assign variants at the edge before page renders
import { NextRequest, NextResponse } from 'next/server'
import { assignVariant } from '@/lib/experiments/assign'
import { getExperiments }  from '@/lib/experiments/registry'

export async function middleware(request: NextRequest) {
  const response = NextResponse.next()

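  // 1. Get userId from session cookie, or mint an anonymous ID and persist it
  //    so later requests and analytics events reuse the same identifier
  let userId = request.cookies.get('userId')?.value
  if (!userId) {
    userId = crypto.randomUUID()   // Web Crypto global, available in the Edge Runtime
    response.cookies.set('userId', userId, { maxAge: 60 * 60 * 24 * 365, sameSite: 'lax' })
  }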

  // 2. Load active experiments for this route
  const experiments = getExperiments(request.nextUrl.pathname)

  for (const experiment of experiments) {
    const cookieName = `exp_${experiment.experimentId}`

    // 3. Check if assignment is already cached in a cookie
    if (!request.cookies.has(cookieName)) {
      // 4. Compute deterministic assignment (no network call — pure hash)
      const assignment = assignVariant(experiment, userId)

      // 5. Set cookie so subsequent requests skip computation
      response.cookies.set(cookieName, JSON.stringify({
        variantId: assignment.variantId,
        config:    assignment.config,
      }), {
        maxAge:   60 * 60 * 24 * 30,   // 30 days
        sameSite: 'lax',
        httpOnly: false,               // client needs to read it for analytics events
      })
    }
  }

  return response
}

export const config = {
  matcher: ['/checkout/:path*', '/pricing/:path*'],  // only run on relevant routes
}

// app/checkout/page.tsx — Server Component reads assignment from cookie
import { cookies } from 'next/headers'

export default async function CheckoutPage() {
  const cookieStore = await cookies()   // cookies() is async in Next.js 15+; awaiting is harmless on older versions
  const raw = cookieStore.get('exp_checkout-cta-v2')?.value

  // Parse assignment from cookie — set by middleware
  const assignment = raw ? JSON.parse(raw) : { variantId: 'control', config: { buttonText: 'Buy Now' } }

  return (
    <CheckoutForm
      variantId={assignment.variantId}
      buttonText={assignment.config.buttonText}
    />
  )
}

// app/checkout/checkout-form.tsx — Client Component tracks exposure
'use client'
import { useEffect } from 'react'
import { trackEvent } from '@/lib/analytics'

interface Props {
  variantId: string
  buttonText: string
}

export function CheckoutForm({ variantId, buttonText }: Props) {
  useEffect(() => {
    // Track exposure once on mount — variantId comes from server, no flicker
    trackEvent({
      type: 'exposure',
      experimentId: 'checkout-cta-v2',
      variantId,
      properties: { page: '/checkout', renderMethod: 'ssr' },
    })
  }, [variantId])

  return (
    <form>
      {/* form fields */}
      <button type="submit">{buttonText}</button>
    </form>
  )
}

// ANTI-PATTERN: reading the assignment in useEffect causes a visible flicker after hydration
// ❌ DO NOT DO THIS:
// useEffect(() => {
//   const cookie = document.cookie.match(/exp_checkout-cta-v2=([^;]+)/)?.[1]
//   const { variantId } = JSON.parse(decodeURIComponent(cookie ?? '{}'))
//   setButtonText(variantId === 'variant-a' ? 'Complete Purchase' : 'Buy Now')
// }, [])
// This renders 'Buy Now' on server, then switches to 'Complete Purchase' after hydration — visible flicker

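The middleware runs at the edge (Vercel Edge Runtime) before any Server Component code executes, so the assignment is computed before the page renders and the Server Component can read it through cookies(). On a user's first visit, confirm that your Next.js version exposes a cookie set on the middleware response to cookies() within the same request; if it does not, the page falls back to the control default for that one render. The assignment cookie is not httpOnly because the client-side analytics SDK needs to read it to include variantId in event payloads without an additional API call. Never derive the rendered variant in a useEffect: the server-rendered markup shows the default first and the effect swaps it after hydration, which produces a visible flicker. Reading document.cookie during render instead would trade the flicker for a hydration mismatch.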
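Because the assignment cookie is readable and writable by the client, the bare JSON.parse in the Server Component above can throw on a malformed or hand-edited value. The following is a minimal sketch of a defensive parser, assuming the cookie shape written by the middleware; parseAssignmentCookie, Assignment, and CONTROL_FALLBACK are hypothetical names, not part of Next.js.

```typescript
// Hypothetical helper: the Assignment shape mirrors the cookie written by the middleware.
interface Assignment {
  variantId: string
  config: Record<string, unknown>
}

// Assumed default; matches the inline fallback used in the Server Component.
const CONTROL_FALLBACK: Assignment = {
  variantId: 'control',
  config: { buttonText: 'Buy Now' },
}

function parseAssignmentCookie(
  raw: string | undefined,
  fallback: Assignment = CONTROL_FALLBACK
): Assignment {
  if (!raw) return fallback
  try {
    const parsed: unknown = JSON.parse(raw)
    if (typeof parsed === 'object' && parsed !== null) {
      const candidate = parsed as Partial<Assignment>
      // Accept only values that have the two fields the page relies on
      if (
        typeof candidate.variantId === 'string' &&
        typeof candidate.config === 'object' &&
        candidate.config !== null
      ) {
        return candidate as Assignment
      }
    }
  } catch {
    // Malformed JSON: fall through to the fallback below
  }
  return fallback
}
```

With this in place, the page could call parseAssignmentCookie(cookieStore.get('exp_checkout-cta-v2')?.value) and always receive a renderable assignment.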

Multi-Arm Bandit JSON Configuration

A standard A/B test uses fixed weights (e.g., 50/50) for the entire experiment duration, so a poor-performing variant keeps receiving its full share of traffic until the test ends. A multi-arm bandit dynamically adjusts weights based on observed performance. Thompson Sampling, a widely used bandit algorithm, maintains a Beta distribution per variant and assigns each user to the variant with the highest sampled value, which routes more traffic to better performers while continuing to explore the others.

// Multi-arm bandit experiment configuration
{
  "experimentId": "homepage-hero-bandit",
  "name": "Homepage Hero Image Bandit",
  "algorithm": "thompson_sampling",           // "fixed" for standard A/B, "thompson_sampling" for bandit
  "status": "running",
  "startDate": "2026-02-15T00:00:00Z",
  "updatedAt": "2026-02-20T06:00:00Z",       // last time weights were recomputed

  "variants": [
    {
      "id": "hero-product",
      "name": "Product Hero Image",
      "weight": 23,                           // DYNAMIC: recomputed every hour from beta params (weights shown are illustrative)
      "config": { "heroImage": "product.jpg", "headline": "Build faster" },
      "betaParams": {
        "alpha": 48,                          // 1 + number of successes (conversions)
        "beta":  156                          // 1 + number of failures (non-conversions)
        // Posterior mean = alpha / (alpha + beta) = 48/204 = 23.5% conversion rate estimate
      }
    },
    {
      "id": "hero-social-proof",
      "name": "Social Proof Hero",
      "weight": 54,                           // higher weight — currently best performer
      "config": { "heroImage": "team.jpg", "headline": "Trusted by 10,000 teams" },
      "betaParams": {
        "alpha": 112,
        "beta":  187
        // Posterior mean = 112/299 = 37.5% — best performer, gets most traffic
      }
    },
    {
      "id": "hero-demo",
      "name": "Demo CTA Hero",
      "weight": 23,
      "config": { "heroImage": "demo.jpg", "headline": "See it in action" },
      "betaParams": {
        "alpha": 29,
        "beta":  94
        // Posterior mean = 29/123 = 23.6%
      }
    }
  ],

  "updateFrequency": "1h",                  // how often to recompute weights from beta params
  "minimumExplorationRate": 0.05,           // always allocate at least 5% to each variant
  "metrics": [{ "id": "signup_click", "type": "binomial", "isPrimary": true }]
}

// Thompson Sampling: weight computation from beta params
// Run this every hour; update experiment config with new weights
function computeThompsonWeights(
  variants: Array<{ id: string; betaParams: { alpha: number; beta: number } }>,
  simulations = 10_000,
  minExploration = 0.05
): Record<string, number> {
  // Count how many simulations each variant wins
  const wins: Record<string, number> = Object.fromEntries(variants.map(v => [v.id, 0]))

  for (let i = 0; i < simulations; i++) {
    // Sample from each variant's Beta(alpha, beta) distribution
    const samples = variants.map(v => ({
      id:     v.id,
      sample: sampleBeta(v.betaParams.alpha, v.betaParams.beta),
    }))
    // Variant with highest sample wins this simulation
    const winner = samples.reduce((best, cur) => cur.sample > best.sample ? cur : best)
    wins[winner.id]++
  }

  // Convert win counts to percentages, apply minimum exploration floor
  const rawWeights: Record<string, number> = {}
  for (const [id, count] of Object.entries(wins)) {
    rawWeights[id] = count / simulations
  }

  // Apply minimum exploration: reserve `floor` for every variant and share the remaining
  // (1 - n * floor) in proportion to win rate, so the adjusted weights still sum to 1
  const n = variants.length
  const floor = minExploration
  const adjusted: Record<string, number> = {}
  const totalFloor = floor * n

  for (const [id, w] of Object.entries(rawWeights)) {
    adjusted[id] = Math.max(floor, w * (1 - totalFloor) + floor)
  }

  // Normalize to sum to 1, convert to integer percentages summing to 100
  return normalizeToIntegerWeights(adjusted)
}

// After each conversion or non-conversion, update beta params:
function recordOutcome(variant: { betaParams: { alpha: number; beta: number } }, converted: boolean) {
  if (converted) {
    variant.betaParams.alpha += 1   // success: increment alpha
  } else {
    variant.betaParams.beta  += 1   // failure: increment beta
  }
  // Persist updated params back to experiment config store
}

// Rough rule of thumb (actual gains depend on effect size, traffic, and metric latency):
// a bandit can reduce regret by about 20-40% vs fixed A/B for short experiments
// - At 1,000 total users: a bandit may already route ~70% of traffic to the leading variant
// - A/B test at 1,000 users: still 50/50, needs more data before switching
// Trade-off: a bandit gives less statistical precision for losing variants, which receive fewer users
// Use A/B for: slow metrics (retention, LTV), regulatory compliance, learning
// Use bandit for: fast binary metrics (click, signup), revenue optimization, short experiments

The betaParams in the JSON config are the persistent state of the bandit: they accumulate across all users and are updated after every recorded outcome, whether a conversion or a non-conversion. The weight field is recomputed hourly from the beta parameters via Thompson Sampling simulation. The minimumExplorationRate prevents the bandit from fully converging on one variant too early, ensuring continued data collection on other variants in case conditions change. For long-running experiments measuring retention or LTV, use a standard A/B test, because bandits optimize for fast signals and can prematurely converge on a variant that only looks good in the short term.
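computeThompsonWeights above calls sampleBeta and normalizeToIntegerWeights, which are not shown in this section. The sketch below is one possible way to fill them in, not necessarily the implementation the config assumes: sampleBeta draws from a Beta distribution as a ratio of two Gamma samples (Marsaglia-Tsang method), and normalizeToIntegerWeights uses largest-remainder rounding so the integer weights sum to exactly 100. sampleNormal and sampleGamma are helper names introduced here for illustration.

```typescript
// Standard normal sample via the Box-Muller transform.
function sampleNormal(): number {
  const u1 = 1 - Math.random() // in (0, 1], avoids log(0)
  const u2 = Math.random()
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
}

// Gamma(shape, 1) sample via the Marsaglia-Tsang method.
function sampleGamma(shape: number): number {
  if (shape < 1) {
    // Boosting step for shape < 1: Gamma(a) = Gamma(a + 1) * U^(1/a)
    return sampleGamma(shape + 1) * Math.pow(1 - Math.random(), 1 / shape)
  }
  const d = shape - 1 / 3
  const c = 1 / Math.sqrt(9 * d)
  for (;;) {
    const x = sampleNormal()
    const v = Math.pow(1 + c * x, 3)
    if (v <= 0) continue // reject: candidate outside the support
    const u = 1 - Math.random()
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v
  }
}

// Beta(alpha, beta) sample as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha)
  const y = sampleGamma(beta)
  return x / (x + y)
}

// Largest-remainder rounding: positive fractional weights in, integer percentages summing to 100 out.
function normalizeToIntegerWeights(weights: Record<string, number>): Record<string, number> {
  const entries = Object.entries(weights)
  const total = entries.reduce((sum, [, w]) => sum + w, 0)
  const scaled = entries.map(([id, w]) => ({ id, exact: (w / total) * 100 }))

  const result: Record<string, number> = {}
  let assigned = 0
  for (const { id, exact } of scaled) {
    result[id] = Math.floor(exact)
    assigned += result[id]
  }

  // Hand the leftover points to the variants with the largest fractional parts
  const byRemainder = [...scaled].sort(
    (a, b) => (b.exact - Math.floor(b.exact)) - (a.exact - Math.floor(a.exact))
  )
  for (let i = 0; i < 100 - assigned; i++) {
    result[byRemainder[i % byRemainder.length].id] += 1
  }
  return result
}
```

Largest-remainder rounding is used instead of rounding each weight independently because independent rounding can produce totals of 99 or 101, which would break the "weights sum to 100" rule of the schema.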

Key Terms

A/B Test: A controlled experiment that randomly assigns users to one of two or more variants of a feature and measures the effect on one or more metrics. The control group sees the existing experience; treatment groups see the experimental change. Users are assigned via a deterministic hash of their user ID so the assignment is stable across sessions. The experiment runs until the required sample size is reached, then statistical tests determine whether observed differences are due to the change or random chance. A/B tests require careful design: defining a primary metric before the experiment starts, computing the required sample size to detect the minimum effect of interest, and avoiding peeking at results early (which inflates false positive rates). In JSON API systems, A/B tests manifest as configuration schemas, assignment API responses, and event tracking payloads.
Variant: One of the possible experiences in an A/B test. Each variant has an ID, a weight (integer representing the percentage of eligible users assigned to it), and a config object containing the parameters that differ from other variants. The control variant is the existing, unchanged experience — it is the baseline against which treatment variants are compared. Variants should differ in exactly one dimension at a time to enable causal attribution; testing multiple changes simultaneously is a multivariate test (MVT) with a different statistical model. In the JSON configuration schema, each variant's config object is the application's source of truth for what to render — the application reads the config and renders accordingly without needing to know which experiment it belongs to.
Assignment API: A service endpoint (typically GET /experiment/assign) that accepts a userId and experimentId and returns a deterministic variant assignment. The assignment API must be fast — response time budget is under 5ms at p99, achieved via Redis O(1) lookups for cached assignments. The response includes the variantId, the variant config object (so the application knows what to render), and metadata like assignedAt for analytics. Assignments are cached in Redis and persisted in cookies so the assignment is not recomputed on every request. The assignment API also handles exclusion cases — returning a null variantId with an excludedReason when a user does not meet targeting conditions, rather than returning an HTTP 4xx error.
Consistent Hashing: A technique for assigning users to experiment variants deterministically without a database lookup. (Strictly speaking this is deterministic hash bucketing; the term is used more loosely here than in distributed systems, where it refers to ring-based key placement.) The assignment is computed as bucket = hash(userId + experimentId + salt) % 100, where the salt is a per-experiment random string. Because the hash function is deterministic, the same userId always produces the same bucket for a given experiment, so the user always sees the same variant, even after clearing cookies or switching devices (as long as the userId persists and the variant weights do not change). The salt guards against correlated assignments: if bucketing depended on the userId alone, or on a poorly mixing hash of similar experiment IDs, users placed in the control group of experiment A would tend to land in the control group of other experiments, biasing multi-experiment analyses.
Statistical Significance: A measure of whether an observed difference between variants is likely to be a real effect rather than random sampling variation. Conventionally, a result is statistically significant when the p-value is below 0.05 — meaning there is less than a 5% probability of observing a difference this large by chance if the null hypothesis (no effect) is true. Statistical significance does not measure practical importance: a 0.1% absolute lift can be statistically significant with a large enough sample. In the A/B test results JSON, significance is represented by pValue, zScore, and isSignificant (boolean). Reaching statistical significance requires a pre-determined minimum sample size — peeking at results before that threshold inflates false positive rates (the "peeking problem").
Confidence Interval: A range of plausible values for the true conversion rate (or other metric) of a variant, given the observed data. A 95% confidence interval means that if the experiment were repeated many times, 95% of the computed intervals would contain the true population value. In the results JSON, the confidenceInterval field is a [lower, upper] array for the variant's conversion rate. Use Wilson score intervals rather than the normal approximation, which can produce intervals that extend outside [0, 1] at small sample sizes or extreme rates. The overlap between confidence intervals of two variants is a rough visual indicator of statistical significance, but the proper test is the two-proportion z-test that produces the pValue field.
Multi-Arm Bandit: An online learning algorithm for A/B testing that dynamically adjusts variant weights based on observed performance, routing more traffic to better-performing variants while continuing to explore others. Unlike a standard A/B test with fixed 50/50 weights, a bandit adapts in real time. The name comes from the "multi-armed bandit" problem: a gambler choosing between slot machines (arms) with unknown payout rates, seeking to maximize total reward. Bandits reduce regret — the cumulative reward loss from pulling suboptimal arms — compared to pure A/B tests, especially for short experiments or those with a clear early winner. In the JSON configuration, bandits store per-variant beta distribution parameters (alpha and beta) that accumulate with each success and failure, and recompute variant weights periodically via Thompson Sampling simulation.
Thompson Sampling: A Bayesian algorithm for multi-arm bandit problems. For each variant, it maintains a Beta(alpha, beta) distribution where alpha = 1 + successes and beta = 1 + failures. To assign a user, the algorithm samples one value from each variant's Beta distribution and routes the user to the variant with the highest sample. Variants with more successes have higher-mean distributions and thus tend to get higher samples — but there is always some probability of sampling a lesser variant, providing exploration. As data accumulates, the distributions narrow and the algorithm converges on the best variant. The JSON representation stores alpha and beta per variant; a background job recomputes variant weights hourly by running 10,000 simulations and counting which variant wins each simulation.
Minimum Detectable Effect: The smallest treatment effect (lift) that an experiment is designed to detect with a specified statistical power (typically 80%) and significance level (typically 5%). The MDE determines the required sample size, which grows with the inverse square of the effect: halving the MDE roughly quadruples the sample needed. The formula is n = 2 × (zα/2 + zβ)² × p(1−p) / δ² per variant, where p is the baseline conversion rate, δ is the MDE as an absolute difference in proportions (0.01 for one percentage point), zα/2 = 1.96 for a two-sided 5% significance level, and zβ = 0.84 for 80% power. For example, detecting a 1 percentage point lift on a 5% baseline at 5% significance and 80% power requires approximately 7,448 users per variant. Choosing the MDE before starting the experiment is critical: an experiment cannot retrospectively claim to detect an effect smaller than the pre-registered MDE without inflating the false positive rate.
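The sample size formula in the Minimum Detectable Effect entry can be checked with a few lines of code. This is a minimal sketch assuming a two-sided 5% significance level (z = 1.96) and 80% power (z = 0.84) as defaults; sampleSizePerVariant is a hypothetical helper name.

```typescript
// n = 2 * (zAlpha + zBeta)^2 * p * (1 - p) / delta^2, per variant.
// baselineRate and mde are proportions (0.05 = 5%, 0.01 = one percentage point).
function sampleSizePerVariant(
  baselineRate: number,
  mde: number,
  zAlpha = 1.96, // two-sided 5% significance (assumed default)
  zBeta = 0.84   // 80% power (assumed default)
): number {
  if (!(baselineRate > 0 && baselineRate < 1)) {
    throw new RangeError('baselineRate must be between 0 and 1')
  }
  if (!(mde > 0)) {
    throw new RangeError('mde must be positive')
  }
  const variance = baselineRate * (1 - baselineRate)
  const n = (2 * (zAlpha + zBeta) ** 2 * variance) / mde ** 2
  // Subtract a tiny epsilon so floating-point noise does not push an exact integer up by one
  return Math.ceil(n - 1e-9)
}

const perVariant = sampleSizePerVariant(0.05, 0.01) // 7448
const halvedMde = sampleSizePerVariant(0.05, 0.005) // about four times larger
```

The second call illustrates the inverse-square relationship: halving the MDE from 1 to 0.5 percentage points raises the requirement to roughly 29,800 users per variant.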

FAQ

What should a JSON experiment configuration schema include?

A JSON experiment configuration schema should include: experimentId (UUID or slug), status ("draft" | "running" | "paused" | "completed"), startDate/endDate in ISO 8601, a variants array where each variant has id, weight (integer, all weights sum to 100), and a config object with feature-specific parameters. Also include a metrics array declaring primary and secondary metrics and a targeting object with audience conditions and percentTraffic. Strongly recommended: a salt string for hashing independence across concurrent experiments, minimumSampleSize per variant, and mutuallyExclusiveGroup to prevent collision with experiments on the same surface. The variant config object is intentionally open-ended: it holds whatever parameters the experiment tests (button text, price, algorithm flag) so the application can read it without knowing experiment internals.

How does consistent hashing ensure users always see the same A/B test variant?

Consistent hashing computes bucket = hash(userId + experimentId + salt) % 100 and maps the bucket to a variant by cumulative weight range. Because the hash is deterministic, the same userId always produces the same bucket for a given experiment, so the user sees the same variant across devices, after cookie clearing, and across sessions, as long as the userId persists. The salt is a per-experiment random string that decorrelates assignments across experiments, which matters most when the hash mixes poorly or experiment IDs are similar. The hash function (SHA-256, MurmurHash3) must have good avalanche properties: small input changes should produce completely different outputs. Assignments are then cached in Redis and a cookie so the hash is not recomputed on every request. Because the hash is deterministic, an expired Redis entry alone does not reassign anyone; the persisted assignment matters when variant weights change mid-experiment, since a recomputed bucket could then fall into a different variant's range.
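The bucket computation and cumulative weight mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the assignVariant used by the middleware: hash32 is a stand-in (FNV-1a followed by the MurmurHash3 32-bit finalizer) rather than full MurmurHash3 or SHA-256, and bucketFor, pickVariant, and the ':' delimiter are choices made for this example.

```typescript
interface Variant {
  id: string
  weight: number // integer percentage; all weights sum to 100
}

// Stand-in hash: FNV-1a over UTF-8 bytes, then the MurmurHash3 fmix32 finalizer for better bit mixing.
function hash32(input: string): number {
  const bytes = new TextEncoder().encode(input)
  let h = 0x811c9dc5
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i]
    h = Math.imul(h, 0x01000193)
  }
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0 // force unsigned 32-bit
}

// Map a user to a bucket in 0..99 for one experiment.
function bucketFor(userId: string, experimentId: string, salt: string): number {
  // The delimiter keeps ("ab", "c") and ("a", "bc") from producing the same hash input
  return hash32(`${userId}:${experimentId}:${salt}`) % 100
}

// Walk the cumulative weight ranges, e.g. 50/30/20 gives 0-49, 50-79, 80-99.
function pickVariant(variants: Variant[], bucket: number): string {
  let upper = 0
  for (const variant of variants) {
    upper += variant.weight
    if (bucket < upper) return variant.id
  }
  throw new Error('variant weights must sum to 100')
}

const exampleVariants: Variant[] = [
  { id: 'control', weight: 50 },
  { id: 'variant-a', weight: 30 },
  { id: 'variant-b', weight: 20 },
]
const exampleBucket = bucketFor('user-123', 'checkout-cta-v2', 'hypothetical-salt')
const exampleVariant = pickVariant(exampleVariants, exampleBucket)
```

Note that changing the order or weights of the variants array shifts the cumulative ranges, which is why a stored assignment is worth keeping once an experiment is running.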

What JSON fields should an A/B test event tracking payload include?

Required fields: type (event name: "exposure", "conversion", "click"), experimentId, variantId, userId, sessionId, and clientTimestamp in ISO 8601 with millisecond precision. The server should add serverTimestamp on receipt. Include a properties object for event-specific data (revenue, productId, page URL). The "exposure" event type is especially critical — it must be tracked the moment the user sees the variant, not just on conversion, because it provides the denominator for the conversion rate. Batch events client-side for 5 seconds or 100 events before POSTing the array, with keepalive: true to survive page unload. Include a batchId for server-side deduplication on retries. Each event in the batch must be self-contained so the server can process them individually even if they arrive out of order.
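The payload fields and batching rule above could be typed and implemented roughly as below. This is a sketch under assumptions: ExperimentEvent, EventBatch, and createEventBatcher are hypothetical names, the batchId format is a placeholder (a UUID would work equally well), and the actual network call is left to a send callback supplied by the caller.

```typescript
interface ExperimentEvent {
  type: 'exposure' | 'conversion' | 'click'
  experimentId: string
  variantId: string
  userId: string
  sessionId: string
  clientTimestamp: string // ISO 8601 with millisecond precision
  properties?: Record<string, unknown>
}

interface EventBatch {
  batchId: string // lets the server deduplicate retried batches
  events: ExperimentEvent[]
}

// Queue events and flush after maxEvents events or maxWaitMs milliseconds, whichever comes first.
function createEventBatcher(
  send: (batch: EventBatch) => void,
  maxEvents = 100,
  maxWaitMs = 5000
) {
  let queue: ExperimentEvent[] = []
  let timer: ReturnType<typeof setTimeout> | undefined
  let sequence = 0

  function flush(): void {
    if (timer !== undefined) {
      clearTimeout(timer)
      timer = undefined
    }
    if (queue.length === 0) return
    const events = queue
    queue = []
    sequence += 1
    // Placeholder id scheme: timestamp, per-batcher sequence, random suffix
    const batchId = `${Date.now().toString(36)}-${sequence}-${Math.random().toString(36).slice(2, 10)}`
    send({ batchId, events })
  }

  function track(event: Omit<ExperimentEvent, 'clientTimestamp'>): void {
    // toISOString() always emits milliseconds, e.g. 2026-02-20T06:00:00.000Z
    queue.push({ ...event, clientTimestamp: new Date().toISOString() })
    if (queue.length >= maxEvents) flush()
    else if (timer === undefined) timer = setTimeout(flush, maxWaitMs)
  }

  return { track, flush }
}
```

In a browser, the send callback would typically POST JSON.stringify(batch) with fetch and keepalive: true to an events endpoint of your choosing, and flush() would also be called from a pagehide or visibilitychange handler so queued events are not lost on unload.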

How do I structure A/B test results in JSON for statistical significance?

Structure results at two levels. Experiment level: experimentId, status, primaryMetric, totalUsers, winner (null while running), and sampleSizeRequired. Per-variant level in a variants array: variantId, isControl, sampleSize, conversions, conversionRate, and a statistics object with confidenceInterval ([lower, upper] Wilson score interval), pValue (two-proportion z-test), zScore, lift (relative, e.g., 0.20 for 20%), liftAbsolute (absolute pp difference), and isSignificant (boolean). Use Wilson score intervals, not normal approximation — normal approximation breaks at small samples. Report both relative and absolute lift — a 20% relative lift that is only 0.1pp absolute may not be worth shipping. Never omit confidence intervals; they show the magnitude of uncertainty, not just whether the result crossed the p=0.05 threshold.
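The statistics object described above can be computed from raw counts. The sketch below assumes a two-sided test at the 5% level and uses a polynomial approximation for the normal CDF (Abramowitz and Stegun 7.1.26, accurate to about 1.5e-7); wilsonInterval, normalCdf, and twoProportionZTest are hypothetical helper names, and the field names follow the results schema in this guide.

```typescript
// Wilson score interval for a proportion; stays within [0, 1] even for small n.
function wilsonInterval(conversions: number, n: number, z = 1.96): [number, number] {
  const p = conversions / n
  const z2 = z * z
  const denom = 1 + z2 / n
  const center = (p + z2 / (2 * n)) / denom
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom
  return [Math.max(0, center - half), Math.min(1, center + half)]
}

// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation.
function normalCdf(x: number): number {
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2)
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))))
  const erf = 1 - poly * Math.exp(-(x * x) / 2)
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf)
}

interface Counts {
  sampleSize: number
  conversions: number
}

// Two-proportion z-test with a pooled standard error, treatment versus control.
function twoProportionZTest(control: Counts, treatment: Counts) {
  const p1 = control.conversions / control.sampleSize
  const p2 = treatment.conversions / treatment.sampleSize
  const pooled =
    (control.conversions + treatment.conversions) / (control.sampleSize + treatment.sampleSize)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.sampleSize + 1 / treatment.sampleSize))
  const zScore = (p2 - p1) / se
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)))
  return {
    confidenceInterval: wilsonInterval(treatment.conversions, treatment.sampleSize),
    pValue,
    zScore,
    lift: (p2 - p1) / p1,  // relative, e.g. 0.20 for 20%
    liftAbsolute: p2 - p1, // absolute difference in proportions
    isSignificant: pValue < 0.05,
  }
}

// Example: 5.0% control vs 6.0% treatment, 10,000 users each
const stats = twoProportionZTest(
  { sampleSize: 10000, conversions: 500 },
  { sampleSize: 10000, conversions: 600 }
)
```

In this example the relative lift is 20% while the absolute lift is one percentage point, which is why the guide recommends reporting both.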

Should I do A/B test assignment server-side or client-side in a Next.js app?

Server-side assignment in Next.js middleware is strongly preferred. Perform the consistent hash in middleware.ts before the page renders, set the assignment in a cookie, and read it in Server Components via cookies(). This eliminates flicker (no flash of the control variant before JavaScript loads), avoids React hydration mismatches (the server and client both render the same variant from the start), and works with SSR streaming. Run the hash computation at the edge (Vercel Edge Runtime) with no network hop; it takes under 1ms. Set the cookie with httpOnly: false so the client-side analytics SDK can read the variantId for event tracking without an extra API call. Never read the assignment cookie in a useEffect to determine which variant to render: the page shows the server-rendered default first and swaps to the assigned variant after hydration, which is a visible flicker. Client-side assignment is appropriate only for experiments on purely client-rendered interactions where SSR is not involved.

What is a multi-arm bandit and how does it differ from a standard A/B test in JSON configuration?

A standard A/B test uses fixed variant weights (e.g., 50/50) throughout the experiment and waits until the predetermined sample size is reached to declare a winner. A multi-arm bandit dynamically adjusts weights based on observed performance, routing more traffic to better-performing variants in real time. The JSON configuration difference: an A/B test has static weight values; a bandit experiment adds an algorithm field ("thompson_sampling"), per-variant betaParams with alpha (1 + successes) and beta (1 + failures), and an updateFrequency. The weights in a bandit are recomputed hourly by running Thompson Sampling simulations against the current beta parameters. As a rough rule of thumb, bandits can reduce regret by 20–40% vs. pure A/B for short experiments with fast binary metrics and clear early winners; the actual gain depends on effect size and traffic. Use A/B for slow metrics (retention, LTV) or when regulatory compliance requires a clean statistical test; use bandits for click-through rate, signup conversion, and experiments where routing traffic to a losing variant for weeks has significant business cost.

How do I avoid experiment collision when running multiple A/B tests simultaneously?

Experiment collision — where two experiments influence the same metric simultaneously — is prevented by three mechanisms. First, per-experiment salts in the consistent hash ensure statistical independence: a user's assignment in experiment A is uncorrelated with their assignment in experiment B. Second, maintain an experiment registry JSON that tracks which UI surfaces and metrics each experiment affects; query it before launching to find overlapping experiments. Third, for experiments on the same surface, use mutual exclusion layers: add a mutuallyExclusiveGroup field to the experiment config, and have the assignment service use a layer hash (hash(userId + "layer-checkout") % 100) to allocate users to layers, running only one experiment per layer at a time. Log all active experiment assignments in every analytics event so your data pipeline can filter to users exposed to only one experiment when computing clean metrics for either experiment.
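The layer mechanism in the third point can be sketched as a bucket-range lookup. This is one possible reading of it, assuming each experiment in a mutuallyExclusiveGroup owns a disjoint range of the layer's 100 buckets; LayerSlot, layerBucket, and experimentForUser are hypothetical names, and the hash is a stand-in (FNV-1a plus the MurmurHash3 finalizer) rather than a specific production choice.

```typescript
interface LayerSlot {
  experimentId: string
  from: number // first bucket owned by this experiment (inclusive)
  to: number   // last bucket owned by this experiment (inclusive)
}

// Stand-in 32-bit hash: FNV-1a over UTF-8 bytes, then the MurmurHash3 fmix32 finalizer.
function layerHash(input: string): number {
  const bytes = new TextEncoder().encode(input)
  let h = 0x811c9dc5
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i]
    h = Math.imul(h, 0x01000193)
  }
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0
}

// The layer bucket depends only on the user and the layer, never on a single experiment.
function layerBucket(userId: string, layerId: string): number {
  return layerHash(`${userId}:${layerId}`) % 100
}

// Returns the one experiment in this layer the user may enter, or null if their bucket is unallocated.
function experimentForUser(userId: string, layerId: string, slots: LayerSlot[]): string | null {
  const bucket = layerBucket(userId, layerId)
  const slot = slots.find((s) => bucket >= s.from && bucket <= s.to)
  return slot ? slot.experimentId : null
}

// Two checkout experiments share the layer; buckets 80-99 are held back for future tests
const checkoutLayer: LayerSlot[] = [
  { experimentId: 'checkout-cta-v2', from: 0, to: 39 },
  { experimentId: 'checkout-price-test', from: 40, to: 79 },
]
const eligible = experimentForUser('user-123', 'layer-checkout', checkoutLayer)
```

Once a user is routed to an experiment this way, the variant inside that experiment is still chosen by the per-experiment salted hash, so the layer decides which experiment and the experiment hash decides which variant.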