JSON File Upload: Multipart vs Base64, Presigned URLs & Metadata Design

File upload APIs have three primary strategies, each with distinct tradeoffs: multipart/form-data sends a file and JSON metadata as separate parts in one HTTP request with minimal encoding overhead; base64 JSON encoding embeds file data inside a JSON string at the cost of roughly 33% size inflation; and presigned URLs bypass your server entirely, letting clients upload directly to S3, GCS, or R2. Most guides cover only multipart/form-data and skip the decision framework for choosing between approaches based on file size, API design consistency, and infrastructure. This guide covers all three strategies with working code, the size thresholds that should drive your decision, presigned URL security patterns, Next.js App Router constraints (request body size limits, Server Actions, Vercel Blob), upload progress tracking with XMLHttpRequest, resumable uploads via the tus protocol, and a production-ready file metadata JSON schema including checksum, virus scanning hooks, and storage key design. By the end you will have a complete mental model for any file upload scenario in a JSON API.

File Upload Strategy Selection

Choosing the right upload strategy before writing any code saves significant refactoring later. The three options — multipart/form-data, base64 JSON, and presigned URLs — each occupy a different point in the size-vs-complexity tradeoff space. File size is the primary decision variable: use base64 JSON under 1MB, multipart between 1MB and 10MB, and presigned URLs for anything larger.

// Strategy decision matrix
//
// File size     │ Strategy            │ Why
// ─────────────────────────────────────────────────────────────
// < 1 MB        │ Base64 JSON         │ Simple pure-JSON API,
//               │                     │ 33% overhead acceptable
// 1 MB – 10 MB  │ multipart/form-data │ Native binary transfer,
//               │                     │ no encoding overhead
// > 10 MB       │ Presigned URL       │ Server never handles
//               │                     │ binary data at all
// Any size      │ Presigned URL       │ When bandwidth cost or
//               │                     │ server memory matters

// ── Rough size overhead for base64 ───────────────────────────
// 4 × ceil(original_bytes / 3) = base64_bytes  (about +33%, including '=' padding)
// 100 KB file  → ~133 KB JSON field
// 500 KB file  → ~667 KB JSON field
// 1 MB file    → ~1.33 MB JSON field  (at the edge of acceptable)
// 5 MB file    → ~6.67 MB JSON field  (too large — body parser OOM risk)

// ── When to use each strategy ─────────────────────────────────

// USE BASE64 JSON when:
// - Files are reliably < 1MB (avatars, icons, small PDFs)
// - API is fully JSON and adding multipart breaks SDK consistency
// - File is part of a transactional JSON document (e.g., contract
//   with embedded signature image that must be atomic)
// - Mobile clients cannot send multipart easily

// USE MULTIPART/FORM-DATA when:
// - Files between 1MB and 10MB
// - Standard server-side parsers (multer, busboy, formidable)
// - Need to send multiple files in one request
// - Streaming processing on the server (pipe directly to disk or S3)

// USE PRESIGNED URLS when:
// - Files > 10MB (images, videos, documents, archives)
// - Want to eliminate server bandwidth cost entirely
// - Need client-side upload progress at full resolution
// - Using serverless (Lambda, Vercel) where memory is expensive
// - Need resumable uploads (combine with tus or S3 multipart)

// ── Multipart boundary parsing overhead (benchmark) ───────────
// File size    │ Parse time (multer, Node.js 20, M2 MacBook)
// ─────────────────────────────────────────────────────────────
// 100 KB       │ ~3ms
// 1 MB         │ ~4ms
// 10 MB        │ ~5ms
// 100 MB       │ ~6ms   (parsing is streaming — nearly constant)
// Conclusion: boundary parsing overhead is negligible at any size

The 1MB threshold for base64 is a practical guideline, not a hard limit: it reflects the point at which the roughly 33% encoding overhead starts causing body parser memory pressure in typical Node.js deployments. If your server has generous memory and your JSON body parser is configured with a higher size limit, base64 can work at 2-3MB, but presigned URLs are architecturally cleaner at that scale. For JSON API design patterns beyond file upload, see the linked guide.
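The decision matrix above can be expressed as a small helper. This is a minimal sketch: the function name, the `preferDirectUpload` option, and the exact boundary handling are illustrative assumptions, and the thresholds are the guideline values from this section rather than hard limits.

```typescript
type UploadStrategy = 'base64-json' | 'multipart' | 'presigned-url'

const MB = 1024 * 1024

// Thresholds mirror the decision matrix; tune them for your deployment.
function chooseUploadStrategy(
  sizeBytes: number,
  opts: { preferDirectUpload?: boolean } = {}
): UploadStrategy {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError('sizeBytes must be a non-negative number')
  }
  // "Any size" row: bandwidth cost or server memory matters (e.g. serverless).
  if (opts.preferDirectUpload) return 'presigned-url'
  if (sizeBytes < 1 * MB) return 'base64-json'
  if (sizeBytes <= 10 * MB) return 'multipart'
  return 'presigned-url'
}
```

A client could call this with `file.size` before deciding which endpoint to hit; the server should still enforce its own limits, since the client-side choice is only a convenience.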

Multipart Form Data with JSON Metadata

The multipart/form-data strategy sends both a JSON metadata part and a binary file part in a single HTTP request, separated by a MIME boundary string. The browser or HTTP client handles boundary generation automatically when you use FormData. On the server, multer (Node.js) parses the parts as a stream; whether the file bytes end up buffered depends on the storage engine. memoryStorage holds each file fully in a Buffer, while disk storage or a custom engine that pipes to S3 avoids keeping the whole file in memory.

// ── Client: send file + JSON metadata in one request ─────────────────
async function uploadWithMetadata(file: File, metadata: object) {
  const formData = new FormData()

  // Part 1: JSON metadata as a Blob with correct Content-Type
  const metadataBlob = new Blob([JSON.stringify(metadata)], {
    type: 'application/json',
  })
  formData.append('metadata', metadataBlob)

  // Part 2: the file
  formData.append('file', file, file.name)

  // Do NOT set Content-Type manually — browser sets it with boundary
  const response = await fetch('/api/upload', {
    method: 'POST',
    body: formData,
  })
  return response.json()
}

// Usage
await uploadWithMetadata(selectedFile, {
  albumId: 42,
  caption: 'Product photo — front view',
  tags: ['product', 'hero'],
})

// ── Wire format (what the server receives) ────────────────────────────
// Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryXYZ
//
// ------WebKitFormBoundaryXYZ
// Content-Disposition: form-data; name="metadata"; filename="blob"
// Content-Type: application/json
//
// {"albumId":42,"caption":"Product photo — front view","tags":["product","hero"]}
// ------WebKitFormBoundaryXYZ
// Content-Disposition: form-data; name="file"; filename="photo.jpg"
// Content-Type: image/jpeg
//
// [binary JPEG bytes]
// ------WebKitFormBoundaryXYZ--

// ── Server: parse with multer (Express/Node.js) ───────────────────────
import multer from 'multer'
import express from 'express'
import { randomUUID } from 'node:crypto'

const app = express()

// memoryStorage buffers each part fully in RAM, which is acceptable up to
// the 10MB limit below; use diskStorage for larger files
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 }, // 10MB limit
  fileFilter: (req, file, cb) => {
    // The metadata Blob is a file part too, so the filter must let it through
    if (file.fieldname === 'metadata') {
      return cb(null, file.mimetype === 'application/json')
    }
    const allowed = ['image/jpeg', 'image/png', 'image/webp', 'application/pdf']
    cb(null, allowed.includes(file.mimetype))
  },
})

app.post(
  '/api/upload',
  upload.fields([
    { name: 'metadata', maxCount: 1 },
    { name: 'file',     maxCount: 1 },
  ]),
  (req, res) => {
    const files = (req.files ?? {}) as Record<string, Express.Multer.File[]>
    const metadataPart = files.metadata?.[0]
    const file = files.file?.[0]

    // Parts rejected by fileFilter are simply absent, so check before use
    if (!metadataPart || !file) {
      return res.status(400).json({ error: 'Expected "metadata" and "file" parts' })
    }

    // Parse the JSON metadata part
    let metadata: unknown
    try {
      metadata = JSON.parse(metadataPart.buffer.toString('utf-8'))
    } catch {
      return res.status(400).json({ error: 'metadata is not valid JSON' })
    }

    console.log('filename:', file.originalname)
    console.log('mimeType:', file.mimetype)
    console.log('size:', file.size, 'bytes')
    console.log('metadata:', metadata)

    // Process: save to disk, upload to S3, etc.
    // await s3.putObject({ Bucket, Key, Body: file.buffer, ContentType: file.mimetype })

    res.json({ success: true, fileId: randomUUID() })
  }
)

// ── Server: parse with formidable (framework-agnostic) ────────────────
import formidable from 'formidable'
import type { IncomingMessage } from 'http'

async function parseMultipart(req: IncomingMessage) {
  const form = formidable({
    maxFileSize: 10 * 1024 * 1024,
    keepExtensions: true,
  })
  const [fields, files] = await form.parse(req)

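  // Assumes the client appended metadata as a plain string field:
  //   formData.append('metadata', JSON.stringify(metadata))
  // A Blob part (as in the client example above) arrives in `files`, not `fields`.
  const metadata = JSON.parse(fields.metadata?.[0] ?? '{}')
  const file = files.file?.[0]  // formidable.File object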

  return { metadata, file }
}

A common mistake is setting Content-Type: multipart/form-data manually in the fetch request headers — this omits the boundary parameter and breaks parsing. Always let the browser or FormData implementation set the Content-Type. When streaming large files to S3 instead of buffering them in memory, use multer.diskStorage() as an intermediate step or pipe the raw request stream directly using busboy for maximum memory efficiency.
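To see why a hand-written header breaks parsing, it helps to look at what a parser needs from it. The sketch below is a simplified, hypothetical helper (real parsers such as busboy handle more edge cases, including quoted values that contain semicolons); it returns the boundary parameter or `undefined` when there is nothing to split the body on.

```typescript
// Returns the boundary parameter from a Content-Type header, or undefined
// when the header is not multipart/form-data or carries no boundary.
function extractBoundary(contentType: string): string | undefined {
  const parts = contentType.split(';').map((s) => s.trim())
  const mediaType = parts[0] ?? ''
  if (mediaType.toLowerCase() !== 'multipart/form-data') return undefined

  for (const param of parts.slice(1)) {
    const eq = param.indexOf('=')
    if (eq === -1) continue
    const name = param.slice(0, eq).trim().toLowerCase()
    if (name !== 'boundary') continue
    // Boundaries may be quoted; strip one pair of surrounding quotes.
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1')
    return value.length > 0 ? value : undefined
  }
  return undefined
}
```

A header of `multipart/form-data` with no parameters yields `undefined`, which is exactly the situation a manually set Content-Type creates: the body still contains boundary lines, but the server has no way to know what they are.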

Base64 File Encoding in JSON

Base64 encoding converts binary file data into a string of ASCII characters that can be safely embedded in a JSON value. The encoding uses 64 characters (A-Z, a-z, 0-9, +, /) and represents every 3 bytes of binary as 4 ASCII characters, padding the final group with =, so the output is about 33% larger than the input. A data URI wraps the base64 string with a MIME type prefix: data:image/jpeg;base64,/9j/4AAQ.... Browsers understand this format natively in <img> src attributes and CSS url() values.
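The 3-bytes-to-4-characters rule makes both directions of the size calculation simple arithmetic. The helper names below are illustrative, and the sketch assumes standard padded base64 (not the URL-safe, unpadded variant).

```typescript
// Encoded length including '=' padding: 4 characters per 3-byte group, rounded up.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

// Decoded byte length of a padded base64 string, computed without decoding it.
function base64DecodedLength(base64: string): number {
  if (base64.length % 4 !== 0) {
    throw new Error('Expected padded base64 (length must be a multiple of 4)')
  }
  let padding = 0
  if (base64.endsWith('==')) padding = 2
  else if (base64.endsWith('=')) padding = 1
  return (base64.length / 4) * 3 - padding
}
```

Computing the decoded length from the string length lets a server reject an oversized payload before allocating a buffer for it.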

// ── Client: encode a File to base64 (browser) ────────────────────────

// Method 1: FileReader (callback-based, works everywhere)
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader()
    reader.onload = () => {
      // result is a data URI: "data:image/jpeg;base64,/9j/4AAQ..."
      const dataUri = reader.result as string
      // Strip the prefix to get just the base64 string
      const base64 = dataUri.split(',')[1]
      resolve(base64)
    }
    reader.onerror = reject
    reader.readAsDataURL(file)
  })
}

// Method 2: arrayBuffer + btoa (modern browsers only)
async function fileToBase64Modern(file: File): Promise<string> {
  const buffer = await file.arrayBuffer()
  const bytes = new Uint8Array(buffer)
  // btoa requires a binary string (each char code maps to one byte)
  const binaryString = bytes.reduce(
    (str, byte) => str + String.fromCharCode(byte),
    ''
  )
  return btoa(binaryString)
}

// ── Client: send base64 JSON body ─────────────────────────────────────
async function uploadBase64(file: File, metadata: object) {
  // Guard: reject files over 1MB before encoding
  if (file.size > 1 * 1024 * 1024) {
    throw new Error('File too large for base64 upload. Use presigned URL.')
  }

  const base64 = await fileToBase64(file)

  const response = await fetch('/api/upload-base64', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      filename:  file.name,
      mimeType:  file.type,
      size:      file.size,           // original size in bytes
      data:      base64,              // base64-encoded content
      metadata,
    }),
  })
  return response.json()
}

// ── Server: decode base64 and process the file ────────────────────────
import express from 'express'

const app = express()
app.use(express.json({ limit: '2mb' }))  // base64 of 1MB = ~1.33MB

app.post('/api/upload-base64', (req, res) => {
  const { filename, mimeType, size, data, metadata } = req.body

  // Validate MIME type allowlist
  const allowed = ['image/jpeg', 'image/png', 'image/webp']
  if (!allowed.includes(mimeType)) {
    return res.status(400).json({ error: 'MIME type not allowed' })
  }

  // Validate claimed size vs actual decoded size
  const buffer = Buffer.from(data, 'base64')
  if (buffer.byteLength !== size) {
    return res.status(400).json({ error: 'Size mismatch' })
  }

  // buffer is a standard Node.js Buffer — use it like any file bytes
  console.log('Decoded size:', buffer.byteLength, 'bytes')
  console.log('Metadata:', metadata)

  // Upload to S3 under a server-generated key (never the client filename):
  // await s3.putObject({ Bucket, Key: `uploads/${crypto.randomUUID()}`, Body: buffer, ContentType: mimeType })

  res.json({ success: true })
})

// ── Browser: display base64 data URI as image ─────────────────────────
function previewBase64(base64: string, mimeType: string): string {
  return `data:${mimeType};base64,${base64}`
}

// <img src={previewBase64(base64, 'image/jpeg')} alt="preview" />

// ── Size overhead reference ────────────────────────────────────────────
// 4 × ceil(original_bytes / 3) = base64_bytes (about +33%, including '=' padding)
// 100 KB → 133 KB
// 500 KB → 667 KB
// 1 MB   → 1.33 MB   (body parser limit must be ≥ 1.33 MB)
// 5 MB   → 6.67 MB   ← do NOT use base64 at this size

Always validate the decoded buffer size against the client-claimed size field server-side — a malicious client could send a tiny size claim with a giant base64 payload to bypass client-side guards. Never validate MIME type from the client-provided mimeType field alone; use the file-type npm package to detect the actual format from the decoded buffer's magic bytes. Base64 strings are also significantly harder to stream — the entire string must be in memory before decoding begins, making multipart or presigned URLs preferable for any file that approaches 1MB.
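Magic-byte detection means comparing the first bytes of the decoded buffer against known file signatures. The sketch below hand-rolls the check for four formats to show the idea; it is not a replacement for a maintained library such as file-type, and the function names are illustrative.

```typescript
type SniffedMime = 'image/jpeg' | 'image/png' | 'image/webp' | 'application/pdf'

// True when `bytes` contains `signature` starting at `offset`.
function hasSignature(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false
  return signature.every((value, i) => bytes[offset + i] === value)
}

function sniffMimeType(bytes: Uint8Array): SniffedMime | undefined {
  // JPEG: FF D8 FF
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return 'image/jpeg'
  // PNG: 89 "PNG" 0D 0A 1A 0A
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png'
  // WebP: "RIFF" <4-byte size> "WEBP"
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) &&
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return 'image/webp'
  }
  // PDF: "%PDF-"
  if (hasSignature(bytes, [0x25, 0x50, 0x44, 0x46, 0x2d])) return 'application/pdf'
  return undefined
}
```

On the server, the result would be compared with the client's claimed `mimeType` and the request rejected when they disagree or when the sniffed type is not on the allowlist.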

Presigned URL Upload Pattern

Presigned URLs are temporary, pre-authenticated URLs that grant a specific HTTP operation (typically PUT) on a specific object storage path without the client needing cloud credentials. The client uploads binary data directly to S3, GCS, or R2 — your API server never handles the file bytes. This reduces server memory, eliminates bandwidth cost on your API tier, and enables upload progress tracking at full resolution.

// ── Step 1: Client requests a presigned upload URL ───────────────────
async function getPresignedUrl(file: File) {
  const response = await fetch('/api/upload/init', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      filename: file.name,
      mimeType: file.type,
      size:     file.size,
    }),
  })
  // Returns: { uploadUrl: string, fileId: string, storageKey: string }
  return response.json()
}

// ── Step 2: Server generates the presigned URL ────────────────────────
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({ region: 'us-east-1' })

// POST /api/upload/init
async function initUpload(req: Request) {
  const { filename, mimeType, size } = await req.json()

  // Validate inputs server-side (never trust client)
  const maxSize = 100 * 1024 * 1024  // 100MB
  if (size > maxSize) {
    return Response.json({ error: 'File too large' }, { status: 413 })
  }
  const allowed = ['image/jpeg', 'image/png', 'video/mp4', 'application/pdf']
  if (!allowed.includes(mimeType)) {
    return Response.json({ error: 'MIME type not allowed' }, { status: 400 })
  }

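  // Generate a unique storage key (never use client filename directly)
  const fileId = crypto.randomUUID()
  // Derive the extension from the validated MIME type, not the client filename
  // (filename.split('.').pop() returns the whole name when there is no dot)
  const extByMime: Record<string, string> = {
    'image/jpeg': 'jpg', 'image/png': 'png', 'video/mp4': 'mp4', 'application/pdf': 'pdf',
  }
  const ext = extByMime[mimeType]
  const storageKey = `uploads/staging/${fileId}.${ext}`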

  // Generate presigned PUT URL
  const command = new PutObjectCommand({
    Bucket: process.env.S3_BUCKET!,
    Key: storageKey,
    ContentType: mimeType,
    ContentLength: size,  // declared size; only enforced if content-length is part of the signature
    // A content-length-range condition is a presigned POST feature
    // (createPresignedPost in @aws-sdk/s3-presigned-post), not a PutObject parameter
  })

  const uploadUrl = await getSignedUrl(s3, command, {
    expiresIn: 900,  // 15 minutes (the SDK presigner default)
  })

  // Save pending upload record to database
  // await db.insert({ fileId, storageKey, status: 'pending', mimeType, size })

  return Response.json({ uploadUrl, fileId, storageKey })
}

// ── Step 3: Client uploads directly to S3 ────────────────────────────
async function uploadToPresignedUrl(
  presignedUrl: string,
  file: File,
  onProgress?: (percent: number) => void
) {
  return new Promise<void>((resolve, reject) => {
    const xhr = new XMLHttpRequest()

    // Progress tracking — fetch() does not support upload progress
    xhr.upload.onprogress = (event) => {
      if (event.lengthComputable) {
        const percent = Math.round((event.loaded / event.total) * 100)
        onProgress?.(percent)
      }
    }

    xhr.onload = () => {
      if (xhr.status === 200) resolve()
      else reject(new Error(`Upload failed: ${xhr.status}`))
    }
    xhr.onerror = () => reject(new Error('Network error'))

    // PUT raw binary to presigned URL
    xhr.open('PUT', presignedUrl)
    xhr.setRequestHeader('Content-Type', file.type)
    xhr.send(file)  // send File directly — no encoding
  })
}

// ── Step 4: Client confirms upload and sends metadata ─────────────────
async function confirmUpload(fileId: string, metadata: object) {
  const response = await fetch('/api/upload/confirm', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileId, metadata }),
  })
  return response.json()
}

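// ── Step 4 (server): confirm, move staging key, save metadata ─────────
async function confirmUploadHandler(req: Request) {
  const { fileId, metadata } = await req.json()

  // Look up the pending record to get the staging key (never accept it from the client)
  // const { storageKey: stagingKey } = await db.findOne({ fileId, status: 'pending' })

  // Verify the object exists in S3 staging and that head.ContentLength matches the declared size
  // const head = await s3.send(new HeadObjectCommand({ Bucket, Key: stagingKey }))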

  // Move from staging/uuid.jpg to permanent/uuid.jpg
  // await s3.send(new CopyObjectCommand({ ... }))
  // await s3.send(new DeleteObjectCommand({ Bucket, Key: stagingKey }))

  // Update database record: status 'pending' -> 'uploaded'
  // await db.update({ fileId }, { status: 'uploaded', confirmedAt: new Date() })

  return Response.json({ success: true, fileId })
}

// ── Full client-side flow ─────────────────────────────────────────────
async function uploadFile(file: File, metadata: object, onProgress: (n: number) => void) {
  const { uploadUrl, fileId } = await getPresignedUrl(file)  // Step 1+2
  await uploadToPresignedUrl(uploadUrl, file, onProgress)    // Step 3
  return confirmUpload(fileId, metadata)                     // Step 4
}

Always generate storage keys server-side using a UUID and never use the client-provided filename as the S3 key. Client filenames can contain path traversal sequences (../../etc/passwd) or collide with and overwrite existing objects. Passing ContentLength to PutObjectCommand declares the expected size, but S3 only rejects a mismatched upload when the content-length header is part of the signature, so confirm this for your SDK version and re-check the stored object's size with a HEAD request before confirming the upload. If you need a hard size range enforced by S3 itself, use a presigned POST with a content-length-range condition. The 15-minute expiry is the default in the AWS SDK presigner; increase it for very slow connections or large files with expiresIn: 3600.
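One way to keep client input out of the key entirely is to build it from two server-controlled values: the generated UUID and an extension looked up from the validated MIME type. This is a sketch under those assumptions; `buildStagingKey` and the extension table are hypothetical names, and the `uploads/staging/` prefix follows the example above.

```typescript
// Extension is chosen by the server from the validated MIME type.
const EXTENSION_BY_MIME = new Map<string, string>([
  ['image/jpeg', 'jpg'],
  ['image/png', 'png'],
  ['video/mp4', 'mp4'],
  ['application/pdf', 'pdf'],
])

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

function buildStagingKey(fileId: string, mimeType: string): string {
  // Defensive check: the id should come from crypto.randomUUID() on the server.
  if (!UUID_PATTERN.test(fileId)) {
    throw new Error('fileId must be a UUID')
  }
  const ext = EXTENSION_BY_MIME.get(mimeType)
  if (ext === undefined) {
    throw new Error(`MIME type not allowed: ${mimeType}`)
  }
  return `uploads/staging/${fileId}.${ext}`
}
```

Because nothing from the client filename reaches the key, names without a dot, names with odd extensions, and traversal sequences all become irrelevant to storage layout; the original filename can still be kept as display metadata.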

File Upload in Next.js App Router

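Next.js App Router has specific constraints for file uploads. Route handlers read the body through the Web Request API and have no bodyParser setting (the 4MB default and the sizeLimit option belong to Pages Router API routes), Server Actions enforce their own body size limit (1MB by default), and Vercel caps request bodies at about 4.5MB for functions. Vercel Blob provides a first-class upload integration that sidesteps these limits. Understanding which mechanism to use for which file size prevents surprising 413 errors in production.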

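// ── Route Handler: read a multipart body with the Web Request API ────
// app/api/upload/route.ts
import { NextRequest, NextResponse } from 'next/server'

// Note: `export const config = { api: { bodyParser: { sizeLimit } } }` is a
// Pages Router option and has no effect in an App Router route handler.
// The body size ceiling comes from the hosting platform, so enforce your own.
const MAX_BYTES = 4 * 1024 * 1024

export async function POST(req: NextRequest) {
  const formData = await req.formData()
  const file = formData.get('file')
  if (!(file instanceof File)) {
    return NextResponse.json({ error: 'No file provided' }, { status: 400 })
  }
  if (file.size > MAX_BYTES) {
    return NextResponse.json({ error: 'File too large' }, { status: 413 })
  }

  // metadata may arrive as a string field or as a Blob part
  const rawMetadata = formData.get('metadata')
  const metadata = JSON.parse(
    typeof rawMetadata === 'string' ? rawMetadata : (await rawMetadata?.text()) ?? '{}'
  )

  // Process the file
  const bytes = await file.arrayBuffer()
  const buffer = Buffer.from(bytes)

  return NextResponse.json({ success: true, size: buffer.byteLength })
}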

// ── Server Action: upload with FormData (1MB default body limit, configurable)
// app/actions/upload.ts
'use server'

import { put } from '@vercel/blob'

export async function uploadFileAction(formData: FormData) {
  const file = formData.get('file') as File
  const metadataStr = formData.get('metadata') as string
  const metadata = JSON.parse(metadataStr)

  if (!file || file.size === 0) {
    throw new Error('No file provided')
  }

  // Vercel Blob: upload directly from Server Action
  const blob = await put(file.name, file, {
    access: 'public',
    addRandomSuffix: true,  // prevents filename collisions
  })

  // blob.url is the public CDN URL
  return { url: blob.url, metadata }
}

// ── Client Component: call Server Action with FormData ────────────────
// app/components/upload-form.tsx
'use client'

import { uploadFileAction } from '@/app/actions/upload'

export function UploadForm() {
  async function handleSubmit(event: React.FormEvent<HTMLFormElement>) {
    event.preventDefault()
    const form = event.currentTarget
    const formData = new FormData(form)

    // Add JSON metadata alongside the file field
    const metadata = { albumId: 42, caption: 'Hero image' }
    formData.append('metadata', JSON.stringify(metadata))

    const result = await uploadFileAction(formData)
    console.log('Uploaded to:', result.url)
  }

  return (
    <form onSubmit={handleSubmit}>
      <input type="file" name="file" accept="image/*" />
      <button type="submit">Upload</button>
    </form>
  )
}

// ── Vercel Blob: client-side upload (presigned URL pattern built-in) ──
// app/components/client-upload.tsx
'use client'

import { upload } from '@vercel/blob/client'

export function ClientUpload() {
  async function handleFile(file: File) {
    // upload() handles the presigned URL generation server-side
    // and uploads directly from the browser — no size limit from your server
    const blob = await upload(file.name, file, {
      access: 'public',
      handleUploadUrl: '/api/upload-token',  // your route handler generates the token
    })
    console.log('URL:', blob.url)
  }
  // ...
}

// ── Route handler for Vercel Blob client upload token ─────────────────
// app/api/upload-token/route.ts
import { handleUpload, type HandleUploadBody } from '@vercel/blob/client'

export async function POST(req: Request): Promise<Response> {
  const body = (await req.json()) as HandleUploadBody

  const jsonResponse = await handleUpload({
    body,
    request: req,
    onBeforeGenerateToken: async (pathname) => {
      // Add auth check here — return token config
      return {
        allowedContentTypes: ['image/jpeg', 'image/png', 'image/webp'],
        maximumSizeInBytes: 50 * 1024 * 1024,  // 50MB
      }
    },
    onUploadCompleted: async ({ blob, tokenPayload }) => {
      // Fires after successful upload — save metadata to DB
      console.log('Upload complete:', blob.url)
    },
  })
  return Response.json(jsonResponse)
}

Server Actions are a convenient path for small uploads in the App Router and integrate cleanly with Vercel Blob, but their body size limit defaults to 1MB and must be raised with the serverActions.bodySizeLimit option in the Next.js config; on Vercel, the platform's request body cap of about 4.5MB still applies. For larger files or when you need fine-grained progress control, use Vercel Blob's client-side upload function, which implements the presigned URL pattern internally. Avoid processing large file bytes in a Vercel serverless function: execution time and memory limits (both depend on plan and configuration) will cause failures as files grow.
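Server Actions reject request bodies above their configured size limit, so an upload form that works with a small test image can fail on a real photo. A configuration sketch for raising the limit follows. It assumes a Next.js version where the option lives under `experimental.serverActions` (true of 14 and 15 as far as I know; check the documentation for the version you run), and the `'4mb'` value is only an illustration chosen to stay under a typical platform cap.

```typescript
// next.config.ts (the same object can be exported from next.config.js)
const nextConfig = {
  experimental: {
    serverActions: {
      // Default is 1MB; bodies larger than this are rejected before
      // the action runs. Hosting platform limits still apply on top.
      bodySizeLimit: '4mb',
    },
  },
}

export default nextConfig
```

Raising this limit only moves the ceiling; for anything beyond a few megabytes, a direct-to-storage upload remains the more robust option.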

Upload Progress and Resumable Uploads

Upload progress tracking requires XMLHttpRequest, because the fetch API does not expose upload progress events in most browsers. For resumable uploads, where a network interruption should not restart the transfer from zero, the tus protocol provides a standardized flow whose control information (offsets, lengths, metadata) travels in plain HTTP headers.

// ── XHR upload progress (multipart or presigned URL PUT) ─────────────
import { useState, useRef } from 'react'

function useFileUpload() {
  const [progress, setProgress] = useState(0)
  const [status, setStatus] = useState<'idle' | 'uploading' | 'done' | 'error'>('idle')
  const xhrRef = useRef<XMLHttpRequest | null>(null)

  const upload = (url: string, file: File, method: 'POST' | 'PUT' = 'PUT') => {
    return new Promise<void>((resolve, reject) => {
      const xhr = new XMLHttpRequest()
      xhrRef.current = xhr

      // Upload progress: fires repeatedly as bytes are sent
      xhr.upload.onprogress = (event: ProgressEvent) => {
        if (event.lengthComputable) {
          const pct = Math.round((event.loaded / event.total) * 100)
          setProgress(pct)
        }
      }

      xhr.upload.onloadstart = () => setStatus('uploading')

      xhr.onload = () => {
        if (xhr.status >= 200 && xhr.status < 300) {
          setStatus('done')
          resolve()
        } else {
          setStatus('error')
          reject(new Error(`HTTP ${xhr.status}: ${xhr.responseText}`))
        }
      }

      xhr.onerror = () => { setStatus('error'); reject(new Error('Network error')) }
      xhr.ontimeout = () => { setStatus('error'); reject(new Error('Timeout')) }

      xhr.open(method, url)

      if (method === 'PUT') {
        // Presigned URL PUT: set Content-Type to match the presigned request
        xhr.setRequestHeader('Content-Type', file.type)
        xhr.send(file)  // raw binary
      } else {
        // Multipart POST: let XHR set Content-Type with boundary
        const formData = new FormData()
        formData.append('file', file)
        xhr.send(formData)
      }
    })
  }

  const cancel = () => xhrRef.current?.abort()

  return { progress, status, upload, cancel }
}

// ── Chunk size calculation ────────────────────────────────────────────
function calculateChunkSize(connectionType: string): number {
  // navigator.connection.effectiveType values: 'slow-2g' | '2g' | '3g' | '4g'
  const chunkSizes: Record<string, number> = {
    'slow-2g': 256 * 1024,    //  256 KB — very slow connections
    '2g':       512 * 1024,   //  512 KB
    '3g':      2 * 1024 * 1024,  // 2 MB
    '4g':      8 * 1024 * 1024,  // 8 MB for fast connections
  }
  return chunkSizes[connectionType] ?? 5 * 1024 * 1024  // 5MB fallback when the type is unknown
}

// ── tus resumable upload protocol ────────────────────────────────────
// npm install tus-js-client

import * as tus from 'tus-js-client'

async function tusUpload(file: File, metadata: object, onProgress: (pct: number) => void) {
  return new Promise<string>((resolve, reject) => {
    const upload = new tus.Upload(file, {
      endpoint: 'https://api.example.com/uploads/',
      retryDelays: [0, 1000, 3000, 5000],  // retry on failure

      // Chunk size — tus sends one PATCH per chunk
      chunkSize: 5 * 1024 * 1024,  // 5MB chunks

      // Metadata is sent as comma-separated "key base64(value)" pairs in a header:
      // Upload-Metadata: filename dGVzdC5qcGc=,filetype aW1hZ2UvanBlZw==
      metadata: {
        filename: file.name,
        filetype: file.type,
        albumId:  String((metadata as any).albumId),
      },

      onProgress: (bytesUploaded, bytesTotal) => {
        const pct = Math.round((bytesUploaded / bytesTotal) * 100)
        onProgress(pct)
      },

      onSuccess: () => {
        // upload.url is the resource URL for the completed upload
        resolve(upload.url!)
      },

      onError: reject,
    })

    // Check for a previous incomplete upload and resume from offset
    upload.findPreviousUploads().then((previousUploads) => {
      if (previousUploads.length) {
        upload.resumeFromPreviousUpload(previousUploads[0])
      }
      upload.start()
    })
  })
}

// ── tus control headers (JSON-compatible) ────────────────────────────
// tus protocol uses plain HTTP headers — no binary encoding of control info:
//
// POST /uploads              (create upload resource)
//   Tus-Resumable: 1.0.0
//   Upload-Length: 10485760  (file size in bytes)
//   Upload-Metadata: filename dGVzdC5qcGc=,filetype aW1hZ2UvanBlZw==
// → 201 Created, Location: /uploads/abc123
//
// PATCH /uploads/abc123      (upload chunk)
//   Tus-Resumable: 1.0.0
//   Upload-Offset: 0         (byte position of this chunk)
//   Content-Type: application/offset+octet-stream
//   Content-Length: 5242880  (chunk size)
// → 204 No Content, Upload-Offset: 5242880
//
// HEAD /uploads/abc123       (check resume point after network failure)
//   Tus-Resumable: 1.0.0
// → 200 OK, Upload-Offset: 5242880  (resume from here)

The tus protocol's key advantage over custom chunked upload implementations is standardization — the findPreviousUploads() method uses localStorage (browser) or a fingerprint file (Node.js) to resume incomplete uploads across page reloads or app restarts without any backend session management. The control headers (Upload-Offset, Upload-Length, Tus-Resumable) are plain ASCII and trivially loggable, debuggable, and compatible with any HTTP proxy — no special binary protocol handling needed.
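The Upload-Metadata header shown in the control-header listing is easy to build by hand, which helps when debugging a tus server with curl or reading logs. The sketch below follows the key/value layout used in the examples above (key, space, base64 of the value, pairs joined by commas); `encodeUploadMetadata` is a hypothetical helper, since tus-js-client does this encoding internally.

```typescript
function encodeUploadMetadata(metadata: Record<string, string>): string {
  return Object.entries(metadata)
    .map(([key, value]) => {
      // Keys must be non-empty and free of spaces and commas,
      // because those characters delimit the header.
      if (key.length === 0 || /[\s,]/.test(key)) {
        throw new Error(`Invalid metadata key: "${key}"`)
      }
      if (value.length === 0) return key

      // Encode the value as UTF-8 first so non-ASCII filenames survive btoa().
      const bytes = new TextEncoder().encode(value)
      let binary = ''
      bytes.forEach((byte) => {
        binary += String.fromCharCode(byte)
      })
      return `${key} ${btoa(binary)}`
    })
    .join(',')
}

const header = encodeUploadMetadata({ filename: 'test.jpg', filetype: 'image/jpeg' })
// header === 'filename dGVzdC5qcGc=,filetype aW1hZ2UvanBlZw=='
```

Because every value is a string, numeric or structured application metadata (such as an album ID) has to be stringified before encoding and parsed again on the server.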

File Metadata JSON Schema

A well-designed file metadata JSON schema captures the information needed for storage, access control, integrity verification, and content moderation — without over-collecting fields that create privacy risk. The schema should be set server-side where possible; client-provided metadata should be treated as untrusted input and validated against an allowlist.
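Treating client metadata as untrusted input can be as simple as copying only the allowed fields into a fresh object after checking each one. The sketch below assumes the two client-provided fields used in this guide (tags and caption) with the limits given in the schema; the function name is illustrative, and it rejects invalid values rather than truncating them, which is one of two reasonable policies.

```typescript
interface ClientMetadata {
  tags?: string[]
  caption?: string
}

function sanitizeClientMetadata(input: unknown): ClientMetadata {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    throw new Error('metadata must be a JSON object')
  }
  const raw = input as Record<string, unknown>
  const result: ClientMetadata = {}

  const tags = raw['tags']
  if (tags !== undefined) {
    if (!Array.isArray(tags) || tags.length > 20) {
      throw new Error('tags must be an array of at most 20 items')
    }
    result.tags = tags.map((tag: unknown) => {
      if (typeof tag !== 'string' || tag.length === 0 || tag.length > 50) {
        throw new Error('each tag must be a string of 1 to 50 characters')
      }
      return tag
    })
  }

  const caption = raw['caption']
  if (caption !== undefined) {
    if (typeof caption !== 'string' || caption.length > 500) {
      throw new Error('caption must be a string of at most 500 characters')
    }
    result.caption = caption
  }

  // Any other key (fileId, status, storageKey, ...) is silently dropped,
  // so a client cannot overwrite server-owned fields.
  return result
}
```

The important property is that the output object is built from scratch: fields such as status or checksum can never be smuggled in through the client payload.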

// ── File metadata TypeScript type ────────────────────────────────────
interface FileMetadata {
  // Core identity
  fileId:      string    // UUID v4, generated server-side
  filename:    string    // sanitized original filename (max 255 chars)
  size:        number    // file size in bytes (set from buffer.byteLength server-side)
  mimeType:    string    // validated MIME type (from magic bytes, not client claim)

  // Storage
  storageKey:  string    // object storage path: 'uploads/2026/02/uuid.jpg'
  storageUrl?: string    // public CDN URL (only for public files)

  // Integrity
  checksum:    string    // SHA-256 hex digest of file content (server-computed)

  // Timestamps (all UTC ISO 8601, server-set)
  uploadedAt:  string    // '2026-02-04T10:30:00.000Z'
  confirmedAt?: string   // set after presigned URL upload is confirmed

  // Ownership
  uploadedBy:  string    // user ID reference

  // Content moderation pipeline
  status: 'pending' | 'scanning' | 'approved' | 'rejected'
  scanResult?: {
    scannedAt: string
    clean:     boolean
    threats?:  string[]  // ['Trojan.GenericKD.48234'] if threats found
  }

  // Type-specific metadata (set server-side)
  dimensions?: { width: number; height: number }  // images
  duration?:   number                              // video/audio, seconds

  // Application metadata (client-provided, validated)
  tags?:    string[]   // max 20 items, each max 50 chars
  caption?: string     // max 500 chars
}

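// ── JSON Schema for validation (Ajv / pg_jsonschema compatible) ───────
// With Ajv, the 'uuid', 'uri' and 'date-time' formats need the ajv-formats plugin
const fileMetadataSchema = {
  $schema: 'http://json-schema.org/draft-07/schema#',
  type: 'object',
  required: ['fileId', 'filename', 'size', 'mimeType', 'storageKey', 'checksum', 'uploadedAt', 'uploadedBy', 'status'],
  properties: {
    fileId:      { type: 'string', format: 'uuid' },
    filename:    { type: 'string', minLength: 1, maxLength: 255,
                   pattern: '^[^/\\\\<>:"|?*]+$' },   // no slashes, backslashes or reserved chars
    size:        { type: 'integer', minimum: 1, maximum: 100 * 1024 * 1024 },
    mimeType:    { type: 'string', enum: ['image/jpeg','image/png','image/webp','application/pdf','video/mp4'] },
    storageKey:  { type: 'string', pattern: '^uploads/' },
    storageUrl:  { type: 'string', format: 'uri' },
    checksum:    { type: 'string', pattern: '^[a-f0-9]{64}$' },  // SHA-256 hex
    uploadedAt:  { type: 'string', format: 'date-time' },
    confirmedAt: { type: 'string', format: 'date-time' },
    uploadedBy:  { type: 'string' },
    status:      { type: 'string', enum: ['pending','scanning','approved','rejected'] },
    // Optional fields from the FileMetadata interface; without these entries,
    // additionalProperties: false would reject otherwise valid records
    scanResult:  { type: 'object', required: ['scannedAt', 'clean'], additionalProperties: false,
                   properties: {
                     scannedAt: { type: 'string', format: 'date-time' },
                     clean:     { type: 'boolean' },
                     threats:   { type: 'array', items: { type: 'string' } },
                   } },
    dimensions:  { type: 'object', required: ['width', 'height'], additionalProperties: false,
                   properties: { width: { type: 'integer', minimum: 1 }, height: { type: 'integer', minimum: 1 } } },
    duration:    { type: 'number', minimum: 0 },
    tags:        { type: 'array', items: { type: 'string', maxLength: 50 }, maxItems: 20 },
    caption:     { type: 'string', maxLength: 500 },
  },
  additionalProperties: false,
}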

// ── Server: compute checksum after receiving file bytes ───────────────
import crypto from 'crypto'

function computeChecksum(buffer: Buffer): string {
  return crypto.createHash('sha256').update(buffer).digest('hex')
}

// ── Server: build metadata object after upload ────────────────────────
async function buildFileMetadata(
  file: Express.Multer.File,
  userId: string,
  clientMetadata: { tags?: string[]; caption?: string }
): Promise<FileMetadata> {
  const fileId = crypto.randomUUID()
  const ext = file.originalname.split('.').pop()?.toLowerCase() ?? 'bin'
  const now = new Date().toISOString()

  return {
    fileId,
    filename:   file.originalname.replace(/[/\\<>:"|?*]/g, '_').slice(0, 255),
    size:       file.buffer.byteLength,   // always from buffer, never client claim
    mimeType:   file.mimetype,            // client-declared (multer copies the part's Content-Type); verify with magic-byte detection before trusting
    storageKey: `uploads/${now.slice(0, 4)}/${now.slice(5, 7)}/${fileId}.${ext}`,   // e.g. uploads/2026/02/<uuid>.jpg
    checksum:   computeChecksum(file.buffer),
    uploadedAt: now,
    uploadedBy: userId,
    status:     'pending',               // virus scan not yet run
    tags:       (clientMetadata.tags ?? []).slice(0, 20).map(t => t.slice(0, 50)),
    caption:    clientMetadata.caption?.slice(0, 500),
  }
}

// ── Virus scanning hook ───────────────────────────────────────────────
// After upload: enqueue a scan job (do NOT serve the file until approved)
async function enqueueVirusScan(fileId: string, storageKey: string) {
  // Option 1: ClamAV via REST wrapper (clam-av-client npm)
  // Option 2: VirusTotal API (POST /files with the file bytes)
  // Option 3: AWS GuardDuty Malware Protection (S3 event trigger)

  // Example: update status to 'scanning' and add to queue
  await db.update('files', { status: 'scanning' }, { fileId })
  await queue.add('virus-scan', { fileId, storageKey })
}

// Never return a file URL to clients while status is 'pending' or 'scanning'
async function getFileUrl(fileId: string, userId: string): Promise<string> {
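  const file = await db.findOne('files', { fileId, uploadedBy: userId })
  // A missing row means the file does not exist or belongs to another user
  if (!file) {
    throw new Error('File not found')
  }
  if (file.status !== 'approved') {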
    throw new Error('File not yet approved for serving')
  }
  // Generate presigned GET URL valid for 1 hour
  return getSignedUrl(s3, new GetObjectCommand({ Bucket, Key: file.storageKey }), { expiresIn: 3600 })
}

The most critical fields to compute server-side are checksum (enables deduplication and integrity verification), size (always from the decoded buffer, never from client input), and mimeType (use magic-byte detection, not the client's Content-Type claim). Storing the storageKey separately from a public URL allows you to change CDN providers or access control policies without updating database records. The status field drives the virus scanning pipeline — gate all file serving behind an approved status check to prevent serving malicious content before scanning completes.
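
To make the magic-byte advice concrete, here is a minimal hand-rolled sketch that recognizes only the five MIME types on the schema's allowlist. The names sniffMimeType and hasAscii are hypothetical helpers, not part of multer or any library; a maintained package such as file-type covers far more formats and edge cases, so treat this as an illustration of the idea rather than a replacement.

```typescript
// Hand-rolled sketch: recognizes only the five allowlisted types.
type AllowedMime =
  | 'image/jpeg' | 'image/png' | 'image/webp' | 'application/pdf' | 'video/mp4'

// True when `bytes` contains the ASCII string `text` starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) return false
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false
  }
  return true
}

function sniffMimeType(bytes: Uint8Array): AllowedMime | null {
  // JPEG: FF D8 FF
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return 'image/jpeg'
  }
  // PNG: 89 'P' 'N' 'G' 0D 0A 1A 0A
  if (bytes.length >= 8 && bytes[0] === 0x89 && hasAscii(bytes, 1, 'PNG\r\n\x1a\n')) {
    return 'image/png'
  }
  // WebP: 'RIFF' <4-byte size> 'WEBP'
  if (hasAscii(bytes, 0, 'RIFF') && hasAscii(bytes, 8, 'WEBP')) return 'image/webp'
  // PDF: '%PDF-'
  if (hasAscii(bytes, 0, '%PDF-')) return 'application/pdf'
  // ISO base media files: <4-byte box size> 'ftyp'. This is a coarse check,
  // since other ISO BMFF formats (MOV, HEIC, AVIF) share the same marker.
  if (hasAscii(bytes, 4, 'ftyp')) return 'video/mp4'
  return null
}
```

A caller would compare the sniffed type with the client-declared one and reject the upload when the result is null or the two disagree.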

Key Terms

Presigned URL
A time-limited, cryptographically signed URL that grants temporary permission to perform a specific HTTP operation (typically PUT for upload or GET for download) on a specific object in cloud storage (S3, GCS, R2, Azure Blob), without the client needing cloud credentials. The URL includes the bucket, object key, expiry time, and an HMAC signature computed with the credentials of the IAM principal that generated it. When the client sends a PUT to the presigned URL with the headers that were signed (such as Content-Type), the cloud provider verifies the signature and stores the object directly, so your API server never handles the file bytes. Presigned URLs are valid for a configurable duration (the AWS SDK defaults to 15 minutes; the maximum is 7 days). A presigned PUT can pin signed headers such as Content-Type, but it cannot express a size range. On S3, policy conditions such as content-length-range (file size limits) and starts-with on the key (path prefixes) belong to presigned POST policies.
Multipart Form Data
An HTTP encoding format (Content-Type: multipart/form-data) that sends multiple named parts, each with its own headers and body, separated by a randomly generated boundary string. Each part can have a different Content-Type, enabling a single HTTP request to carry both a JSON metadata part (Content-Type: application/json) and a binary file part (Content-Type: image/jpeg). The boundary string is generated by the HTTP client (browser FormData API, axios, curl) and included in the outer Content-Type header. Server parsers (multer, busboy, formidable, Next.js formData()) extract each part by scanning for the boundary. Boundary parsing adds little overhead (roughly 2-5ms for typical requests) because streaming parsers split on boundary markers as bytes arrive rather than buffering the entire request first. Multipart is the standard format for HTML form file inputs and is supported natively by all browsers and by most server frameworks.
Base64 Encoding
A binary-to-text encoding scheme that represents arbitrary binary data using an alphabet of 64 ASCII characters (A-Z, a-z, 0-9, +, /), with = as padding. It works by taking every 3 bytes of binary input and converting them to 4 ASCII characters, resulting in a size increase of about 33% (marginally more once padding is counted). In the browser, btoa() encodes a binary string to base64 and atob() decodes it. In Node.js, Buffer.from(data, 'base64') decodes and buffer.toString('base64') encodes. Base64 is used to embed binary file data in JSON fields (which can only contain Unicode text), in data URIs for inline images, and in HTTP Basic Auth headers. The 33% overhead makes it unsuitable for files larger than 1MB in most API contexts: a 5MB file becomes 6.67MB of JSON, straining body parser limits and increasing bandwidth cost for both client and server.
Data URI
A URI scheme (RFC 2397) that embeds file content directly in a URL string rather than referencing an external resource. The format is data:[mediatype][;base64],data, for example data:image/png;base64,iVBORw0KGgo.... Data URIs are recognized by browsers as valid src attributes for <img> elements, as CSS url() values for backgrounds, and as href attributes for download links. They are generated from file bytes using FileReader.readAsDataURL() in the browser. When sending a file via a JSON API, a data URI can be used as the value of a JSON field, but the data:[mediatype];base64, prefix must be stripped if the server expects raw base64 rather than the full data URI. Data URIs are most appropriate for small inline assets (icons, thumbnails) where an additional HTTP request would add more overhead than the 33% encoding cost.
Resumable Upload (tus)
A standardized protocol (https://tus.io) for resumable HTTP file uploads that can survive network interruptions without restarting from the beginning. tus uses a three-phase flow: (1) POST to create an upload resource and receive a Location URL; (2) PATCH to send file chunks with the Upload-Offset header indicating the byte position of each chunk; (3) HEAD to query the current Upload-Offset after a network failure and resume from that position. All control information is exchanged as plain ASCII HTTP headers (Tus-Resumable, Upload-Offset, Upload-Length, Upload-Metadata), so there is no binary control protocol and every request is easy to log and inspect. The tus-js-client npm package handles client-side fingerprinting, resume detection via localStorage, retry logic, and splitting the file into chunks of the configured size. Server implementations include tus-node-server (Node.js) with S3, GCS, and local disk backends.
Content-Length-Range
An AWS S3 POST policy condition that enforces minimum and maximum file size constraints on an upload. It is specified as ['content-length-range', minBytes, maxBytes] in the policy conditions of a presigned POST (for example via createPresignedPost in the AWS SDK); it is not available on a presigned PUT URL. When the client submits the form POST, S3 checks the uploaded size against the declared range, and if the file is smaller or larger than allowed, S3 rejects the request and no data is stored. This prevents a malicious client from substituting a file of a different size than originally declared when requesting the upload credentials. Combined with a Content-Type condition, it provides server-enforced file constraints without requiring your API server to be in the upload data path.
MIME Type
A two-part identifier (type/subtype) standardized by IANA that describes the format of a file or HTTP body. The type is a broad category (image, video, audio, application, text) and the subtype is the specific format (jpeg, png, pdf, json, mp4). In HTTP, the Content-Type header carries the MIME type of the request or response body. For file uploads, the MIME type declared by the client in Content-Type or a form field should never be trusted alone — attackers can send a PHP file with Content-Type: image/jpeg. Server-side validation should use magic-byte detection: the file-type npm package reads the first few bytes of the decoded buffer and identifies the actual format regardless of the declared type. Only MIME types on an explicit server-side allowlist should be accepted.
Checksum
A fixed-length hash derived from file content using a cryptographic hash function (typically SHA-256 for file integrity), stored as a hex string. The checksum is computed server-side from the raw file bytes after upload — never accepted from the client. It serves multiple purposes: integrity verification (re-hashing the file later and comparing checksums confirms the file has not been modified or corrupted in storage), deduplication (two files with the same SHA-256 are almost certainly identical — skip storing a duplicate), and audit trails (the checksum links a stored object to its original upload record). S3 provides an ETag header after PUT operations which is the MD5 of the uploaded content for single-part uploads — for multi-part uploads the ETag format is different and not a direct MD5. Computing a SHA-256 in Node.js: crypto.createHash('sha256').update(buffer).digest('hex').
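
The one-liner above hashes a buffer that is already fully in memory. For larger files the same digest can be built incrementally, as in this minimal sketch using Node's built-in crypto module. The helper names sha256OfChunks and matchesChecksum are hypothetical, and the chunk source is assumed to be any iterable of byte chunks.

```typescript
import { createHash } from 'node:crypto'

// Hash a file chunk by chunk so a large upload never has to sit in one Buffer.
function sha256OfChunks(chunks: Iterable<Uint8Array>): string {
  const hash = createHash('sha256')
  for (const chunk of chunks) hash.update(chunk)
  return hash.digest('hex')
}

// Integrity check: re-hash the stored bytes and compare with the recorded checksum.
function matchesChecksum(bytes: Uint8Array, expectedHex: string): boolean {
  return sha256OfChunks([bytes]) === expectedHex.toLowerCase()
}
```

Because the hash state is updated per chunk, the result is identical to hashing the concatenated bytes in a single call.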

FAQ

Should I use multipart/form-data or base64 JSON encoding for file uploads in my API?

The decision depends primarily on file size and API design consistency. Use multipart/form-data when files exceed 1MB, when you need streaming upload support, or when interoperability with standard HTTP clients matters most. The boundary parsing overhead is small (roughly 2-5ms for typical requests) and the browser and server ecosystem handles multipart natively. Use base64 JSON encoding when files are reliably under 1MB, when your API is already fully JSON and adding multipart support would require significant client SDK changes, or when you need to embed file data inside a larger atomic JSON transaction. Base64 adds about 33% size overhead: a 900KB file becomes ~1.2MB encoded. For files larger than 10MB, use presigned URLs: the client uploads binary data directly to S3 and your API server never handles file bytes at all. The most practical rule: base64 for under 1MB, multipart for 1-10MB, presigned URLs for anything larger or when server bandwidth cost matters.
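
The rule of thumb above can be written down as a small helper. This is a sketch, not a library API: chooseUploadStrategy and base64EncodedLength are hypothetical names, and the thresholds are simply the 1MB and 10MB boundaries from the rule.

```typescript
type UploadStrategy = 'base64-json' | 'multipart' | 'presigned-url'

const MB = 1024 * 1024

// Thresholds follow the rule of thumb: under 1MB, 1-10MB, anything larger.
function chooseUploadStrategy(sizeBytes: number): UploadStrategy {
  if (sizeBytes < 1 * MB) return 'base64-json'
  if (sizeBytes <= 10 * MB) return 'multipart'
  return 'presigned-url'
}

// Length of the padded base64 string for `sizeBytes` of binary input:
// every group of 3 input bytes (rounded up) becomes 4 output characters.
function base64EncodedLength(sizeBytes: number): number {
  return Math.ceil(sizeBytes / 3) * 4
}
```

For a 900KB file, base64EncodedLength(900 * 1024) gives 1,228,800 characters, which is the "~1.2MB" figure quoted above.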

How do I upload a file with JSON metadata in a single API request?

There are two main approaches. With multipart/form-data, construct a FormData object, append a Blob with type: 'application/json' as the "metadata" field, then append the file as the "file" field. On the server, use multer().fields([{name:"metadata"}, {name:"file"}]) and recover the metadata with JSON.parse(req.files.metadata[0].buffer.toString()). With base64 JSON, use FileReader.readAsDataURL() in the browser to read the file as a data URI, strip the data:...;base64, prefix to get the raw base64 string, then POST a pure JSON body: {filename, mimeType, size, data: base64, metadata: {...}}. The server decodes with Buffer.from(data, 'base64'). Limit base64 to files under 1MB to avoid memory pressure. For very large files, use the presigned URL pattern: POST /upload/init returns a presigned URL, the client PUTs the raw file directly to S3, then POST /upload/confirm sends the metadata as pure JSON. That is three separate requests, but the server never handles binary data.
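
On the server side of the base64 approach, the decode step benefits from a little more care than a bare Buffer.from call, because Node's base64 decoder silently skips invalid characters. The sketch below assumes the data field may arrive either as raw base64 or as a full data URI; decodeBase64Field is a hypothetical helper name and the 1MB default limit mirrors the guidance above.

```typescript
interface DecodedUpload { mimeType: string; bytes: Buffer }

// Accepts either a full data URI ("data:image/png;base64,...") or raw base64.
// The returned mimeType is only what the client claims; validate it separately.
function decodeBase64Field(data: string, maxBytes = 1024 * 1024): DecodedUpload {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(data)
  const mimeType = match?.[1] ?? 'application/octet-stream'
  const base64 = match?.[2] ?? data

  // Reject oversized payloads before allocating the decoded buffer.
  if (base64.length > Math.ceil(maxBytes / 3) * 4) {
    throw new Error('Encoded payload exceeds size limit')
  }
  // Loose alphabet check, since Buffer.from would otherwise ignore bad characters.
  if (!/^[A-Za-z0-9+/]*={0,2}$/.test(base64)) {
    throw new Error('Invalid base64 payload')
  }
  const bytes = Buffer.from(base64, 'base64')
  if (bytes.byteLength > maxBytes) throw new Error('Decoded file exceeds size limit')
  return { mimeType, bytes }
}
```

The size stored in metadata should then come from bytes.byteLength rather than from the size field in the request body.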

What is a presigned URL and how does it improve file upload performance?

A presigned URL is a time-limited, cryptographically signed URL that grants temporary permission to PUT a specific object to S3, GCS, or R2 without the client needing cloud credentials. The upload flow has four steps: (1) Client sends POST /upload/init with filename, mimeType, and size as JSON. (2) Your server generates a presigned URL via the AWS SDK and returns {uploadUrl, fileId}. (3) Client sends PUT directly to the presigned URL with the raw binary file body, with no encoding. (4) Client sends POST /upload/confirm with {fileId, metadata} and your server creates the database record. Performance improvements are significant: your server is never a data pipe, the client uploads to S3 at network speed (often faster than to your server), S3 supports multipart upload for large objects, and serverless functions are not billed for transfer time. Security is maintained because the presigned URL is valid only for the specific object key, the signed Content-Type, and the expiry window (default 15 minutes). A presigned PUT URL cannot express a size range, so to enforce file size limits at the S3 level, issue a presigned POST with a content-length-range condition instead; otherwise, check the stored object's size during /upload/confirm.
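
Step 2 of the flow above is mostly validation, which can be sketched without tying it to a particular storage SDK. In this sketch the signer is injected as a function: PresignPut is a hypothetical type that a real service would implement by wrapping its SDK's presign call, and handleUploadInit, ALLOWED_TYPES, and MAX_SIZE are illustrative names and values rather than an established API.

```typescript
import { randomUUID } from 'node:crypto'

interface UploadInitRequest { filename: string; mimeType: string; size: number }
interface UploadInitResponse { uploadUrl: string; fileId: string }

// Hypothetical signer signature; injected so the validation logic stays testable.
type PresignPut = (key: string, contentType: string, expiresInSeconds: number) => Promise<string>

const ALLOWED_TYPES = new Set(['image/jpeg', 'image/png', 'image/webp', 'application/pdf', 'video/mp4'])
const MAX_SIZE = 100 * 1024 * 1024

// Validate the JSON init request, choose a server-generated key, and return
// the presigned URL plus the fileId that /upload/confirm will reference.
async function handleUploadInit(
  req: UploadInitRequest,
  presignPut: PresignPut,
): Promise<UploadInitResponse> {
  if (!ALLOWED_TYPES.has(req.mimeType)) throw new Error('Unsupported MIME type')
  if (!Number.isInteger(req.size) || req.size < 1 || req.size > MAX_SIZE) {
    throw new Error('Invalid size')
  }
  const fileId = randomUUID()
  const key = `uploads/${fileId}` // never derived from the client-supplied filename
  const uploadUrl = await presignPut(key, req.mimeType, 900) // 900 seconds = 15 minutes
  return { uploadUrl, fileId }
}
```

Keeping the key independent of the client filename means the signed URL can only ever write to a path the server chose.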

How do I handle large file uploads in Next.js App Router without hitting size limits?

Upload size limits in Next.js depend on where the request lands. The export const config = {api: {bodyParser: {sizeLimit: '50mb'}}} option applies only to Pages Router API routes; App Router route handlers ignore it and read the body themselves through request.formData() or request.arrayBuffer(), which buffers the entire body in memory. On Vercel, function request bodies are additionally capped at roughly 4.5MB regardless of framework settings. You have three strategies. First, for self-hosted deployments, accept the file in a route handler with await request.formData() and enforce your own size check, keeping the memory cost in mind for large files. Second, use Server Actions with FormData: call the action with new FormData() and use formData.get('file') server-side. Server Actions have their own body limit (1MB by default), which you can raise with the serverActions.bodySizeLimit setting in next.config.js. Vercel Blob's put() integrates directly: const blob = await put(file.name, file, {access: 'public'}). Third (recommended for large files), implement the presigned URL pattern: the route handler only processes a small JSON body to generate the presigned URL; the actual file bytes never touch Next.js. For Vercel Blob, use import {upload} from '@vercel/blob/client', which uploads from the browser directly to Blob storage using a token issued by a server-side route you provide.

How do I implement upload progress tracking for a JSON API file upload?

For multipart or presigned URL PUT uploads, use XMLHttpRequest — the fetch API does not expose upload progress events in most browsers. Attach a listener to xhr.upload.onprogress: the event provides event.loaded (bytes sent) and event.total (total bytes). Divide these to get a percentage: Math.round((event.loaded / event.total) * 100). For React, store progress in useState and update it inside the callback. For presigned URL uploads, set xhr.setRequestHeader('Content-Type', file.type) and xhr.send(file) — sending the raw File object. For base64 JSON uploads via fetch, true byte-level progress is unavailable; switch to XHR for consistency. For resumable uploads using the tus protocol, the tus-js-client library provides an onProgress(bytesUploaded, bytesTotal) callback that fires per chunk. Chunk size affects progress granularity: 5MB chunks mean progress updates every 5MB; use 256KB chunks on slow connections for smoother reporting.
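
The percentage calculation has one edge case worth guarding: the browser may not know the total, in which case event.lengthComputable is false and dividing produces a meaningless value. A minimal sketch follows; progressPercent and UploadProgressLike are hypothetical names, and the interface is a structural subset of the browser's ProgressEvent so the helper can run outside a browser. The XHR wiring is shown as comments because XMLHttpRequest is not available in Node, and setProgress and uploadUrl there are placeholders.

```typescript
// Structural subset of the browser's ProgressEvent.
interface UploadProgressLike {
  lengthComputable: boolean
  loaded: number
  total: number
}

// Returns a whole-number percentage from 0 to 100, or null when the total is unknown.
function progressPercent(e: UploadProgressLike): number | null {
  if (!e.lengthComputable || e.total <= 0) return null
  return Math.min(100, Math.round((e.loaded / e.total) * 100))
}

// Browser-only wiring, following the steps described above:
//   const xhr = new XMLHttpRequest()
//   xhr.upload.onprogress = (e) => {
//     const pct = progressPercent(e)
//     if (pct !== null) setProgress(pct)   // e.g. a React useState setter
//   }
//   xhr.open('PUT', uploadUrl)
//   xhr.setRequestHeader('Content-Type', file.type)
//   xhr.send(file)                          // raw File object, no encoding
```

Returning null instead of 0 lets the UI fall back to an indeterminate spinner when no total is reported.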

What should I include in the JSON metadata schema for uploaded files?

A production file upload metadata schema should include seven core fields: filename (sanitized original name, no path traversal characters, max 255 chars), size (byte count from the decoded buffer server-side, never from client input), mimeType (validated against an allowlist using magic-byte detection, not the client-declared Content-Type), checksum (SHA-256 hex digest computed server-side — enables integrity verification and deduplication), uploadedAt (ISO 8601 UTC timestamp set by the server), storageKey (object storage path like uploads/2026/02/uuid.jpg — keep internal, generate from UUID not filename), and status ('pending' | 'scanning' | 'approved' | 'rejected' — drives the virus scanning pipeline). Optional but valuable: dimensions for images (extract server-side with sharp or jimp), duration for video/audio, and a scanResult object updated after running ClamAV or VirusTotal. Never serve files to users until status is 'approved'.

How do I implement resumable file uploads with a JSON API?

The tus protocol (tus.io) is the standard. Install tus-js-client on the client and tus-node-server on the server. The client creates a tus.Upload with the file, endpoint URL, chunk size (5MB recommended), and metadata object. Call upload.findPreviousUploads() to check localStorage for an incomplete previous upload — if found, call upload.resumeFromPreviousUpload(previous[0]) before upload.start(). The protocol sends a POST to create the upload resource, then PATCHes with each chunk and the current Upload-Offset header. On network failure, a HEAD request retrieves the current offset and the next PATCH resumes from that byte position. The onProgress(bytesUploaded, bytesTotal) callback fires per chunk for UI updates. Chunk size calculation: target 5MB for 4G connections, 512KB for 3G, 256KB for 2G. Control overhead is negligible — all tus headers are plain ASCII text, fully compatible with HTTP proxies, logging systems, and JSON-based monitoring.
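
The resume step is easier to reason about with the arithmetic written out. tus-js-client does this internally, so the sketch below is only an illustration of the protocol: planRemainingPatches and patchHeaders are hypothetical helper names, while the header names and the application/offset+octet-stream content type come from the tus 1.0.0 core protocol.

```typescript
interface PatchPlan { offset: number; length: number }

// Given the total Upload-Length, the Upload-Offset reported by a HEAD request,
// and a chunk size, list the PATCH requests still needed to finish the upload.
function planRemainingPatches(
  uploadLength: number,
  serverOffset: number,
  chunkSize: number,
): PatchPlan[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive')
  if (serverOffset < 0 || serverOffset > uploadLength) throw new Error('offset out of range')
  const plan: PatchPlan[] = []
  for (let offset = serverOffset; offset < uploadLength; offset += chunkSize) {
    plan.push({ offset, length: Math.min(chunkSize, uploadLength - offset) })
  }
  return plan
}

// Headers sent with a single PATCH request.
function patchHeaders(offset: number): Record<string, string> {
  return {
    'Tus-Resumable': '1.0.0',
    'Upload-Offset': String(offset),
    'Content-Type': 'application/offset+octet-stream',
  }
}
```

For a 12MB file interrupted after the first 5MB chunk, the plan contains two entries: 5MB starting at offset 5MB, then the final 2MB starting at offset 10MB.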

Further reading and primary sources