
CDN


A reference on CDN (Content Delivery Network) / edge caching as a technology class. Covers the design space, internal mechanics, capacity envelope, hard problems, and failure modes — applicable whether the workload is static images for a SaaS, a global API like Stripe's, a Linux package mirror, 4K video like Netflix, or ML model artifacts. Same primitive, very different consumption patterns; this doc unpacks the primitive.


§1. What CDNs ARE — and what they are NOT

A CDN is a distributed edge cache plus a global routing layer that brings content geographically (and topologically) close to the user. Two ideas fused into one product:

  1. A caching tier — many small, fast, read-optimized object stores running at the network edge, each holding a working set of recently/frequently requested objects.
  2. A request-steering layer — anycast IPs advertised via BGP (Border Gateway Protocol), GeoDNS (Geographic DNS), and RUM (Real User Monitoring) feedback that route a user's TCP/QUIC connection to the topologically nearest cache.

Either alone is something else. Caching without steering is a reverse proxy cache (Varnish or nginx in one datacenter — useful but limited by the speed of light from one region to the rest of the planet). Steering without caching is a global L4/L7 load balancer (AWS Global Accelerator before you put anything behind it). The CDN is the fusion, deployed at every major exchange point on the internet.

Where it sits. From the user's perspective, the CDN is the origin — TLS terminates at the CDN, HTTP requests are served by the CDN, only on miss does anything reach your servers. From the origin's perspective, the CDN is a single (large, distributed) client doing a lot of conditional GETs.

What CDNs are NOT good for:

  • Source of truth. Cache entries are derived state. Lose a PoP (Point of Presence), lose the bytes there — they refetch from origin. If origin lost the bytes, the CDN cannot resurrect them. Object storage (S3, GCS) is the durability layer; the CDN is the latency layer.
  • Strong consistency. Cache freshness is eventual. A purge propagates over hundreds of milliseconds to seconds. Reading your own writes through a CDN requires bypass on writes or URL versioning on reads.
  • General-purpose database. Even CDN-hosted KV products (Cloudflare KV, Fastly KV Store) are eventually consistent caches with TTLs, not OLTP (Online Transaction Processing) systems. Excellent for feature flags and edge config; not for billing ledgers.
  • Universal accelerator. Some CDNs limit what they cache. Video CDNs only cache segment-shaped HTTP. Dynamic personalized content (your logged-in homepage with your name) is uncacheable without edge compute or per-bucket cache-key strategy.

The mental model: CDN = derived edge state for latency + offload + DDoS dilution, sitting in front of an immutable-or-versioned source of truth.


§2. Inherent guarantees vs what must be layered on

A CDN provides by construction:

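  • Hit-rate-driven origin offload. Push hit rate from 90% to 99% and origin load drops 10x (misses fall from 10% to 1% of requests). The unit of guarantee: an object cached at a PoP absorbs every request for it until its TTL expires, so origin sees at most about one fetch per object per PoP per TTL window (ignoring early evictions), however many requests the edge serves in between.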
  • Topological proximity via anycast. A user's SYN lands on a nearby PoP, typically 5-30 ms RTT away from anywhere on the populated planet. The mechanism is BGP, so "nearest" means topologically nearest by AS path. That usually correlates with physically nearest, but not always.
  • DDoS dilution by capacity arbitrage. A 5 Tbps L3 flood from a worldwide botnet, being worldwide, distributes across the CDN's PoPs. A CDN with 280 Tbps of egress (Cloudflare's number) doesn't have to filter most attacks — it just has more wire than the attacker.
  • TLS offload close to user. TLS terminates at the PoP. Sydney users hit a Sydney PoP at ~30 ms instead of ~300 ms to us-east-1.
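The offload arithmetic in the first bullet fits in a few lines. This is a back-of-envelope sketch, assuming one hit rate for all traffic and, for the TTL bound, that every object stays hot at every PoP and is never evicted early; the function names are hypothetical.

```python
def origin_rps(edge_rps: float, hit_rate: float) -> float:
    """Requests per second that miss the cache and reach origin."""
    return edge_rps * (1.0 - hit_rate)


def max_refetch_rps(objects: int, pops: int, ttl_s: float) -> float:
    """Upper bound on TTL-driven refetches: one fetch per object per PoP per TTL.

    Assumes every object is requested at every PoP and never evicted early.
    """
    return objects * pops / ttl_s
```

For example, 1M edge rps at a 90% hit rate sends 100k rps to origin; at 99% it sends 10k rps. Hit rate is the lever because origin load scales with the miss rate, not the hit rate.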

What a CDN does NOT give you — must be layered on:

  • Read-your-writes / strong consistency. Staleness is bounded by TTL and purge latency, but it is not zero. Bypass cache on writes, version URLs on reads, or invalidate via edge compute.
  • Durability. Cache bytes are evictable at any time per the engine's policy. Origin keeps the canonical copy.
  • Schema/protocol awareness. Most CDNs see HTTP requests as opaque GETs. They don't know /api/users/123/posts and /api/posts?user=123 are the same entity. Cache-key normalization and surrogate-key tagging are the designer's job.
  • Per-user privacy isolation. A misconfigured cache key can serve user A's authenticated content to user B. The CDN doesn't enforce that contract — Cache-Control: private, cookie-based bucketing, and custom VCL (Varnish Configuration Language) / Worker code do.
  • Compute. A vanilla CDN serves bytes; edge compute (Cloudflare Workers, Fastly Compute@Edge, Lambda@Edge) is an additional product on top.

Stated plainly: "I will reduce your origin load by 10-100x and place your bytes 5-30 ms from any user, as long as you tell me what is cacheable, give me well-formed cache keys, and accept eventual freshness."


§3. The design space

The CDN class splits along third-party / commodity vs purpose-built / private, with a third category for specialized variants.

Third-party / shared CDN

SaaS (Software as a Service) offerings where many tenants share PoPs, separated by hostname / customer ID / configuration. Economics work because diverse tenant traffic amortizes fixed PoP cost across hundreds of thousands of customers.

  • Cloudflare — ~300 PoPs, anycast-first, integrated DDoS + WAF (Web Application Firewall), edge compute via V8 isolates (Workers). Best DDoS profile in the industry.
  • Akamai — ~4,200 PoPs (often single-rack installs inside ISPs). Oldest CDN; deepest enterprise feature set; streaming and gaming industries lean heavily on it.
  • Fastly — ~70 high-capacity "superPOPs." VCL programmability is the differentiator — request-handling logic written in a Varnish-derived DSL, pushed globally in seconds.
  • CloudFront — ~450 edge locations + ~13 regional edge caches. Tightly integrated with AWS origins (S3, ALB).
  • BunnyCDN — ~120 PoPs, indie / cost-focused. ~$0.01/GB vs CloudFront's $0.085/GB list. Trades enterprise features and DDoS guarantees for raw cost.
  • Google Cloud CDN, Azure Front Door — hyperscaler CDNs mirroring the CloudFront-vs-AWS-origin pattern for their own clouds.

Private / purpose-built CDN

Some workloads are large enough that running your own is cheaper than buying one. The economics tip around 100+ Tbps of sustained traffic with a workload uniform enough to skip general-purpose features.

  • Netflix Open Connect — ~17,000 cache appliances embedded inside ISP networks. Each: ~250 TB SSD, 100 Gbps NICs (Network Interface Cards), optimized for sequential reads of 4K video segments. ~95% of Netflix traffic served from inside the user's own ISP — never traverses the public internet backbone. This architecture (in-ISP, not at-IX) is viable because video files are immutable and the popular set fits on disk.
  • Google Global Cache (GGC) — analogous in-ISP appliances for YouTube, Drive, Play Store.
  • Meta's edge POPs — internal CDN for Facebook/Instagram static assets and WhatsApp media.
  • Apple's edge cache — iOS / macOS updates, App Store, iCloud. Multi-tier with significant Akamai overflow for surge events.

Trade-off: third-party CDNs share fixed costs but charge per-GB; private CDNs are huge capex but ~zero marginal cost. Below ~10 Tbps the third-party model wins; above ~100 Tbps the math flips. In between is mix-and-match (Apple Edge Cache + Akamai contracts).

Edge compute as a variant

Edge compute turned CDNs from byte caches into thin application platforms: Cloudflare Workers (V8 isolates, ~1-5 ms cold-start), Fastly Compute@Edge (WASM-only, ~1 ms cold-start), AWS Lambda@Edge (Node.js/Python, ~100 ms cold-start), Akamai EdgeWorkers. These run code between cache lookup and response: JWT (JSON Web Token) validation, A/B routing, image resize, header rewriting, personalization. Not a substitute for origin durability; an extension of the cache plane with programmable behavior.

Specialized CDN variants

  • Video CDN — optimized for HLS (HTTP Live Streaming) / DASH (Dynamic Adaptive Streaming over HTTP) segment caching, multi-bitrate variants, manifest manipulation. Segments are 500 KB - 5 MB; dozens of bitrate variants per stream.
  • Gaming CDN — patch distribution dominated. A AAA release can push 100 GB to 100M users within 24 hours. Steam, Epic Games, and Battle.net run hybrids of commercial CDNs plus their own peer-assist layers.
  • Software-distribution CDN — Linux mirrors (Ubuntu on Cloudflare), Docker registry pull-through (Hub on Cloudflare, ECR on CloudFront), apt/yum mirrors.
  • DNS-only / steering-only CDNs — NS1, Cedexis. Don't cache HTTP; steer traffic at DNS to optimal endpoints. Often used as multi-CDN front-end.

Comparison along named dimensions:

Variant              | PoP topology       | Workload assumption      | Programmability       | Best fit
---------------------+--------------------+--------------------------+-----------------------+-----------------------------
Cloudflare-style     | 300 mid-sized PoPs | General web + APIs       | V8 isolates           | Default for most web
Akamai-style         | 4,200 small + edge | Enterprise + streaming   | EdgeWorkers           | Big-media, gov, banking
Fastly-style         | 70 super-PoPs      | API + dynamic            | VCL (very flexible)   | High-config-velocity teams
CloudFront-style     | 450 + regional     | AWS-fronted workloads    | Lambda@Edge           | AWS-native shops
Netflix Open Connect | 17k in-ISP         | Immutable video segments | None                  | Single workload at >100 Tbps
BunnyCDN-style       | 120 PoPs, low cost | Small/medium static      | Minimal               | Indie / cost-driven
Specialized video    | Custom or overlay  | Adaptive segments        | Manifest manipulation | Live + VOD streaming

Same primitive; very different tunings.


§4. Underlying data structure and storage engine

This is the section that separates Staff candidates. Every CDN runs some cache engine at every PoP, and the engine has a fairly standard shape because the access pattern is so consistent.

(a) In-RAM hash table over log-structured SSD segments

A cache node has two physical storage tiers: RAM for hot lookups, NVMe (Non-Volatile Memory Express) SSD for the bulk working set. Logically one keyed object store; physically hot in RAM, long tail on SSD.

                    Cache box (one PoP shard for hash(URL) % N)

  +----------------------------------------------------------------+
  |  In-RAM hash table                                              |
  |                                                                  |
  |   index_entry = {                                                |
  |     u64 key_hash;            // 8 bytes, partial cache key       |
  |     u32 ssd_offset;          // 4 bytes                          |
  |     u32 length;              // 4 bytes                          |
  |     u64 last_access_tick;    // for eviction policy              |
  |     u32 ttl_expiry_epoch;                                        |
  |     u8  vary_fingerprint[8]; // for variant disambiguation       |
  |   }                                                              |
  |                                                                  |
  |   chained buckets — ~3 entries per bucket at 70% load factor    |
  +----------------------------------------------------------------+
                          |
                          |  ssd_offset
                          v
  +----------------------------------------------------------------+
  |  SSD: log-structured object store                                |
  |  (Cloudflare "tincan", Fastly storage layer, CloudFront engine,  |
  |   Akamai NetStorage variants)                                    |
  |                                                                  |
  |  +---------+---------+---------+---------+---------+              |
  |  | segment | segment | segment | segment | active  | ...          |
  |  |   0     |   1     |   2     |   3     | segment |              |
  |  | (1 GB)  | (1 GB)  | (1 GB)  | (1 GB)  | (write) |              |
  |  +---------+---------+---------+---------+---------+              |
  |                                                                  |
  |  Each segment is append-only.                                    |
  |  Writes (cache fills) append to the active segment.              |
  |  When active fills (1 GB), it seals; a new active opens.         |
  |  Evictions pick a victim segment, copy still-hot survivors to    |
  |  active, then erase the victim segment (whole-segment discard    |
  |  — friendly to SSD endurance).                                   |
  +----------------------------------------------------------------+

This is fundamentally LSM-flavored (Log-Structured Merge) even though we don't call it that: writes are purely sequential, the index is in RAM (O(1) lookup), "compaction" is segment-level eviction (no sorted-run merge because there's no ordering invariant).

Approach                  | Write pattern                      | SSD endurance | Lookup   | Used by
--------------------------+------------------------------------+---------------+----------+---------------------------
Per-file on filesystem    | Random, ~50k IOPS, fsync per write | Bad           | O(1)     | Naive nginx setups
Keyed DB (B+ tree or LSM) | Random, write amplification        | Mediocre      | O(log n) | RocksDB-as-cache attempts
Log-structured segments   | Sequential, line rate              | Excellent     | O(1)     | Cloudflare, Fastly, Akamai

Log-structured is non-negotiable at scale because SSD write endurance limits sustained random writes. Sequential append at 3 GB/s per NVMe is cheap; equivalent random 4 KB writes would burn cells in months.
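A toy model makes the seal/evict cycle from the diagram concrete. This is a sketch under loud assumptions: bytearrays stand in for 1 GB SSD segments, sizes are tiny so the cycle is easy to trace, the `is_hot` hook is a hypothetical stand-in for the eviction policy's verdict, and survivor copying scans the whole index where a real engine would walk the victim segment's headers.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class SegmentStore:
    """Toy in-RAM index over append-only segments; bytearrays stand in for SSD."""

    def __init__(self, segment_size: int, max_segments: int,
                 is_hot: Callable[[str], bool] = lambda key: False):
        self.segment_size = segment_size   # 1 GB in the diagram; tiny here
        self.max_segments = max_segments   # >= 1; budget of sealed + active segments
        self.is_hot = is_hot               # hypothetical eviction-policy hook
        self.segments = OrderedDict()      # segment id -> bytearray, oldest first
        self.index: Dict[str, Tuple[int, int, int]] = {}  # key -> (segment, offset, length)
        self.next_id = 0
        self._open_segment()

    def _open_segment(self) -> None:
        self.active_id = self.next_id
        self.next_id += 1
        self.segments[self.active_id] = bytearray()

    def _fits(self, length: int) -> bool:
        return len(self.segments[self.active_id]) + length <= self.segment_size

    def _append(self, key: str, value: bytes) -> None:
        active = self.segments[self.active_id]
        self.index[key] = (self.active_id, len(active), len(value))
        active.extend(value)               # purely sequential write

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self.segment_size:
            raise ValueError("object larger than a segment")
        if not self._fits(len(value)):
            self._open_segment()           # seal the full segment, open a new active
        self._append(key, value)           # an overwrite just leaves garbage behind
        if len(self.segments) > self.max_segments:
            self._evict_oldest()

    def get(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)          # O(1) RAM lookup, then one read
        if loc is None:
            return None
        seg_id, offset, length = loc
        return bytes(self.segments[seg_id][offset:offset + length])

    def _evict_oldest(self) -> None:
        victim_id, victim = self.segments.popitem(last=False)
        for key, (seg_id, offset, length) in list(self.index.items()):
            if seg_id != victim_id:
                continue                   # key lives in a newer segment
            del self.index[key]
            # Copy still-hot survivors into the active segment if there is room.
            if self.is_hot(key) and self._fits(length):
                self._append(key, bytes(victim[offset:offset + length]))
        # The victim bytearray is dropped whole: no per-object compaction.
```

Note that an overwritten key simply points at its new location; the stale bytes vanish when their segment is discarded, which is why no per-object delete ever touches the "SSD".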

(b) Why this fits the access pattern

Across all CDN workloads — static web, API responses, video segments, gaming patches, software downloads, ML model artifacts — the access pattern is identical:

  • Read-dominant: ~99% of client requests are reads; the only writes are cache fills, and their rate tracks the miss rate.
  • Working set Zipfian / power-law — 1% of keys serve 80%+ of requests.
  • Writes bursty and externally driven (cache fills on miss).
  • No range scans or sorted iteration.

The in-RAM hash table + log-structured SSD wins: ~100 ns RAM hit, ~100 us SSD hit, sequential writes at line rate, whole-segment eviction with no fragmentation, naturally fits "hot in RAM, warm on SSD, cold to origin."

(c) Eviction — the heart of cache hit rate

Naive LRU (Least Recently Used) is wrong at CDN scale. Two problems:

  1. Scan pollution: one large crawler walks 100k cold URLs; all displace hot entries. Shows up universally — Googlebot, archive.org, security scanners.
  2. Lock contention: at 1M rps per box, the LRU list head becomes a global lock.

LFU (Least Frequently Used): counters grow stale; once-popular keys dominate forever. Rarely used pure.

ARC (Adaptive Replacement Cache): IBM's classic; two LRU lists, two ghost lists, adaptive ratio. Better than LRU, but patent-encumbered until 2024, which kept it out of most open-source caches for years.

W-TinyLFU (Window TinyLFU): the modern answer. Used by Caffeine, Cloudflare in parts of its stack, serious caches generally.

W-TinyLFU layout:

  +--------------+   admission     +----------------------------------+
  | window LRU   | ---filter---->  | main cache (SLRU = Segmented LRU)|
  | (1% of cache)|                  |  +----------+ +-------------+    |
  |              |                  |  | protected| |  probation  |    |
  | new entries  |                  |  |   80%    | |    20%      |    |
  | get a chance |                  |  +----------+ +-------------+    |
  +--------------+                  +----------------------------------+
                                          ^
                                          |
                            +-------------+--------------+
                            | Count-Min Sketch (4-bit    |
                            | counters, 4 hash funcs)    |
                            | — tracks key frequency     |
                            |   approximately            |
                            +----------------------------+

Each access:

  1. New entry enters the window (small LRU). New entries always get a chance.
  2. When window evicts, the victim hits the admission filter: count-min sketch compares its estimated frequency to the would-be victim from main. Higher-freq wins.
  3. Main cache is SLRU (Segmented LRU): protected (80%) and probation (20%). New items enter probation; promotion to protected on second access.
  4. The count-min sketch is aged (halve all counters every N accesses) so once-hot items don't dominate forever.
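The four steps above can be sketched in Python. This is a minimal, keys-only model, not Caffeine's implementation: the 1% window, 80/20 SLRU split, and 4-row sketch of saturating 4-bit counters follow the diagram, while the sketch width (8 x capacity), the aging interval (10 x capacity increments), and all class and method names are assumptions. Production caches add refinements omitted here, such as adaptive window sizing and concurrency.

```python
import hashlib
from collections import OrderedDict


class CountMinSketch:
    """Approximate per-key frequency with `depth` rows of 4-bit (0..15) counters."""

    def __init__(self, width: int, depth: int = 4, sample_size: int = 1000):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]
        self.sample_size = sample_size   # increments between aging passes
        self.additions = 0

    def _indexes(self, key: str):
        # Double hashing: derive one column per row from a single 128-bit digest.
        d = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        return [(h1 + i * h2) % self.width for i in range(self.depth)]

    def increment(self, key: str) -> None:
        for row, col in zip(self.rows, self._indexes(key)):
            if row[col] < 15:            # saturate like a 4-bit counter
                row[col] += 1
        self.additions += 1
        if self.additions >= self.sample_size:
            # Step 4, aging: halve every counter so once-hot keys fade out.
            for row in self.rows:
                for col in range(self.width):
                    row[col] >>= 1
            self.additions //= 2

    def estimate(self, key: str) -> int:
        return min(row[col] for row, col in zip(self.rows, self._indexes(key)))


class WTinyLFU:
    """Keys-only W-TinyLFU: window LRU -> admission filter -> SLRU main cache."""

    def __init__(self, capacity: int):
        self.window_cap = max(1, capacity // 100)       # ~1% window
        self.main_cap = capacity - self.window_cap
        self.protected_cap = int(self.main_cap * 0.8)   # SLRU: 80% protected
        self.window, self.probation, self.protected = OrderedDict(), OrderedDict(), OrderedDict()
        self.sketch = CountMinSketch(width=8 * capacity, sample_size=10 * capacity)

    def __contains__(self, key: str) -> bool:
        return key in self.window or key in self.probation or key in self.protected

    def __len__(self) -> int:
        return len(self.window) + len(self.probation) + len(self.protected)

    def access(self, key: str) -> bool:
        """Record one request for `key`; return True on a cache hit."""
        self.sketch.increment(key)
        for segment in (self.window, self.protected):
            if key in segment:
                segment.move_to_end(key)
                return True
        if key in self.probation:
            # Step 3: second hit in main promotes; protected's LRU demotes on overflow.
            del self.probation[key]
            self.protected[key] = None
            if len(self.protected) > self.protected_cap:
                demoted, _ = self.protected.popitem(last=False)
                self.probation[demoted] = None
            return True
        self.window[key] = None                          # step 1: every new key gets a chance
        if len(self.window) > self.window_cap:
            candidate, _ = self.window.popitem(last=False)
            self._admit(candidate)
        return False

    def _admit(self, candidate: str) -> None:
        if len(self.probation) + len(self.protected) < self.main_cap:
            self.probation[candidate] = None             # main still has room
            return
        victim = next(iter(self.probation))              # probation's LRU entry
        # Step 2: the key with the higher estimated frequency keeps the slot.
        if self.sketch.estimate(candidate) > self.sketch.estimate(victim):
            del self.probation[victim]
            self.probation[candidate] = None
```

Replaying the crawler scenario from (c) on a 100-entry cache: warm up ten hot URLs, then send 1,000 one-off crawler URLs. Once main is full, each crawler URL loses the frequency comparison against a hot probation victim and is dropped, so the hot set survives. A plain 100-entry LRU would have flushed it.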

Why it wins across very different workloads:

  • Static-asset CDN for an e-commerce catalog: filter rejects one-hit-wonder product variants from displacing the perpetually-hot homepage carousel.
  • API CDN (Stripe, GitHub Pages): keeps /api/v1/health and other high-frequency low-cost endpoints in cache, even when one-off /api/v1/customers/{rare_id} calls flood through.
  • Video CDN: popular shows dominate frequency; one-off rare-language subtitle requests get filtered.

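Empirically W-TinyLFU achieves ~5-10 percentage points higher hit rate than LRU on the same memory budget. That matters more than it sounds: origin load scales with the miss rate, so going from a 95% to a 99% hit rate cuts misses from 5% to 1% of requests, a 5x drop in origin load. Multi-million-dollar engineering at scale.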

(d) Cache-key composition — the hidden axis

A naive cache key is (method, host, path). Real keys need more:

cache_key = SHA1(
    method
    || ":" || host
    || ":" || path
    || ":" || canonicalize_query(query_params)
    || ":" || vary_fingerprint(req_headers, response_vary)
    || ":" || ( bucket(cookies, allowed_cookies) )
)

Each component is a footgun, with consequences differing by workload:

  • Query params: should ?utm_source=twitter fragment the cache? No — strip it. Should ?w=400 (image width) fragment? Yes. The CDN config must know which params are content-bearing. A bad UTM rule can blow hit rate by 50 percentage points in an afternoon.
  • Vary headers: Vary: Accept-Encoding is fine (~3 variants). Vary: User-Agent is catastrophic (~10k variants per URL, cache collapse). API CDNs see this when Vary accidentally lists a per-client correlation header.
  • Cookies: Cookie: session=... in cache key = per-user fragments = useless. Strip irrelevant cookies before hashing; bucket meaningful ones (logged-in vs anonymous, country bucket) and key on the bucket.
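The composite key above can be sketched as runnable Python. Assumptions, all hypothetical: the content-bearing parameter allowlist, the Vary allowlist (ignoring other Vary headers is only safe once CDN config has forced origin's Vary down to those values), and a cookie named `session` as the logged-in signal.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode

CONTENT_PARAMS = {"w", "v", "lang", "size"}   # hypothetical content-bearing allowlist
ALLOWED_VARY = {"accept-encoding"}            # Vary headers config agrees to key on


def canonicalize_query(query: str) -> str:
    # Drop tracking params (utm_*, fbclid, ...) by allowlist, then sort.
    pairs = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
             if k in CONTENT_PARAMS]
    return urlencode(sorted(pairs))


def vary_fingerprint(req: dict, response_vary: list) -> str:
    names = sorted({n.strip().lower() for n in response_vary} & ALLOWED_VARY)
    return ",".join(f"{n}={req.get(n, '').strip()}" for n in names)


def cookie_bucket(cookie_header: str) -> str:
    # Key on the bucket, never on the cookie value.
    names = {c.split("=", 1)[0].strip() for c in cookie_header.split(";") if "=" in c}
    return "logged_in_v1" if "session" in names else "anon_v1"


def cache_key(method, host, path, query, req_headers, response_vary) -> str:
    req = {k.lower(): v for k, v in req_headers.items()}
    parts = [method.upper(), host.lower(), path, canonicalize_query(query),
             vary_fingerprint(req, response_vary), cookie_bucket(req.get("cookie", ""))]
    return hashlib.sha1(":".join(parts).encode()).hexdigest()
```

With this shape, `?utm_source=twitter&w=400` and `?w=400&utm_campaign=x` collapse to one key, `?w=800` stays distinct, and two different session cookies share the single `logged_in_v1` entry.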

Auditing cache keys is the highest-leverage optimization any CDN customer ever does — across image hosting, API gateways, news sites, anywhere.

(e) Consistent hashing for shard-by-URL within a PoP

Within a PoP with N cache boxes, each URL should go to the same box (so it's cached once, not N times). Naive hash(URL) % N fails when boxes are added or removed: going from N to N+1 boxes remaps about N/(N+1) of all keys, which is nearly a full cold start. Consistent hashing with ~150 virtual nodes per box avoids this:

0 -------------------- ring ----------------------> 2^32

[B0 vnode1][B2 vnode1][B1 vnode1][B0 vnode2][B3 vnode1][B1 vnode2]...

hash(URL) lands on the ring; first vnode clockwise wins.

If B2 dies: only B2's vnodes are removed; keys formerly on B2 vnodes
            go to the next vnode. ~1/N of keys remap; the rest are
            unaffected.

If a new box B4 is added: vnodes inserted; ~1/N of keys move to B4.

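Graceful scaling. Cache misses on remap are bounded; the rest stays warm. Rendezvous hashing (HRW, highest random weight) is an alternative with the same ~1/N remap property: score every box for each key and pick the highest. It needs no vnodes and handles weighted boxes cleanly, at the cost of O(N) hashes per lookup.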
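A minimal ring with virtual nodes, as described above, in Python. The box names, vnode labels, and the use of a 32-bit BLAKE2b hash are assumptions for illustration; any stable hash works (Python's built-in `hash()` does not, because it is salted per process).

```python
import bisect
import hashlib


def _h32(s: str) -> int:
    # Stable 32-bit ring position.
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=4).digest(), "big")


class HashRing:
    def __init__(self, boxes, vnodes_per_box: int = 150):
        self.vnodes = vnodes_per_box
        self.boxes = set(boxes)
        self._rebuild()

    def _rebuild(self) -> None:
        points = sorted((_h32(f"{box}#vnode{i}"), box)
                        for box in self.boxes for i in range(self.vnodes))
        self._positions = [p for p, _ in points]
        self._owners = [b for _, b in points]

    def add(self, box: str) -> None:
        self.boxes.add(box)
        self._rebuild()

    def remove(self, box: str) -> None:
        self.boxes.discard(box)
        self._rebuild()

    def lookup(self, url: str) -> str:
        # First vnode clockwise from hash(url), wrapping past 2^32 back to 0.
        i = bisect.bisect_left(self._positions, _h32(url))
        return self._owners[i % len(self._owners)]
```

Removing a box deletes only its vnodes, so only the keys it owned move; adding a box only steals keys that now land on its new vnodes.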

(f) Bloom filter for L2 existence probes

When L1 misses and considers asking L2 (mid-tier), an option is a Bloom filter of "keys that exist in L2."

  • Definitively absent → skip L2, go straight to shield/origin (save a hop).
  • Possibly present → query L2.

Useful for cold long-tail content; ~1% false-positive rate at ~10 bits per L2 key. Many CDNs skip this — L2 probe cost is small enough that the complexity isn't always worth it.
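A Bloom filter at ~10 bits per key can be sketched as follows. Assumptions: k = round(10 * ln 2) = 7 hash positions derived by double hashing from one BLAKE2b digest, and a hypothetical class name; the textbook false-positive estimate (1 - e^(-kn/m))^k gives about 0.8% for these settings.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_keys: int, bits_per_key: int = 10):
        self.m = expected_keys * bits_per_key                  # total bits
        self.k = max(1, round(bits_per_key * math.log(2)))     # 7 for 10 bits/key
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        d = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        # False means definitively absent: skip L2 and go to shield/origin.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))
```

An L1 box would consult `might_contain(cache_key)` before the L2 hop. Note that a plain Bloom filter cannot delete keys, so an L2 that evicts needs a periodically rebuilt filter (or a counting variant) to keep "possibly present" from drifting toward always-true.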

(g) Walk through one request, byte level

A user in Berlin requests https://cdn.example.com/asset.bin?v=400 over HTTPS. The walk is identical whether the asset is a JPEG, JSON API response, video segment, or ML model file — the cache plane doesn't care.

1. DNS — 8 ms
   - Client resolves cdn.example.com.
   - 1.1.1.1 returns 104.16.x.x (anycast IP) within ~10ms (often cached).

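2. TCP / TLS handshake — ~20 ms
   - BGP routes the SYN to Frankfurt PoP (5 ms one-way).
   - SYN/SYN-ACK/ACK: one ~10 ms RTT.
   - TLS 1.3 full handshake: one more RTT, ~10 ms.
   - Session resumption with 0-RTT early data sends the GET in the first
     TLS flight, saving ~10 ms; over QUIC, 0-RTT also skips the TCP
     handshake, saving ~20 ms.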

3. HTTP/2 stream open — 1 ms
   - Client opens stream 1; sends GET /asset.bin?v=400 with headers.

4. PoP receives request — 0 ms (now)
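   - L4 LB (XDP/eBPF on the receiving box) spreads connections across
     boxes by flow hash. It cannot see the URL: over HTTPS the path is
     encrypted, and SNI (Server Name Indication) names only the host.
   - The box that terminates TLS parses the request and hashes the URL:
     hash(URL) -> cache box B17.
   - Forward to B17 (single hop within the PoP, sub-ms).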

5. Cache box B17, edge cache layer — 0.5 ms
   - Strip irrelevant query params; keep ?v=400 (declared content-bearing).
   - Compute cache_key = SHA1("GET:cdn.example.com:/asset.bin:v=400::").
   - Lookup in in-RAM hash table.
     +-- HIT (RAM): return bytes in ~0.1 ms (memcpy + framing).
     +-- HIT (SSD): RAM index -> ssd_offset -> 4 KB sector read ~100 us;
     |              larger objects -> multiple sequential reads.
     +-- MISS: proceed to step 6.

6. On miss, ask L2 (mid-tier cache) — 2 ms
   - hash(URL) -> mid-tier shard M3.
   - Cross-PoP RPC in the same region — ~1 ms RTT.
   - HIT -> return; B17 writes object to active segment + index.
   - MISS -> proceed to step 7.

7. On L2 miss, ask shield — 5 ms (or skip if no shield)
   - Request coalescing: if another concurrent request already has
     an outstanding fetch for this key, attach to that promise.
   - Shield issues conditional GET (If-None-Match) if it has stale.

8. On shield miss, go to origin — 50–200 ms
   - Frankfurt shield to us-east-1 origin ~90 ms RTT TLS-warmed.
   - Origin responds with Cache-Control: max-age=86400, immutable.

9. Propagate down the hierarchy
   - Shield caches -> L2 caches -> L1 (B17) caches.
   - Each tier writes to its active log-structured segment and updates
     in-RAM index.
   - Index update is in-memory only; no fsync (reconstructible from
     segment headers on restart).

10. Respond to client — 1 ms
    - HTTP/2 DATA frame(s); zero-copy via sendfile() where possible.

Total cache HIT:    ~ 12-15 ms on a warm, reused connection (one RTT + cache read);
                    ~ 40 ms on a cold connection (DNS + TCP + TLS + request RTT).
Total cache MISS:   ~ 100-300 ms (depending on origin distance).

The durability point is the segment write on SSD; once it returns, the cache survives a process restart. The in-RAM index is rebuilt by scanning segment headers on startup (~1 s per TB of cache). No fsync on most writes — we accept that a node crash loses a few seconds of cache fills. The cache is a cache; the origin is the source of truth. That contract repeats throughout the doc.
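Request coalescing from step 7 is easy to get subtly wrong, so here is a sketch using asyncio. Assumptions: `fetch` is a hypothetical async callable standing in for the shield/origin GET, and the class name is illustrative. The first requester for a key starts the upstream fetch; concurrent requesters await the same task.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class Coalescer:
    """Collapse concurrent misses for one cache key into a single upstream fetch."""

    def __init__(self, fetch: Callable[[str], Awaitable[bytes]]):
        self._fetch = fetch                            # e.g. the shield/origin GET
        self._inflight: Dict[str, asyncio.Future] = {}  # key -> outstanding fetch

    async def get(self, key: str) -> bytes:
        task = self._inflight.get(key)
        if task is None:
            # First requester becomes the leader and starts the upstream fetch.
            task = asyncio.ensure_future(self._fetch(key))
            self._inflight[key] = task
            # Free the slot once the fetch settles, on success or failure.
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # Followers attach to the same promise; shield() keeps one cancelled
        # client from cancelling the fetch everyone else is waiting on.
        return await asyncio.shield(task)
```

A failed fetch propagates the same exception to every waiter and then clears the slot, so the next request retries instead of caching the failure.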


§5. Cache key normalization in depth

§4(d) named cache-key composition as a footgun; this section unpacks it. Cache-key hygiene is the single highest-leverage knob between you and a healthy hit rate, and it is the single most common source of CDN-induced outages. The reason: a cache key is the only thing the CDN uses to identify equivalent requests. Two URLs that should hit the same cache entry but produce different keys mean two cache misses. Two URLs that should be different but produce the same key mean user A sees user B's response.

(a) URL normalization

The naive cache key uses the raw URL as the user sent it. That is wrong because URLs have multiple syntactic variants that should hash to the same bucket:

  • Case sensitivity. /Products/123 and /products/123 are usually the same resource, but RFC 3986 treats the path as case-sensitive, so the CDN must be told. Hosts must always be lowercased; paths usually should be too, unless the origin explicitly differentiates.
  • Trailing slash. /about vs /about/ — typically the same page. If the CDN doesn't normalize, both variants cache independently and a popular page has two cache entries each at half the hit rate.
  • Default ports. https://example.com/path and https://example.com:443/path — same resource. Strip default ports before hashing.
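  • Percent-encoding. /caf%c3%a9 and /caf%C3%A9 are the same path, as are /%7Euser and /~user. RFC 3986 normalization uppercases the hex digits in percent-encoded triplets and decodes percent-encoded unreserved characters (letters, digits, -, ., _, ~). A raw space is not a valid URL character, so clients send /foo bar as /foo%20bar. Be careful with +: it means a space only in form-encoded query strings; in a path, /foo+bar and /foo%20bar are different resources.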
  • Query parameter order. ?a=1&b=2 and ?b=2&a=1 are the same query. Canonicalize by sorting alphabetically before hashing. If the CDN doesn't do this, a client that orders params differently than the typical client misses cache for every request.
  • Tracking-only parameters. utm_source, utm_medium, utm_campaign, fbclid, gclid should be stripped before hashing (but preserved when forwarding to origin if origin needs them for analytics). A single bad rule that includes UTM in the cache key shatters hit rate: every link clicked from a different campaign is a new cache entry. A Twitter-driven product launch can drop hit rate from 99% to 30% in an afternoon because of this.
  • Content-bearing parameters. Conversely, ?size=large, ?w=400, ?lang=fr, ?v=42 are content-bearing and must stay in the cache key. The CDN configuration is essentially an allowlist of which params count.

A correct normalization pipeline:

raw_url
  -> lowercase host
  -> strip default port
  -> RFC 3986 path normalize (uppercase %-hex digits, decode unreserved)
  -> resolve relative segments (. and ..)
  -> trailing-slash policy (strip or canonicalize per route)
  -> parse query string
  -> drop params not in content-bearing allowlist
  -> sort remaining params alphabetically by name
  -> stable-encode values
  -> emit canonical_url
  -> feed canonical_url into the cache_key hash

This is the kind of code that should be one function, audited, tested, and shared across every PoP. When it diverges across tiers (L1 normalizes one way, L2 another), you get keys that hit at one layer and miss at the next — cache hierarchy effectively disabled.
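Here is a sketch of that single shared function using only the standard library. Assumptions: the function name and the `content_params` allowlist argument are hypothetical, the trailing-slash policy is "strip" everywhere, dot-segment removal is a simplified form of RFC 3986 section 5.2.4, and path lowercasing is opt-in per route.

```python
import re
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalize_percent(path: str) -> str:
    # Uppercase hex digits; decode triplets that encode unreserved characters.
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return PCT_TRIPLET.sub(fix, path)


def _remove_dot_segments(path: str) -> str:
    # Resolve "." and ".." without ever climbing above the root.
    out = []
    for seg in path.split("/"):
        if seg == "..":
            if len(out) > 1:
                out.pop()
        elif seg != ".":
            out.append(seg)
    return "/".join(out) or "/"


def canonical_url(raw_url: str, content_params: set, lowercase_path: bool = False) -> str:
    parts = urlsplit(raw_url)                        # lowercases the scheme
    scheme = parts.scheme.lower()
    host = parts.hostname or ""                      # hostname is lowercased
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.lower() if lowercase_path else parts.path
    path = _remove_dot_segments(_normalize_percent(path))
    if len(path) > 1 and path.endswith("/"):         # trailing-slash policy: strip
        path = path[:-1]
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k in content_params]                # allowlist drops utm_*, fbclid, ...
    params.sort(key=lambda kv: kv[0])                # stable: duplicates keep order
    query = urlencode(params, quote_via=quote)       # stable re-encoding
    return f"{scheme}://{netloc}{path}" + (f"?{query}" if query else "")
```

Every tier calling this one function is what keeps L1, L2, and shield agreeing on keys.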

(b) Vary header handling

The Vary response header tells the cache: "this response varies based on request header X; key on X." It is correct in principle, catastrophic in practice if used carelessly.

  • Vary: Accept-Encoding is fine. ~3 values in the wild (gzip, br, identity). Three cache variants per URL is acceptable.
  • Vary: Accept-Language is borderline. Tens of values; if the origin actually serves localized content this is the price of correctness. Better: bucket Accept-Language to a small set (en, es, fr, de, ja, "other") in CDN config and key on the bucket.
  • Vary: User-Agent is catastrophic. Tens of thousands of UA strings per major site. Each variant means an independent cache entry. A URL with Vary: User-Agent has a cache hit rate approaching zero. The origin developer who added it probably wanted to serve different HTML to mobile vs desktop and reached for the wrong tool. The right tool is a UA classifier at edge (Workers / VCL) that buckets UAs and uses a synthetic header in the cache key.
  • Vary: Cookie is even worse if the cookie includes session ID. Every user gets their own cache entry. The whole CDN devolves into a per-user store with no hits.
  • Vary: * (literally an asterisk) means "this response is unique per request; never cache." Servers emit this sometimes; CDNs honor it by bypassing cache entirely.

The right operational discipline: the CDN owner reviews every Vary header coming from origin and either restricts it via config (force the list to known-safe values) or works with the origin team to remove it. Vary should be a deliberate decision, never accidental.
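The Accept-Language bucketing suggested above might look like this at the edge. Assumptions: the bucket list is the one named in the text, the function name is hypothetical, and in practice this would run in Workers/VCL and write its result into a synthetic header that goes into the cache key instead of the raw Accept-Language value.

```python
SUPPORTED = ("en", "es", "fr", "de", "ja")  # bucket list from the text


def bucket_accept_language(header: str) -> str:
    """Map a raw Accept-Language header to one of a few cache-key buckets."""
    best_lang, best_q = "other", 0.0
    for part in header.split(","):
        pieces = part.strip().split(";")
        primary = pieces[0].strip().lower().split("-")[0]   # "fr-CH" -> "fr"
        q = 1.0                                             # weight defaults to 1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0                                 # malformed weight: ignore tag
        if primary in SUPPORTED and q > best_q:             # q=0 means "not acceptable"
            best_lang, best_q = primary, q
    return best_lang
```

Thousands of distinct Accept-Language strings collapse to six cache variants per URL.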

Internally, the cache implements Vary as a two-level lookup:

1. URL-canonical key  -> "variant map" entry
   Variant map records which request headers Vary on (per response)
   plus a list of stored variants, each tagged by a fingerprint of
   the relevant request header values.

2. For each request, compute the variant fingerprint from the
   listed Vary headers, then look up the matching variant.

Lookup is still O(1) on a hit but two hash steps. The variant map
itself is small; the cost is the variant fragmentation when Vary is
on a high-cardinality header.
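The two-level lookup can be sketched in Python. This is an in-memory illustration with hypothetical names; a real engine stores the variant map alongside the object index and usually normalizes header values (e.g. bucketing Accept-Encoding) before fingerprinting.

```python
import hashlib
from typing import Dict, List, Optional


def variant_fingerprint(vary: List[str], request_headers: Dict[str, str]) -> str:
    """Hash the request's values for exactly the headers the response varies on."""
    req = {k.lower(): v.strip() for k, v in request_headers.items()}
    h = hashlib.sha1()
    for name in vary:                   # already lowercased and sorted
        h.update(f"{name}={req.get(name, '')}\n".encode())
    return h.hexdigest()[:16]


class VaryCache:
    def __init__(self):
        # canonical URL -> variant map {"vary": [...], "variants": {fingerprint: body}}
        self._maps: Dict[str, dict] = {}

    def store(self, url: str, request_headers, vary_headers, body: bytes) -> None:
        vary = sorted(h.strip().lower() for h in vary_headers)
        if "*" in vary:
            return                      # Vary: * is never reusable; don't store
        entry = self._maps.get(url)
        if entry is None or entry["vary"] != vary:
            # First response, or origin changed its Vary list: start a fresh map.
            entry = {"vary": vary, "variants": {}}
            self._maps[url] = entry
        entry["variants"][variant_fingerprint(vary, request_headers)] = body

    def lookup(self, url: str, request_headers) -> Optional[bytes]:
        entry = self._maps.get(url)                                # hash step 1
        if entry is None:
            return None
        fp = variant_fingerprint(entry["vary"], request_headers)   # hash step 2
        return entry["variants"].get(fp)
```

With Vary: Accept-Encoding, the variants dict holds a handful of entries; with Vary: User-Agent it would grow without bound, which is the fragmentation described above.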

(c) Cookie handling

Cookies are the most dangerous input to a cache key because they often carry session identity. Three policies:

  1. Strip all cookies before key hash. Default for purely static content. The origin response should set Cache-Control: public and contain no per-user data; the CDN doesn't include cookies in the key.
  2. Bypass cache when sensitive cookies are present. For mixed pages (logged-in and anonymous served by the same URL), CDN config: "if Cookie: session=... is set, set cache mode to bypass; serve origin directly without caching the response." This sacrifices hit rate on logged-in traffic but guarantees no cross-user contamination.
  3. Bucket cookies and key on the bucket. When you want logged-in vs anonymous caching separately: extract whether the user is authenticated (presence of session cookie → "logged_in_v1" else "anon_v1"); use that string in the cache key, not the cookie value. Each variant caches once.

Bad configurations:

  • Including the Cookie header verbatim in the cache key. Every session is a unique key: useless for hit rate and PII-leaking at the same time.
  • Stripping the cookie from the cache key while the origin still personalizes the response based on it. The cached response contains user A's data; user B's request matches the key; user B gets user A's data. This is the "we accidentally cached PII" disaster.

The classic incident pattern: a developer adds personalization to a page (small per-user widget). They forget to add Cache-Control: private or no-store. The CDN's default is "cache everything." The CDN does not include the auth cookie in the cache key (correct — would shatter hit rate). Now every request to that URL gets the response cached for the first user who hit it. If that first user was logged in, every subsequent user — logged in or not — sees that user's name, balance, possibly more. Multiple major sites have had this exact incident publicly; it's the canonical CDN configuration disaster.

Mitigations:

  • Default-deny on dynamic responses: only paths or content types explicitly marked cacheable get cached at all.
  • Origin must always set Cache-Control: private or no-store on per-user responses.
  • WAF rules: if the response carries per-user markers (a Set-Cookie header, user-specific HTML patterns) and the request had no auth, flag it as suspicious and refuse to cache.
  • Synthetic monitoring that pulls "anonymous" and "logged-in" variants and diff-checks them.
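The first three mitigations can be folded into one edge-side decision function. This is a sketch under assumptions: the function name, the `session` cookie name, and the rule ordering are hypothetical, and the rule that a logged-in request is cached only when origin explicitly marks the response `public` is one reasonable policy, not a standard.

```python
def cache_decision(request_headers: dict, response_headers: dict,
                   session_cookie: str = "session") -> str:
    """Return "cache" or "bypass" for one origin response (default-deny)."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    directives = {d.strip().split("=")[0].lower()
                  for d in resp.get("cache-control", "").split(",") if d.strip()}
    if "private" in directives or "no-store" in directives:
        return "bypass"                 # origin declared the response per-user
    if "set-cookie" in resp:
        return "bypass"                 # per-user marker: never store it
    has_session = any(c.strip().startswith(session_cookie + "=")
                      for c in req.get("cookie", "").split(";"))
    if has_session and "public" not in directives:
        return "bypass"                 # logged-in request, origin did not vouch it is shared
    if directives & {"public", "max-age", "s-maxage"}:
        return "cache"                  # explicit freshness signal present
    return "bypass"                     # default-deny: nothing cached without a signal
```

The final line is the important design choice: a response with no caching signal at all is treated as uncacheable, which turns the "developer forgot Cache-Control: private" incident into a hit-rate regression instead of a data leak.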

(d) Cache key explosion

The opposite failure mode of cache poisoning is cache key explosion — too many variants, no hits. Symptoms: hit rate collapses below 50% for no obvious reason; cache fill rate spikes (every request causes a fill); origin load climbs.

Common causes:

  • Vary on a high-cardinality header (User-Agent).
  • Tracking params in the cache key.
  • Cookie included in the key.
  • Request-ID or correlation-ID accidentally hashed.
  • Random query params from broken clients (jQuery-style ?_=12345 cache busters, ad SDK ?ab_segment=...).
  • A per-request token in the URL itself (signed URLs with a unique signature per call).

Detection: dashboards on cache-key-to-URL ratio per host. If a host has 10x more cache keys than distinct URLs, something is fragmenting the key. Per-host hit-rate trend dashboards catch this in hours rather than days.
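The cache-key-to-URL ratio can be computed from access logs. Assumptions: a hypothetical log schema of (host, canonical URL, cache key) tuples, and the 10x threshold from the paragraph above as the alert line.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def key_fragmentation(records: Iterable[Tuple[str, str, str]],
                      threshold: float = 10.0) -> Tuple[Dict[str, float], List[str]]:
    """Per-host ratio of distinct cache keys to distinct canonical URLs."""
    keys, urls = defaultdict(set), defaultdict(set)
    for host, canonical_url, cache_key in records:
        keys[host].add(cache_key)
        urls[host].add(canonical_url)
    ratios = {host: len(keys[host]) / len(urls[host]) for host in keys}
    flagged = sorted(host for host, r in ratios.items() if r >= threshold)
    return ratios, flagged
```

A healthy static host sits near 1.0 (a few variants at most); a host with 20 keys per URL is almost certainly keying on something per-request.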

The "every user has unique cache key = no hits" pathology is, in practice, the most common single cause of bad CDN bills. Every miss costs origin egress + compute; a site whose hit rate drops from 99% to 50% goes from 1% to 50% misses, so it does ~50x more origin work, often with no other visible change.


§6. Cache invalidation protocols

CDNs cache aggressively; the harder problem is making them stop. Invalidation lives on a spectrum from "expensive but immediate" to "free but slow."

(a) Purge by URL — the atomic primitive

The simplest API: PURGE https://cdn.example.com/path/asset.json against the CDN's control plane. Goes through:

  1. Customer API call (or origin webhook).
  2. Control-plane validation (auth, rate limits, surrogate key resolution).
  3. Fanout to regions, then PoPs, then individual cache boxes.
  4. Each box removes the entry from its in-RAM index. The on-disk segment isn't rewritten — the entry is just unreferenced and eventually GC'd (garbage collected) with the segment.

End-to-end latency targets:
  • Cloudflare: <150 ms p99 globally to all PoPs (via Quicksilver, see (e)).
  • Fastly: <150 ms p99 worldwide ("instant purge").
  • Akamai: minutes for legacy CCU (Content Control Utility); ~5 s for the newer Fast Purge.
  • CloudFront: minutes (because invalidations propagate through its less-aggressive control plane).

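Cost model: purge allowances vary by vendor. CloudFront, for example, includes the first 1,000 invalidation paths per month and charges per path beyond that; other CDNs rate-limit purge API calls by plan. Purge volume is a knob: overuse signals an architectural smell (you should be versioning URLs instead).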

(b) Surrogate keys — tagged invalidation

Purging by URL doesn't scale when "all 50,000 product pages for category electronics" must invalidate together. Surrogate keys solve this: the origin tags each response with one or more keys; the customer purges by tag; the CDN's tag index iterates affected URLs.

Origin response:
   GET /product/123
   Cache-Tag: product:123 category:electronics brand:apple
   Cache-Control: public, max-age=300

CDN indexes:
   product:123       -> { "/product/123" }
   category:electronics -> { "/product/123", "/product/456", ... 50,000 entries }
   brand:apple       -> { "/product/123", "/product/789", ... 1200 entries }

Customer purges:
   PURGE-TAG category:electronics
   -> CDN walks the index, purges all 50,000 URLs

   PURGE-TAG product:123
   -> CDN purges /product/123 plus any other URLs tagged product:123

Cloudflare calls these "Cache-Tag" headers; Fastly calls them "Surrogate-Key". Akamai supports them as "Edge-Cache-Tag." The pattern is the same; the inverted index per PoP (or globally) is the implementation detail.
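A minimal in-memory version of that inverted index, assuming tags arrive space- or comma-separated in the response header and the cache is a Map keyed by URL (class and method names are hypothetical). The reverse url-to-tags map is what lets a purge clean up every tag that referenced a URL.

```javascript
class TagIndex {
  constructor() {
    this.tagToUrls = new Map(); // tag -> Set of URLs carrying that tag
    this.urlToTags = new Map(); // url -> Set of its tags (for cleanup)
  }
  register(url, tagHeader) {
    this.unlink(url); // a refetched response may carry different tags
    const tags = new Set(tagHeader.split(/[\s,]+/).filter(Boolean));
    this.urlToTags.set(url, tags);
    for (const tag of tags) {
      if (!this.tagToUrls.has(tag)) this.tagToUrls.set(tag, new Set());
      this.tagToUrls.get(tag).add(url);
    }
  }
  unlink(url) {
    for (const tag of this.urlToTags.get(url) || []) {
      const urls = this.tagToUrls.get(tag);
      urls.delete(url);
      if (urls.size === 0) this.tagToUrls.delete(tag);
    }
    this.urlToTags.delete(url);
  }
  purgeTag(tag, cache) {
    const urls = [...(this.tagToUrls.get(tag) || [])];
    for (const url of urls) {
      cache.delete(url); // evict the object
      this.unlink(url);  // and drop it from every tag it carried
    }
    return urls;
  }
}
```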

Operationally, surrogate keys are how every commerce site handles category updates, every CMS handles tag/topic changes, every API handles "this customer's data changed across N endpoints." Without surrogate keys you would need to enumerate every URL touched by a logical entity update — impossible for anything dynamic.

Implementation cost: the tag index. ~10 bytes per (tag, URL) pair. A site with 10M URLs, 5 tags average, costs ~500 MB of index per PoP — non-trivial but manageable.

(c) Purge by prefix — the bulk hammer

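Sometimes you need "everything under /blog/." Cloudflare supports prefix purges on Enterprise plans. Fastly does not offer prefix or wildcard purges and steers customers to surrogate keys instead (tag every /blog/ response with a shared key). The implementation is a range walk over the prefix in the URL-keyed index.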

Caveat: prefix purges are more expensive than tag purges because there is no precomputed tag-to-URL list: every PoP must walk the matching range of its URL index. For sites with millions of URLs under a prefix this takes minutes to hours; surrogate keys are usually better.
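The range walk itself looks like this over a sorted array of cached URLs, used here as a stand-in for whatever ordered index a cache box actually keeps (an assumption; an unordered index would need a full scan). It costs O(log n + k) per PoP, but k can be millions.

```javascript
// First position whose key is >= target (binary search).
function lowerBound(sortedKeys, target) {
  let lo = 0, hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// All keys sharing a prefix are contiguous in sorted order and start at
// lowerBound(prefix), so walk forward until the prefix stops matching.
function purgePrefix(sortedKeys, prefix) {
  const start = lowerBound(sortedKeys, prefix);
  let end = start;
  while (end < sortedKeys.length && sortedKeys[end].startsWith(prefix)) end++;
  return sortedKeys.splice(start, end - start); // removes and returns the matches
}
```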

(d) Versioned URLs — the "purge by deploy" trick

The most important pattern, mentioned in §15's Problem 1, deserves its own treatment.

Bad pattern (TTL + purge):
   /static/app.js                  (TTL 1 hour, requires purge on deploy)
   Deploy new version -> purge /static/app.js across 300 PoPs
   Race: some users get old version for up to 150 ms after deploy
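   Race: misconfigured PoP serves old version indefinitely
   Race: browsers that cached app.js keep the old version for up to the
         full 1-hour TTL; no CDN purge can reach them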

Good pattern (immutable + versioned):
   /static/app.abc123.js           (TTL 1 year, immutable)
   /static/app.def456.js           (next deploy)
   No purge ever required. The HTML simply references the new URL.

The bytes never change. The URL changes per build (typically a content hash). Old URLs continue to be served from cache (for any clients that still reference them — old HTML cached at edge or in browsers). New URLs are referenced by the new HTML, are uncached at first request, fill on demand, and cache forever.
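A minimal sketch of the build step that produces such URLs, using Node.js `crypto`. The 8-hex-character hash length and both header values are assumptions; most bundlers do this for you.

```javascript
const crypto = require("node:crypto");

// Same bytes -> same name; any change -> new name, so no purge is ever needed.
function hashedName(fileName, contents, hashLength = 8) {
  const digest = crypto.createHash("sha256").update(contents).digest("hex");
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return `${base}.${digest.slice(0, hashLength)}${ext}`;
}

// Assumed headers: hashed assets cache forever; HTML stays short-lived.
const ASSET_HEADERS = { "Cache-Control": "public, max-age=31536000, immutable" };
const HTML_HEADERS = { "Cache-Control": "public, max-age=60" };
```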

Why this is the right answer:
  • No coordination problem. Deploy is atomic per HTML reference.
  • No purge fanout. Saves control-plane bandwidth.
  • No race condition. A given URL refers to one set of bytes forever.
  • Far longer TTLs are safe. max-age=31536000, immutable (1 year, immutable) is fine.
  • Browser caches benefit identically. The browser reuses the asset with no revalidation round trip at all.

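Cache-Control: immutable (RFC 8246) tells browsers that honor it: "don't revalidate this, even when the user reloads the page." Without immutable, a normal reload sends a conditional request (If-None-Match / If-Modified-Since) for every asset; with it, only the HTML is revalidated. A hard refresh still bypasses the cache regardless.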

The versioned-URL pattern applies to almost all static assets: JS, CSS, images, fonts, video segments. It does not apply to HTML itself (URL must remain /about) — for HTML you rely on short TTL + purge. The split is: short-lived HTML with surrogate-key purges; long-lived assets with content-hashed immutable URLs.

(e) Cloudflare Quicksilver — the propagation network

The mechanism behind <150 ms global purge needs its own description because most candidates have never thought through what it takes.

Quicksilver is Cloudflare's globally-distributed key-value store specifically for config and purge propagation. ~hundreds of GB of state per PoP, replicated to every PoP within ~150 ms p99.

Mechanism:
  • Writes go to a small leader cluster.
  • Leaders fan out to regional sub-leaders.
  • Sub-leaders fan out to all PoPs in the region in parallel.
  • Each PoP applies the write to local state; subscribers (the cache process, the WAF process) receive a notification.

The protocol is fundamentally a leader-rooted fanout tree: writes are ordered (linearizable) at the leader and eventually consistent at the PoPs, with a tight p99 propagation SLO measured continuously. Failures are tolerated by retrying through alternate paths. Faulty PoPs are detected by heartbeat lag and depooled.
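A toy model of that leader-rooted fanout: each node applies writes strictly in sequence order, notifies local subscribers, forwards to its children, and on a gap stops and re-syncs from its parent's log rather than applying out of order. This is an illustrative assumption about how such a tree can behave, not Quicksilver's actual code; all names are hypothetical.

```javascript
class ReplicaNode {
  constructor(name) {
    this.name = name;
    this.children = [];    // sub-leaders under the leader, PoPs under a sub-leader
    this.subscribers = []; // local consumers, e.g. cache and WAF processes
    this.state = new Map();
    this.log = [];         // applied writes, so children can catch up
    this.appliedSeq = 0;
    this.lagging = false;  // set when a gap is seen
  }
  receive(seq, key, value) {
    if (seq <= this.appliedSeq) return; // duplicate delivery: ignore
    if (seq !== this.appliedSeq + 1) {  // gap: never apply out of order
      this.lagging = true;
      return;
    }
    this.state.set(key, value);
    this.appliedSeq = seq;
    this.log.push({ seq, key, value });
    for (const notify of this.subscribers) notify(key, value);
    for (const child of this.children) child.receive(seq, key, value); // fan out
  }
  catchUpFrom(parent) {
    for (const e of parent.log) if (e.seq > this.appliedSeq) this.receive(e.seq, e.key, e.value);
    this.lagging = this.appliedSeq < parent.appliedSeq;
  }
}

// The leader assigns the next sequence number to each write.
function write(leader, key, value) {
  leader.receive(leader.appliedSeq + 1, key, value);
}
```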

Fastly's equivalent ("Service Configuration Distribution," sometimes called SCD) is architecturally similar with somewhat different leader topology. CloudFront's invalidation is slower partly because it's a different beast — they invalidate via the CloudFront management plane and don't have a comparable purpose-built propagation network.

(f) Cache-Control: immutable and stale-while-revalidate

The HTTP standards include first-class support for the patterns above:

  • Cache-Control: max-age=N — TTL in seconds.
  • Cache-Control: s-maxage=N — TTL for shared caches (CDN); browser uses max-age.
  • Cache-Control: public — explicitly cacheable by shared caches.
  • Cache-Control: private — only browser may cache; CDN must not.
  • Cache-Control: no-store — must not be cached.
  • Cache-Control: no-cache — may cache but must revalidate before reuse.
  • Cache-Control: must-revalidate — when stale, revalidate; do not serve stale.
  • Cache-Control: immutable — bytes will not change for this URL; skip revalidation.
  • Cache-Control: stale-while-revalidate=N — for N seconds past TTL, serve stale and refresh in background.
  • Cache-Control: stale-if-error=N — for N seconds past TTL, serve stale if origin is unreachable.

The hidden gem is stale-while-revalidate (RFC 5861): it serves the stale response immediately while triggering a background refresh, so the revalidation round trip never lands on a user's request path. Combined with surrogate-key purges it gives "always-fresh-but-fast" behavior.

stale-if-error (also RFC 5861) is a resilience primitive: when origin errors or is unreachable, serve the stale copy rather than passing a 5xx (502/503/504) through to the user. Bounded staleness is better than an outage for most workloads.
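How these directives combine for one cached entry can be sketched as a single decision function. Times are in seconds; the function name, the return labels, and the `originHealthy` flag (whether a fetch to origin would succeed) are hypothetical.

```javascript
//   ageSec: seconds since the entry was stored or last revalidated
//   ttl:    the applicable max-age (or s-maxage for a shared cache)
function decide({ ageSec, ttl, staleWhileRevalidate = 0, staleIfError = 0 }, originHealthy) {
  if (ageSec < ttl) return "HIT"; // still fresh
  const staleBy = ageSec - ttl;
  if (staleBy < staleWhileRevalidate) return "STALE_REVALIDATE"; // serve now, refresh in background
  if (originHealthy) return "FETCH";                   // ordinary revalidation / refetch
  if (staleBy < staleIfError) return "STALE_IF_ERROR"; // origin down: stale beats a 5xx
  return "ERROR";                                      // too stale to mask the failure
}
```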


§7. Capacity envelope

A few real-world anchors at different scales:

  • Cloudflare: ~280 Tbps network capacity. ~300 PoPs across 120+ countries. Sustained ~50M rps; peak crossing 1 billion req/min during DDoS events.
  • Akamai: ~4,200 PoPs across 1,000+ networks. Served 250+ Tbps during major NFL events. ~30% of web traffic.
  • Netflix Open Connect: ~17,000 in-ISP appliances. ~200 Tbps peak during popular releases.
  • Fastly: ~70 superPOPs, each with petabyte-class storage.
  • CloudFront: ~450 edge locations + ~13 regional edge caches. Theoretical 600+ Tbps.
  • BunnyCDN: ~120 PoPs, ~$0.01/GB pricing. Limited DDoS protection.

Where the next bottleneck appears as you scale traffic through the same architecture:

Edge RPS              10M rps     ← physical: ~30 Tbps wire, ~3000 cache boxes
TLS handshakes / sec  100k/sec    ← ~10-50 CPU cores doing TLS, mitigated by
                                    session resumption (>90% of conns resume)
Origin RPS            500k rps    ← scary; needs origin shielding
DNS QPS (queries/sec) ~2M qps     ← anycast DNS resolvers absorb; not bottleneck
Purge fanout          ~100/sec    ← could be 1000/sec at peak, must propagate
                                    to all PoPs in <150 ms

Bottleneck migration: at small scale (one PoP, ~10k rps) it's per-server CPU for TLS. At medium scale (10s of PoPs, ~100k rps) it's cross-PoP bandwidth and L2 hit rate. At large scale (~10M rps) it's BGP convergence and anycast pathology, with control-plane propagation as the dominant coordination cost.

The hit rate is THE metric at every tier. 95% at 10M rps means 500k rps to origin — likely a meltdown without shielding. 99% means 100k rps — trivially serviceable. 99.9% means 10k rps — boring. Every architectural choice in §4 traces to "what keeps hit rate >95% in steady state and >99% with shielding."
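The tier arithmetic is a product of miss ratios, since each tier only sees the previous tier's misses. A minimal sketch; the 80% shield hit rate used below is an illustrative assumption.

```javascript
// originRps = edgeRps x (1 - h1) x (1 - h2) x ...
function originRps(edgeRps, tierHitRates) {
  return tierHitRates.reduce((rps, hit) => rps * (1 - hit), edgeRps);
}
```

With a 95% edge hit rate and a shield that absorbs 80% of the remaining misses, `originRps(10e6, [0.95, 0.8])` is ~100k rps, the same origin load as a single 99% tier. The same function reproduces the ~50x figure from the cache-key-explosion discussion: `originRps(1, [0.5]) / originRps(1, [0.99])` is ~50.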

For private CDNs like Netflix Open Connect, the math is different: video files are immutable and the popular subset fits on disk. Hit rate is ~99%+ trivially because the workload has been engineered around the cache.


§8. Video CDN segment caching

Video is the largest single workload class on the planet's CDNs (Netflix + YouTube + Twitch + TikTok + the rest of streaming combined are >70% of total internet bytes). Its protocols and caching patterns differ enough from general web caching to justify their own section.

(a) HLS — HTTP Live Streaming (Apple)

Introduced by Apple in 2009; the de facto standard for iOS, macOS, Safari, and a major chunk of the rest. The trick: instead of streaming a single video file over a custom protocol, the video is chopped into ~2-10 s segments, each segment is a separate HTTP-accessible object, and a manifest file (.m3u8) lists the segments in order.

manifest.m3u8 (master playlist — multi-bitrate)
   #EXTM3U
   #EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
   low/index.m3u8
   #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
   med/index.m3u8
   #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
   high/index.m3u8

high/index.m3u8 (media playlist — segment list for one bitrate)
   #EXTM3U
   #EXT-X-TARGETDURATION:6
   #EXT-X-VERSION:3
   #EXT-X-MEDIA-SEQUENCE:0
   #EXTINF:6.0,
   segment-0000.ts
   #EXTINF:6.0,
   segment-0001.ts
   ...
   #EXT-X-ENDLIST

The client fetches the manifest (re-polling it for live streams), downloads segments sequentially, and switches bitrate variants based on network speed (ABR, Adaptive Bitrate). Each segment is typically 500 KB - 5 MB.
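A simplified sketch of the ABR step against a master playlist like the one above: parse each EXT-X-STREAM-INF BANDWIDTH and pick the highest variant that fits within a safety margin of measured throughput. The 0.8 safety factor is an assumption, and real players also weigh buffer level and switch history.

```javascript
function parseVariants(masterPlaylist) {
  const lines = masterPlaylist.split("\n").map((l) => l.trim()).filter(Boolean);
  const variants = [];
  for (let i = 0; i < lines.length - 1; i++) {
    // BANDWIDTH must follow ':' or ',' so AVERAGE-BANDWIDTH is not matched.
    const m = lines[i].match(/^#EXT-X-STREAM-INF:(?:.*,)?BANDWIDTH=(\d+)/);
    if (m && !lines[i + 1].startsWith("#")) {
      variants.push({ bandwidth: Number(m[1]), uri: lines[i + 1] });
    }
  }
  return variants.sort((a, b) => a.bandwidth - b.bandwidth);
}

function pickVariant(variants, measuredBps, safety = 0.8) {
  const budget = measuredBps * safety;
  let choice = variants[0]; // fall back to the lowest bitrate
  for (const v of variants) if (v.bandwidth <= budget) choice = v;
  return choice;
}
```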

Why this is perfect for HTTP caching:

  • Each segment is an immutable HTTP object. URL never changes. Cache-Control: max-age=31536000, immutable.
  • Segments are small enough to fit in cache memory or NVMe trivially; many can stream in parallel.
  • Popular video → segments hit a CDN cache box; one cache fill serves millions of viewers.
  • The 95-99% hit-rate target is achievable because the popular catalog is small relative to the cache (the long tail of obscure videos misses cache, but they aren't the byte-volume driver).

The manifest itself is the only thing that's not trivially cacheable: for VOD (Video On Demand) the manifest is immutable too (entire segment list fixed); for live the manifest is updated every few seconds with new segments appended (and old ones aging out for sliding-window live).

(b) DASH — Dynamic Adaptive Streaming over HTTP

MPEG-DASH, ISO/IEC 23009-1. Conceptually identical to HLS (manifest plus segments), but uses an XML-based .mpd manifest instead of .m3u8, supports a wider range of codecs and container formats (segments may be .mp4 instead of .ts), and has more extensible metadata. Used heavily outside the Apple ecosystem: YouTube, Netflix (with internal extensions), most Android players, Smart TVs.

From a CDN cache plane perspective: identical to HLS. Manifests are small files, segments are independent HTTP objects keyed on URL. The same segment-caching mechanism works for both.

Many deployments serve both formats from the same origin: origin packages once and distributes both manifest formats over a shared segment store. CMAF (Common Media Application Format) is the standard for this: a single segment format (fragmented MP4, fMP4) usable by both HLS and DASH players.

(c) Manifest file cache strategy

Manifests are tiny (a few KB) but request volume is huge — every viewer pulls one per session, plus refreshes for live. The strategy diverges by mode:

  • VOD manifests: immutable for the life of the video. Cache-Control: max-age=86400 is fine. Hit rate near 100%. Effectively static assets.
  • Live manifests: change every ~1-10 s as new segments are added. Cache-Control: max-age=1 or even no-cache. The CDN caches for one second; on each tick, all PoPs simultaneously refetch. With request collapsing + tight TTL jitter this is fine — the origin sees ~1 manifest request per PoP per second.
  • Low-Latency HLS (LL-HLS) and Low-Latency DASH (LL-DASH) are extensions that reduce live latency from ~30 s to ~3 s by publishing media before a full segment is finished: LL-HLS advertises short partial segments (EXT-X-PART) in the playlist, and LL-DASH streams chunked CMAF over HTTP chunked transfer encoding. The CDN must handle partial segment caching, e.g. caching the first 500 ms of a 2 s segment while the next 1.5 s is still being produced. Implementation is a streaming proxy + cache fill in one pass.

Twitch, YouTube Live, and the major sports streamers all use LL-HLS or LL-DASH in production now. The "we shaved 25 s off live latency" engineering effort across the industry happened 2020-2024.

(d) Live vs VOD as different cache shapes

Dimension             VOD                            Live
Manifest mutability   Immutable for video lifetime   Mutates every 1-10 s
Segment mutability    Immutable forever              Immutable once produced
Concurrent viewers    Spread across content          Often concentrated on one stream
Per-PoP working set   "Popular catalog"              "Currently airing event"
Cold start            Long tail, gradual             Bursty (event starts at announced T0)
TTL strategy          max-age=year + immutable       max-age=segment-duration
Most common failure   Long-tail miss to origin       Manifest stampede at segment boundary

For live sports / major events, the failure mode is manifest stampede: 10M viewers all hitting the manifest at the same 1 s boundary. Per-PoP coalescing reduces this to 1 fetch per PoP per second; per-region shield further reduces to 1 per region per second; origin sees a few requests per second instead of millions.
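Per-PoP coalescing is a "single flight" map: the first miss for a key starts the origin fetch, and concurrent misses await the same promise. A minimal sketch, where `fetchFromOrigin` is a hypothetical stand-in for the real upstream call; a production cache would also store the response and honor its TTL.

```javascript
class Coalescer {
  constructor(fetchFromOrigin) {
    this.fetchFromOrigin = fetchFromOrigin;
    this.inFlight = new Map(); // cacheKey -> Promise of the shared fetch
  }
  get(key) {
    const pending = this.inFlight.get(key);
    if (pending) return pending; // join the fetch already running
    const p = Promise.resolve()
      .then(() => this.fetchFromOrigin(key))
      .finally(() => this.inFlight.delete(key)); // the next window fetches again
    this.inFlight.set(key, p);
    return p;
  }
}
```

A thousand simultaneous misses for `/live/index.m3u8` produce one upstream request; a shield tier running the same logic collapses the per-PoP fetches again.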

(e) Netflix Open Connect — the full-fat approach

Netflix's design is qualitatively different from third-party video CDNs:

  • ~17,000 cache appliances, many embedded directly inside ISP networks (others sit at internet exchange points).
  • Each appliance: ~250 TB of SSD, 100 Gbps NICs, single-purpose (video only).
  • Pre-fill at off-peak hours. Netflix knows what's popular tomorrow (releases, recommendations). It pushes the popular set to every appliance overnight. Daytime traffic is almost all from local SSD — never traverses the internet backbone.
  • Hit rate is effectively 100% on popular content (since it's pre-fetched). Long-tail content backs up to a fill chain: appliance → regional Netflix POP → origin.
  • Each ISP gets free hardware (Netflix ships the appliances) in exchange for hosting and providing power + bandwidth within the ISP.
  • The aggregate is the world's largest CDN by bytes-moved-per-day, but it serves only Netflix. Single-customer, single-workload, in-ISP, immutable content, predictable demand.

This works because:
  • Video files are immutable (no purge problem).
  • The popular subset is small relative to disk (the top 5,000 titles total roughly a petabyte; an appliance holds the relevant fraction).
  • Demand is predictable (the recommendation engine knows tomorrow's popular content tonight).
  • Single workload = no need for general programmability, WAF, edge compute. Netflix optimizes the stack end to end.
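The overnight fill decision is essentially a knapsack problem. A minimal greedy sketch, assuming a per-title demand forecast and size are given (field names are hypothetical); ranking by predicted plays per byte is a standard heuristic, not Netflix's published algorithm.

```javascript
function planPrefill(titles, diskBytes) {
  // Most demand per byte of disk first.
  const ranked = [...titles].sort(
    (a, b) => b.predictedViews / b.bytes - a.predictedViews / a.bytes);
  const plan = [];
  let used = 0;
  for (const t of ranked) {
    if (used + t.bytes <= diskBytes) { // skip titles that no longer fit
      plan.push(t.id);
      used += t.bytes;
    }
  }
  return { plan, used };
}
```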

The architectural lesson: when your workload is uniform enough, building your own beats buying one. Same lesson holds for Google's GGC (Google Global Cache), Meta's edge POPs, Apple's edge cache. The economic crossover is around 100 Tbps sustained; at the ~200 Tbps peaks cited in §7, Netflix is well past that.

(f) DRM and signed segments

Premium video adds DRM (Digital Rights Management): segments are encrypted; clients fetch decryption keys separately. The encrypted segments are cacheable normally (the encrypted bytes are the same for everyone with that license); the key requests are per-user and uncacheable. CDN serves segments; a separate license server handles key issuance.

Signed URLs (HMAC or RSA signatures embedded in the URL with expiry timestamps) are another premium-video pattern. The cache key strips the signature (otherwise every signed URL is unique). The CDN validates the signature at the edge — Workers / VCL — and serves from cache on valid signatures, rejects on invalid or expired.
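A minimal sketch of both halves in Node.js: verify an HMAC-SHA256 signature with an expiry, then build the cache key without the signing parameters. The `exp`/`sig` parameter names and the signed-string format are assumptions; every CDN defines its own token format.

```javascript
const crypto = require("node:crypto");

// Signs "path?exp=<unix seconds>" with a shared secret (hex output).
function sign(path, exp, secret) {
  return crypto.createHmac("sha256", secret).update(`${path}?exp=${exp}`).digest("hex");
}

function checkSignedUrl(rawUrl, secret, nowSec = Math.floor(Date.now() / 1000)) {
  const url = new URL(rawUrl);
  const exp = Number(url.searchParams.get("exp"));
  if (!Number.isInteger(exp) || exp < nowSec) return { ok: false, reason: "expired" };
  const expected = Buffer.from(sign(url.pathname, exp, secret), "hex");
  const given = Buffer.from(url.searchParams.get("sig") || "", "hex");
  // Constant-time compare; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { ok: false, reason: "bad signature" };
  }
  url.searchParams.delete("exp"); // strip signing params so all viewers
  url.searchParams.delete("sig"); // share one cache entry per segment
  return { ok: true, cacheKey: url.host + url.pathname + url.search };
}
```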


§9. Image optimization at edge

Images are 50-70% of typical web page bytes. Edge image optimization — converting, resizing, recompressing images at the PoP — is one of the highest-ROI features modern CDNs offer.

(a) Format conversion (JPEG to WebP to AVIF)

Image formats have evolved with substantial compression gains at each step:

Format    Year   Typical compression vs JPEG   Browser support
JPEG      1992   baseline                      Universal
PNG       1996   lossless; large for photos    Universal
WebP      2010   ~25-35% smaller               All modern browsers
AVIF      2019   ~50% smaller                  Most modern; not Safari < 16
JPEG XL   2021   ~40-60% smaller               Limited (Chrome removed it; Safari 17+ supports it)

Naive serving: origin stores JPEG; everyone gets JPEG. Modern serving: origin stores one source image; CDN reads Accept header from client, converts on the fly, caches the converted variant.

Implementation at edge:
  • Cloudflare Polish / Cloudflare Image Resizing: turn the feature on; origin remains JPEG; the CDN serves WebP/AVIF to capable browsers.
  • Fastly Image Optimizer: similar, configurable via VCL.
  • AWS CloudFront via Lambda@Edge functions: write a Lambda that calls an image library such as Sharp (Node.js) or Pillow (Python) on the origin image and returns the converted variant.
  • ImageKit and Imgix (SaaS) or imgproxy (self-hosted): image services that live between origin and CDN, or run as origin.

Cache key includes a format token (format=webp) so each format is cached separately. Negotiation pattern:

1. Browser sends GET /photo.jpg with Accept: image/avif, image/webp, image/*
2. CDN edge inspects Accept, picks best format the browser supports.
3. CDN cache key includes the chosen format -> cache lookup.
4. On miss: CDN fetches origin's JPEG (cached separately under a "source" key),
            converts to AVIF, caches the AVIF variant.
5. Subsequent requests with same Accept profile hit the AVIF variant directly.

This effectively triples the cache entries per image (JPEG + WebP + AVIF) but each is still cacheable forever (immutable per source image hash).
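Steps 2 and 3 can be sketched as a small Accept parser plus a key builder. The preference order and key format are assumptions; a wildcard such as image/* is deliberately not treated as AVIF support, and the response should still carry Vary: Accept (or equivalent) so downstream caches don't mix variants.

```javascript
const PREFERENCE = ["image/avif", "image/webp"]; // best first; JPEG is the fallback

function pickFormat(acceptHeader = "") {
  const offered = acceptHeader.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    const q = params.find((p) => p.startsWith("q="));
    return { type: type.toLowerCase(), q: q ? Number(q.slice(2)) : 1 };
  });
  for (const type of PREFERENCE) {
    // Only an explicit, non-zero-quality mention counts as support.
    if (offered.some((o) => o.type === type && o.q > 0)) return type.split("/")[1];
  }
  return "jpeg";
}

function imageCacheKey(path, acceptHeader) {
  return `${path}|format=${pickFormat(acceptHeader)}`;
}
```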

(b) Automatic resizing

A single source image can serve dozens of sizes:
  • Mobile thumbnail: 150x150
  • Tablet thumbnail: 300x300
  • Article header: 1200x630
  • Hero image: 2400x1200
  • Retina variants (2x, 3x)

Without edge resizing, origin must store every variant (storage explosion) or the page loads the largest variant and the browser shrinks it (bandwidth waste). Edge resizing: origin stores one master image; the CDN URL includes resize parameters (?w=1200&h=630&fit=cover); cache key includes the parameters; each variant caches independently.

Cloudflare Image Resizing URL pattern: /cdn-cgi/image/width=400,quality=85/path/to/source.jpg. Fastly: query parameters or VCL-injected. AWS: Lambda@Edge with parameter parsing.

Storage on origin drops 10x or more. Hit rate stays high because the set of variants users actually request is small (a dozen sizes covers 95% of requests).

(c) Quality reduction based on connection speed

Clients send Save-Data: on (Chrome) or ECT (Effective Connection Type — slow-2g, 2g, 3g, 4g) in client hints. Edge code adjusts quality:

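// convertImage is whatever transform call your CDN exposes, and maxWidth
// caps the width for data-saving clients; both are parameters, not real APIs.
function optimizeImage(request, convertImage, maxWidth) {
  if (request.headers.get("Save-Data") === "on") {
    return convertImage({ format: "webp", quality: 50, width: maxWidth });
  }
  return convertImage({ format: "avif", quality: 80 });
}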

The cache key extends to include a "connection quality bucket" so each bucket caches separately. Most CDN image services do this implicitly.
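A minimal sketch of collapsing those hints into a few buckets, assuming `headers` is anything with a `get(name)` method, such as a Fetch API `Headers` object. The bucket names and thresholds are hypothetical; fewer buckets means fewer cached variants.

```javascript
// ECT values: slow-2g, 2g, 3g, 4g (sent only if the site opted in via Accept-CH).
function connectionBucket(headers) {
  if ((headers.get("Save-Data") || "").toLowerCase() === "on") return "save-data";
  const ect = (headers.get("ECT") || "").toLowerCase();
  if (ect === "slow-2g" || ect === "2g" || ect === "3g") return "slow";
  return "default"; // 4g or no hint at all
}
```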

(d) Economics

The pitch: "save 50% of image bytes at the edge." Concrete math:

  • Site serves 100 TB/month of images at JPEG quality 80.
  • Enable edge optimization: format negotiation (AVIF for 70% of traffic, WebP for 25%, JPEG for 5%).
  • New byte volume: 70% × 50% + 25% × 70% + 5% × 100% = 35% + 17.5% + 5% = ~57.5% of original = ~57 TB/month.
  • Saved 43 TB/month of egress.
  • At CloudFront's $0.085/GB egress: 43 TB × 1024 GB/TB × $0.085 = ~$3,742/month savings.
  • Cloudflare bills image transformations separately (per its current pricing), which comes to a few hundred dollars/month for that traffic profile.
  • Net: thousands per month of egress saved against hundreds of edge compute cost.
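The blended-bytes arithmetic above, as a function. The size ratios (AVIF ~50% of JPEG bytes, WebP ~70%) and the price per GB are the text's assumptions.

```javascript
// mix: [{ share, sizeVsJpeg }], shares summing to 1.
function imageSavings(monthlyTB, mix, pricePerGB) {
  const fraction = mix.reduce((sum, { share, sizeVsJpeg }) => sum + share * sizeVsJpeg, 0);
  const savedTB = monthlyTB * (1 - fraction);
  return { fraction, savedTB, savedUSD: savedTB * 1024 * pricePerGB };
}
```

Unrounded, the savings come to 42.5 TB and about $3,699 per month; the bullets above round the saved volume up to 43 TB.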

For sites with 1 PB/month of images (large publishers, Shopify storefronts at scale), the savings are tens of thousands per month. The feature pays for itself in days.

Bonus: smaller images load faster, improving Core Web Vitals (LCP — Largest Contentful Paint, in particular) — directly impacting SEO ranking and conversion.

(e) The cache plane sees a Cartesian product

The downside: cache entries explode. One source image now has format × size × quality × connection-bucket variants. The CDN provider's storage cost goes up, and they pass it through (per-image-transform fees). The arithmetic still favors transformation, but it's not free.

Mitigations:
  • Limit configurable sizes via an allowlist (only the 12 sizes the design system uses; not arbitrary ?w=N).
  • Use content-hashed URLs to make the variant set explicit per image.
  • Monitor variant count per source; alert on runaways.
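The allowlist can be enforced by snapping each requested width up to the nearest permitted size, so near-duplicate requests share one variant. A minimal sketch; the widths below are an assumed design-system list (eight here rather than twelve), and falling back to the largest size is a design choice.

```javascript
const ALLOWED_WIDTHS = [150, 300, 400, 640, 800, 1200, 1600, 2400]; // assumed, ascending
const LARGEST = ALLOWED_WIDTHS[ALLOWED_WIDTHS.length - 1];

function snapWidth(requested) {
  const w = Number.parseInt(requested, 10);
  if (!Number.isFinite(w) || w <= 0) return LARGEST;   // garbage input: one known variant
  return ALLOWED_WIDTHS.find((a) => a >= w) ?? LARGEST; // round up, cap at the largest
}
```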


§10. Edge compute

A vanilla CDN serves bytes. Edge compute lets you run code in the cache plane — between request arrival and response, or instead of a cache lookup. This evolved from CDN providers realizing that "auth check before serving cache" or "JSON transform on the response" were obvious extensions of their cache box.

(a) Cloudflare Workers — V8 isolates

Workers run JavaScript / WebAssembly in V8 isolates (V8 is the JavaScript engine that also powers Chrome). Each Worker gets its own isolate, which is reused across many requests and shares a V8 process with thousands of other Workers' isolates; starting one is just allocating an isolate, typically a ~1-5 ms cold start. Memory is capped at ~128 MB per isolate; CPU time is capped per request (10 ms on the free plan, 50 ms on the legacy "Bundled" plan, much more on "Unbound").

The key architectural choice: V8 isolates instead of containers/VMs. A Docker container starts in hundreds of milliseconds. A V8 isolate starts in single-digit milliseconds because it shares process memory with the runtime; the isolation is at the language-VM level. This gives Cloudflare the ability to run code on the request path without latency penalty.

Tradeoff: no native binaries, no syscalls, restricted I/O. You can fetch over HTTP, you can call KV / R2 / Durable Objects, you can do CPU work. You can't bind a socket, mmap a file, fork a process. The sandbox is JavaScript-language-level.

(b) Fastly Compute@Edge — WebAssembly, zero cold start

Fastly compiles WebAssembly (produced from Rust, Go, AssemblyScript, etc.) ahead of time into native code and runs it on Wasmtime (originally on its own Lucet runtime). Each request gets a fresh, isolated instance; because the code is precompiled, instantiating one takes microseconds, so there is effectively no cold start.

The pitch: lower-level language choice (Rust performance), no cold start at all. Tradeoff: smaller ecosystem than JavaScript, you have to compile + deploy. Best fit for performance-sensitive transforms.

(c) AWS Lambda@Edge — full Lambda, slow cold start

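CloudFront's edge compute. Runs Node.js or Python (the only runtimes Lambda@Edge supports) in a full Lambda environment. Cold start: ~100 ms to several hundred ms. Limits are more generous than Workers' on origin-facing triggers, while viewer-facing triggers are capped at 128 MB of memory.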

The cold start matters: a viewer-request Lambda@Edge adds 100+ ms to the first request after a deploy or scaling event. That's catastrophic for latency-sensitive paths. Mitigated by:
  • Attaching heavy logic to origin-request / origin-response triggers, which run only on cache misses, so cache hits skip the Lambda entirely.
  • Using CloudFront Functions instead: a much more restricted runtime (JavaScript only, very limited APIs, sub-millisecond execution) suitable for header rewriting and simple URL transforms.

The split: Lambda@Edge for heavyweight logic that's OK with cold-start; CloudFront Functions for fast path.

(d) What you can do at edge

  • Authentication / authorization. Validate JWT tokens, check session cookies, reject unauthorized requests at edge without burdening origin. A JWT validation costs ~100 µs of CPU or less (HMAC-signed tokens far less), so a single core handles on the order of 10k validations per second.
  • A/B routing. "If user is in cohort B, route to a different origin / set a different cache key." Cohort assignment in deterministic JS code (hash the user ID, threshold).
  • Personalization buckets. Add a header like X-User-Bucket: logged-in-en-US based on cookies / geo; use that header in the cache key. Each bucket caches once.
  • Header rewriting. Add security headers (CSP — Content Security Policy, HSTS — HTTP Strict Transport Security, X-Frame-Options), remove leaking server headers, normalize between origin variants.
  • URL rewriting. Redirect old URLs to new, rewrite paths for backend routing, implement clean URLs in front of legacy origins.
  • Bot detection. TLS fingerprinting (JA3 / JA4), behavior analysis, rate limiting per fingerprint.
  • Origin selection. Multiple origins for the same hostname; edge picks based on user geo, A/B cohort, origin health.
  • Lightweight personalization rendering. Serve a cached HTML skeleton; the worker inlines per-user data from a KV lookup (e.g., username). Cache hit + KV read + template fill in ~5 ms total.
  • Image / asset transforms (see §9).
  • GraphQL persisted queries. Cache GraphQL by query hash even though the request body varies.
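As an example of the auth item, a minimal HS256 JWT check in Node.js: recompute the HMAC over header.payload, pin the algorithm, and check exp. Key rotation, iss/aud checks, and other algorithms are left out, and an edge runtime such as Workers would use Web Crypto rather than Node's `crypto` module.

```javascript
const crypto = require("node:crypto");

// Returns the payload claims on success, null on any failure.
function verifyHs256(token, secret, nowSec = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [h, p, s] = parts;
  let header, payload;
  try {
    header = JSON.parse(Buffer.from(h, "base64url").toString("utf8"));
    payload = JSON.parse(Buffer.from(p, "base64url").toString("utf8"));
  } catch {
    return null; // not valid base64url JSON
  }
  if (!header || header.alg !== "HS256") return null; // pin alg; never accept "none"
  if (!payload || typeof payload !== "object") return null;
  const expected = crypto.createHmac("sha256", secret).update(`${h}.${p}`).digest();
  const given = Buffer.from(s, "base64url");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  if (typeof payload.exp === "number" && payload.exp <= nowSec) return null; // expired
  return payload;
}
```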

(e) What you can't do at edge

  • Heavy compute. Edge runtimes typically cap CPU per request at 30-50 ms. Anything ML-inference-shaped (transformer forward pass on a large model) doesn't fit. Smaller models (a few MB of weights for tabular models, simple embedding lookups) may fit.
  • Large state. Workers KV is eventually consistent and best for config / feature flags. Cloudflare Durable Objects give you stateful actors at edge, but each object lives in one location: requests for it are routed to wherever it runs, not served by the local PoP. State that doesn't fit "shared eventually consistent or a single Durable Object" doesn't belong at edge.
  • Long-running connections. Some platforms support WebSockets at edge (Cloudflare does); long-lived connections may pin to a single PoP and have failover quirks.
  • Direct database access to typical relational databases. Edge runtimes don't have efficient connection pools for traditional databases; every cold start would need a new connection. Workarounds: an HTTP API in front of the database, a managed connection pooler (Cloudflare Hyperdrive pools database connections near the database), or moving the compute to origin.
  • Anything requiring source-of-truth durability. Edge is a cache plane; durability is at origin or in cloud object storage.

(f) The "edge compute is the new server" overclaim

Around 2021-2023 there was hype that edge compute would replace traditional regional servers. The reality settled at: edge compute is a complement to origin compute, not a replacement. The right partition:

  • Edge: caching, auth, routing, simple transforms, personalization buckets, fast path.
  • Origin (regional): business logic, database access, expensive compute, source of truth.

The vast majority of "edge functions" deployed in production today are <100 lines of JavaScript doing routing, auth, or header logic. Heavy lifting stays at origin.


§11. Architecture in context

Canonical end-to-end pattern, vendor-agnostic:

                                     CDN integration pattern
                                     -----------------------

  Client (somewhere on the planet)
     |
     |  1. DNS resolve "cdn.example.com"
     v
  +-------------------------------------------------------+
  |  Recursive resolver                                    |
  |  asks authoritative DNS                                |
  +-------------------------------------------------------+
     |
     |  2. CDN's authoritative DNS — GeoDNS or anycast IP
     v
  +-------------------------------------------------------+
  |  Authoritative DNS (anycast)                           |
  |  - Returns CDN anycast IP, OR                          |
  |  - Returns region-specific IP (older models)           |
  |  - EDNS Client Subnet for resolver-vs-client locality  |
  +-------------------------------------------------------+
     |
     |  3. TCP/QUIC handshake to anycast IP. BGP picks the
     |     topologically nearest PoP.
     v
  +-------------------------------------------------------+
  |  PoP — nearest                                         |
  |                                                        |
  |  +------------------+                                  |
  |  | L4 LB (XDP/eBPF) |  ← consistent-hash by 5-tuple    |
  |  +--------+---------+                                  |
  |           |                                            |
  |  +--------v---------+                                  |
  |  | TLS terminator   |  ← shares session ticket key     |
  |  |                  |    with sibling PoPs              |
  |  +--------+---------+                                  |
  |           |                                            |
  |  +--------v---------+    cache_key =                   |
  |  | Edge cache (L1)  |      hash(method || host || path |
  |  |  RAM + NVMe      |           || vary headers || ...)|
  |  |  hit -> return   |                                  |
  |  |  miss -> ↓       |                                  |
  |  +--------+---------+                                  |
  |           |                                            |
  |  +--------v---------+                                  |
  |  | Mid-tier (L2)    |  ← regional cache shared across  |
  |  |  larger SSD pool |    PoPs in a region              |
  |  +--------+---------+                                  |
  |           |                                            |
  +-----------+--------------------------------------------+
              |
              |  Cross-region request collapsing — multiple
              |  PoPs converge on the same shield instance
              v
        +-----------------------+
        |  Origin Shield         |  ← single regional bastion;
        |  per continent or      |    deduplicates the entire
        |  globally              |    region's misses into one
        |  - request coalescing  |    origin fetch
        |  - cache-control       |
        |    honored             |
        +-----------+-----------+
                    |
                    |  Single fetch per key per TTL window
                    v
        +-----------------------+
        |  Origin                |
        |  (S3, ALB, your boxes) |
        +-----------------------+

  Annotations:
    - L1 shard key: consistent hash of URL over N_local_boxes
    - L2 shard key: consistent hash of URL over N_regional_boxes
    - Shield shard key: consistent hash of URL over N_shield_instances
    - Same hash function across tiers -> tiered hit rate compounds.
    - Purge propagation: pub/sub on a control plane bus, fanned out
      to all PoPs from origin -> mid-tier -> edge.

Specifics this diagram hides:

  • DNS resolution itself is anycast to the CDN's authoritative DNS. <10 ms anywhere on the planet because DNS responders live at the PoPs.
  • EDNS Client Subnet (ECS) matters: lets the authoritative DNS see the client subnet, so the right PoP is returned even when the resolver is far away. Without ECS, a user in Germany using a US resolver gets routed to a US PoP.
  • Anycast routing is BGP-driven: topological, not geographic. The two usually correlate, but not always. Anycast islands (a misconfigured ISP sending traffic to the wrong PoP) are a real failure mode.
  • Mid-tier and origin shield are the most underrated components. Without them, every L1 miss goes straight to origin. With them, all PoPs in a region funnel through a small set of shield nodes, so origin sees on the order of N_regions misses per key rather than up to N_pops × N_boxes_per_pop.
  • Edge compute lives between the L4 LB and the cache lookup. Workers and Compute@Edge handlers run on the cache box itself; Lambda@Edge runs at CloudFront's regional edge caches rather than on every edge box.

§12. Anycast pathology in detail

§2 introduced anycast as "the client's SYN lands on the nearest PoP within ~1 RTT." That's the marketing version. The engineering reality has corner cases that every CDN team eventually trips on.

(a) Route changes mid-TCP-flow

The fundamental problem: TCP is a flow with state distributed between client and server, but routers forward each packet independently, using whatever routes BGP has currently installed at each hop. If a route changes mid-flow (say, a transit provider's session flaps and BGP reconverges), packets from the same TCP connection start arriving at a different PoP. The new PoP has never seen this connection and holds no TCP state for it, so it answers with a RST (reset). The user's connection breaks.

This is rare in practice — BGP routes are usually stable for hours to days. But during BGP convergence events (large peering changes, transit outages, occasional internet weather), tens of thousands of in-flight connections can break simultaneously. The user sees "connection reset"; some retry seamlessly, some give up.

(b) The "stable routing within a flow" requirement

CDN engineering solves this by enforcing flow stability at the L4 level inside the PoP, combined with router-level flow stability:

  • ECMP (Equal-Cost Multi-Path) hash stability. Within a PoP, multiple cache boxes share the anycast IP via ECMP. The router picks a box based on a hash of (src_ip, src_port, dst_ip, dst_port). Critical: the hash must be stable for the duration of any flow. If a box is added or removed, the mapping must move as few flows as possible (consistent or resilient ECMP). Naïve modulo hashing fails: growing from N to N+1 boxes remaps about N/(N+1) of all flows, i.e., nearly every one.
  • Router-level flow tracking. Many modern routers support resilient ECMP hashing, keeping flow → next-hop mappings stable even as the underlying ECMP set changes.
  • Cross-PoP failover requires explicit handling. If a PoP fails entirely, the BGP route withdraws and traffic flows to another PoP. Connections to the failed PoP are lost; clients retry; new connections land on the new PoP. No magic continuation — TCP doesn't survive endpoint failure.
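The gap between naïve modulo and a consistent scheme is easy to measure. A minimal sketch, using rendezvous (highest-random-weight) hashing as a stand-in consistent scheme and synthetic 4-tuples; routers implement resilient ECMP in vendor-specific hardware tables, not like this, so treat it as an illustration of the remapping math only.

```python
import hashlib

def _h(*parts: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def modulo_pick(flow: str, boxes: list[str]) -> str:
    # Naive: hash(flow) % N, so changing N reshuffles almost every flow.
    return boxes[_h(flow) % len(boxes)]

def rendezvous_pick(flow: str, boxes: list[str]) -> str:
    # Highest-random-weight: every box scores the flow and the top score wins.
    # A new box only takes the flows on which it now outscores every other box.
    return max(boxes, key=lambda box: _h(flow, box))

def remap_fraction(pick, flows, before, after) -> float:
    moved = sum(pick(f, before) != pick(f, after) for f in flows)
    return moved / len(flows)

# Synthetic 4-tuples and a 10 -> 11 box change.
flows = [f"10.0.{i // 256}.{i % 256}:{40000 + i}->203.0.113.1:443" for i in range(10_000)]
before = [f"box{i}" for i in range(10)]
after = before + ["box10"]

print(f"modulo:     {remap_fraction(modulo_pick, flows, before, after):.1%} of flows moved")
print(f"rendezvous: {remap_fraction(rendezvous_pick, flows, before, after):.1%} of flows moved")
```

Growing from 10 to 11 boxes, modulo should move roughly 10/11 (about 91%) of flows and rendezvous hashing roughly 1/11 (about 9%), all of them onto the new box; exact figures vary slightly with the sample.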

(c) Flow-aware anycast — Cloudflare's Unimog

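Cloudflare published the design of their L4 load balancer, Unimog, in 2020. The key idea: every server in a PoP runs an XDP program that hashes each packet's 4-tuple into a bucket of a forwarding table distributed by a control plane. Each bucket names the server that currently owns it and, after a reassignment, the server that owned it before. A packet goes to the current owner; if that server has no connection state for it and the packet isn't opening a new connection, it forwards the packet to the previous owner ("daisy chaining"). New flows go to the current owner.

Effect: the router's ECMP can deliver a packet to any box in the PoP, and when the box set changes, packets for established flows still reach the box that holds their state. Flow continuity is preserved.

Other CDNs have analogous designs (Google's Maglev, published in 2016, was the best-known public predecessor). The pattern is a software L4 load balancer with consistent hashing plus connection tracking or daisy chaining, not raw ECMP.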

(d) Anycast islands

Sometimes a single ISP's BGP path selection sends all anycast traffic to the wrong PoP for hours. The CDN sees users in this ASN with high latency, suggesting they are being routed to a PoP 5000 miles away instead of the one across town. Causes:

  • The ISP's transit provider has a bad path for the CDN's anycast IPs.
  • A misconfigured BGP filter at an exchange point drops the CDN's announcement.
  • A flapping BGP session causes oscillation between two valid paths, neither optimal.

Detection: RUM (Real User Monitoring) data aggregated per ASN. Mitigation: AS-path prepending or community manipulation to make the desired PoP more attractive; manual NOC outreach to the ISP. Not solvable purely at the CDN's layer because the problem is in the ISP's autonomous decision-making.

Honest acknowledgement: ~0.5-2% of users globally are on a sub-optimal PoP at any moment. You measure, alert, route-engineer; you don't promise zero.
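Per-ASN detection is essentially a group-by over RUM beacons. A minimal sketch, assuming each beacon carries the client's ASN, the PoP that served it, and a measured RTT; the `Beacon` fields, the expected-PoP and baseline tables, and the 3x / 50-sample thresholds are illustrative assumptions, not any CDN's real pipeline.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median

@dataclass
class Beacon:
    asn: int        # client's autonomous system number
    pop: str        # PoP that actually served the request
    rtt_ms: float   # RTT measured by the RUM script

def find_anycast_islands(beacons, expected_pop, baseline_rtt_ms, factor=3.0, min_samples=50):
    """Flag ASNs whose traffic mostly lands on an unexpected PoP with inflated RTT.

    expected_pop:    hypothetical map ASN -> PoP believed to be nearest.
    baseline_rtt_ms: hypothetical map ASN -> normal median RTT for that ASN.
    """
    by_asn = defaultdict(list)
    for b in beacons:
        by_asn[b.asn].append(b)

    flagged = {}
    for asn, samples in by_asn.items():
        if len(samples) < min_samples or asn not in expected_pop or asn not in baseline_rtt_ms:
            continue  # too little data, or no reference to compare against
        off_pop = [b.pop for b in samples if b.pop != expected_pop[asn]]
        p50 = median(b.rtt_ms for b in samples)
        # Majority of the ASN's traffic is on the wrong PoP AND latency is inflated.
        if len(off_pop) > len(samples) / 2 and p50 > factor * baseline_rtt_ms[asn]:
            wrong_pop = Counter(off_pop).most_common(1)[0][0]
            flagged[asn] = (wrong_pop, p50)
    return flagged

# ASNs 64496 and 64497 are reserved for documentation (RFC 5398).
beacons = ([Beacon(64496, "SIN", 250.0) for _ in range(80)]
           + [Beacon(64497, "FRA", 12.0) for _ in range(80)])
print(find_anycast_islands(beacons, {64496: "BOM", 64497: "FRA"}, {64496: 15.0, 64497: 12.0}))
# {64496: ('SIN', 250.0)}
```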

(e) ECS (EDNS Client Subnet) and DNS-routed alternatives

For DNS-based steering (older CDN architectures, or hybrid anycast + DNS), the CDN's authoritative DNS needs to know the client's IP subnet, not the resolver's. Without ECS, a user in Germany whose resolver sits in the US (a corporate or VPN resolver, say) gets routed to a US PoP.

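ECS (RFC 7871) lets the authoritative DNS see the client subnet while preserving some privacy: the resolver sends only a prefix (RFC 7871 recommends /24 for IPv4 and /56 for IPv6), not the full address. Support among public resolvers is mixed: Google Public DNS sends ECS, while Cloudflare's 1.1.1.1 deliberately does not (for privacy), and Quad9 sends it only on a separate ECS-enabled service. Corporate resolvers often don't send it, leading to mis-routing for enterprise users.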

This is one reason anycast is preferred over DNS-routing at modern CDNs: anycast doesn't need ECS because the BGP path itself implies the user's location.
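What an ECS-aware resolver forwards is the client address truncated to a prefix. A minimal sketch with Python's standard `ipaddress` module; the function name is ours, and the default prefix lengths (/24 for IPv4, /56 for IPv6) are the ones RFC 7871 recommends for privacy.

```python
import ipaddress

def ecs_client_subnet(client_ip: str, v4_prefix: int = 24, v6_prefix: int = 56) -> str:
    """Return the subnet a resolver would place in the ECS option for client_ip."""
    addr = ipaddress.ip_address(client_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False zeroes the host bits instead of raising ValueError.
    return str(ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False))

print(ecs_client_subnet("198.51.100.77"))          # 198.51.100.0/24
print(ecs_client_subnet("2001:db8:abcd:12ff::1"))  # 2001:db8:abcd:1200::/56
```

The authoritative DNS then maps that subnet, rather than the resolver's own address, to a PoP.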


§13. Origin shield mechanics in detail

§11's architecture diagram mentions origin shield in passing. It is one of the most important components for protecting origin under load, and it deserves a section.

(a) Request coalescing at shield

The shield's primary job: deduplicate cache misses for the same key into a single origin fetch.

Time t=0:   1000 PoPs around the world have cache miss on /api/trending
            (TTL just expired, popular endpoint).
Time t=1ms: 1000 requests arrive at the shield (mid-tier or per-region).
Time t=2ms: Shield sees "first request for /api/trending, no cache entry,
            need to fetch from origin." Marks the key as "in-flight."
            Issues ONE origin fetch.
            All 999 subsequent requests for the same key check "is there
            an in-flight fetch?" Yes → they attach to the same promise,
            wait for the origin response.
Time t=50ms: Origin responds.
Time t=51ms: Shield writes to its cache, then fans out the response to
             all 1000 waiting PoPs.

Origin saw: 1 request. Without shield: 1000 requests.

The implementation: a hash map keyed on cache key, value is "in-flight promise + list of waiters." First request creates the promise; subsequent requests block on it. When the origin response arrives, all waiters get the same response.

Nginx implements this as proxy_cache_lock + proxy_cache_lock_timeout. Varnish has it built in as request coalescing on the same hash. Custom CDN cache engines all have an equivalent.
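The in-flight map described above can be sketched with asyncio futures. A minimal sketch, assuming a single-process shield and a hypothetical `fetch_origin` coroutine; it leaves out the cache write, a lock timeout (the role `proxy_cache_lock_timeout` plays in nginx), and any retry policy for failed fetches.

```python
import asyncio

class Coalescer:
    """Collapse concurrent misses for the same cache key into one origin fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin      # hypothetical: async key -> response
        self._in_flight: dict[str, asyncio.Future] = {}

    async def get(self, key: str):
        fut = self._in_flight.get(key)
        if fut is not None:
            return await fut                   # waiter: attach to the in-flight fetch
        fut = asyncio.get_running_loop().create_future()
        self._in_flight[key] = fut             # leader: mark the key in-flight
        try:
            fut.set_result(await self._fetch_origin(key))
        except Exception as exc:
            fut.set_exception(exc)             # every waiter sees the same failure
        finally:
            if not fut.done():
                fut.cancel()                   # leader was cancelled: don't strand waiters
            del self._in_flight[key]           # the next miss starts a fresh fetch
        return await fut

async def demo():
    calls = 0

    async def fetch_origin(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)              # simulated origin latency
        return f"body of {key}"

    shield = Coalescer(fetch_origin)
    results = await asyncio.gather(*(shield.get("/api/trending") for _ in range(1000)))
    return calls, len(set(results))

print(asyncio.run(demo()))   # (1, 1): one origin call, one shared response
```

Errors propagate to every waiter, which matches the coalescing semantics but means one failed origin fetch fails the whole batch; a production engine might prefer to let one waiter retry instead.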

(b) Coalescing scope

The hierarchy matters:

  • Per-cache-box coalescing: requests within one box dedupe. Effective at small scale.
  • Per-PoP coalescing: typically handled by sharding (each URL maps to one box per PoP) plus per-box coalescing.
  • Cross-PoP coalescing at shield: this is the real value. The shield collapses misses from N PoPs into 1.

The shield is sometimes called a "tier 2" or "tier 3" cache; the structure depends on the CDN:

  • Cloudflare: regional tier (~30 regions) + optional Argo Smart Routing for path optimization.
  • Fastly: "shielding", where a single specific PoP is designated as the upstream for a given service.
  • CloudFront: "Origin Shield", an explicit regional shield you enable per origin.
  • Akamai: SureRoute + tiered distribution.

(c) Shield-to-origin connection pool

Origins typically can't handle thousands of TLS handshakes per second from a CDN. The shield maintains a persistent connection pool to each origin:

  • Long-lived HTTP/2 connections (or HTTP/3) reused across requests.
  • TLS session resumption to avoid full handshakes.
  • Configurable concurrency limits per origin.
  • Health checks to detect origin failure quickly.

Concretely: origin sees a small set of established connections (10-100) from the shield, each carrying multiplexed requests. From the origin's perspective, the entire CDN looks like a single (large) client with persistent connections.

(d) The shield warms up a new PoP

Problem 3 in §15 (cold start) is the case where a new PoP comes online with empty cache. The shield helps here:

1. New PoP P_new opens; BGP routes ~5% of regional traffic to it (staged).
2. Cache misses on P_new flow to the regional shield.
3. Shield has the popular set already cached (it serves all PoPs in the region).
4. P_new gets responses quickly from shield (warm-cache hit at shield).
5. P_new fills its local cache; over hours its hit rate climbs.
6. Origin sees almost no extra load during P_new's warmup, because the
   shield absorbed the misses.

Without shield: P_new misses → straight to origin → 100x normal origin load during warmup. With shield: ~1x normal load on shield; ~0% extra on origin.

(e) Shield outage handling

The shield is a potential single point of failure in the hierarchy. CDNs run shields as HA (High Availability) clusters (typically 3-5 nodes) with health-checked failover. When the active shield fails:

  1. PoPs detect connection failure (health checks or TCP RST).
  2. PoPs reroute to a backup shield (different node in the cluster or different region).
  3. The backup shield's cache may be colder, but it's still warmer than origin.
  4. Restored shield rejoins; eventually traffic balances back.

Origin load rises during the 2-5 s failover window (some PoPs may go to origin before the backup shield takes over). That is within SLO for most CDNs but visible in metrics.

(f) When you do (and don't) want shield

Shield is on by default for most enterprise CDN customers. You don't want it when:

  • Origin is in the same datacenter as the shield (the shield is just an extra hop with no benefit).
  • Origin is itself a CDN or already coalesces concurrent requests.
  • Workload is so low-volume that the shield's bandwidth is wasted.

You always want it when:

  • Origin is regional (single datacenter) and traffic is global.
  • Workload includes popular dynamic content (where misses cluster on keys).
  • TTLs are short enough that re-fetches are common.
  • Origin is fragile under load.


§14. HTTP/3 at edge

HTTP/3 (RFC 9114) runs HTTP over QUIC (RFC 9000) over UDP. The transition is happening now: as of 2026, ~30% of web requests on major CDNs are over HTTP/3, climbing.

(a) Why QUIC, briefly

TCP + TLS has fundamental shortcomings on mobile, lossy networks:

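  • Up to 3 RTTs (Round Trip Times) before the first request byte in the worst case: 1 RTT for the TCP handshake plus 2 for a full TLS 1.2 handshake (TLS 1.3 cuts the TLS part to 1 RTT). TCP Fast Open and session resumption shave this further, but neither is universal.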
  • Head-of-line blocking in TCP: one lost packet stalls all streams above it. HTTP/2 multiplexed streams over a single TCP connection inherit this — a single packet drop blocks all streams.
  • No mid-connection migration: TCP connection is identified by (src_ip, src_port, dst_ip, dst_port). Change any of those (WiFi to cellular) and the connection breaks.

QUIC fixes all three:

  • 0-RTT connection setup for repeat visitors (using a cached session ticket).
  • Per-stream loss recovery: a lost packet on stream 5 doesn't block streams 1-4.
  • Connection ID independent of IP/port: a connection can survive a NAT (Network Address Translation) rebinding, a WiFi-to-cellular handover, or a corporate VPN reconnect.

(b) 0-RTT for repeat visitors

A returning client with a valid session ticket sends the GET request in the very first packet to the server. The server can begin processing the request before any RTT has occurred.

First visit:
  Client Initial[ClientHello] -> Server Initial[ServerHello] + Handshake[EncryptedExtensions, Certificate, Finished] -> Client Handshake[Finished] + GET
  (~1 RTT before request)

Repeat visit with cached ticket:
  Client Initial[ClientHello] + 0-RTT[GET, encrypted with keys derived from the cached ticket] -> Server processes immediately
  (~0 RTT before request)

For sites visited frequently (banking apps, social feeds, anything mobile and personal), 0-RTT removes a full round trip, often ~100 ms on mobile networks, from the first-byte time of every revisit. Summed over the several connections a session opens, that adds up to hundreds of ms of user-perceived latency.

Caveat: 0-RTT data can be replayed by an attacker. Servers must accept only idempotent requests as 0-RTT data. GET is fine; POST is not. CDNs typically allow 0-RTT for GETs only.
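The GET-only rule can be expressed as a gate in front of the request handler. A minimal sketch; RFC 8470 defines the `425 Too Early` status a server returns to make the client resend after the handshake completes, while the function name and the GET/HEAD-only allowlist are our assumptions (a GET with side effects would still be unsafe to replay). `HTTPStatus.TOO_EARLY` needs Python 3.9 or later.

```python
from http import HTTPStatus
from typing import Optional

# Assumption: only GET and HEAD are treated as replay-safe for 0-RTT early data.
EARLY_DATA_SAFE_METHODS = frozenset({"GET", "HEAD"})

def early_data_decision(method: str, in_early_data: bool) -> Optional[HTTPStatus]:
    """Return a status to reject with, or None to process the request now."""
    if not in_early_data:
        return None                   # handshake already complete: no replay risk
    if method.upper() in EARLY_DATA_SAFE_METHODS:
        return None                   # idempotent: a replay does no extra harm
    return HTTPStatus.TOO_EARLY       # 425: client retries once the handshake completes

print(early_data_decision("GET", in_early_data=True))        # None
print(int(early_data_decision("POST", in_early_data=True)))  # 425
```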

(c) Connection migration

A user opens a page on WiFi at home; halfway through reading, they leave home and switch to cellular. Under TCP, every connection breaks; the browser re-establishes them. Under QUIC, the connection ID identifies the connection independent of IP/port. The client sends a packet from its new IP; the server sees a new IP with a known connection ID; it continues the session.

For mobile users in transit (subway, walking), connection migration eliminates a class of stalls. CDN-level metric: mobile p99 time-to-interactive drops by ~50-150 ms after enabling QUIC.

The "we shaved 100 ms off mobile p99" result is the typical real-world report from a major CDN customer post-QUIC. Stripe, Reddit, Cloudflare itself have all reported this magnitude of improvement.

(d) CDN adoption status

  • Cloudflare: HTTP/3 GA since 2019; enabled for all customers by default; >40% of Cloudflare traffic over HTTP/3 as of 2025.
  • Fastly: HTTP/3 GA; enabled via feature flag for most customers.
  • CloudFront: HTTP/3 GA since 2022; opt-in per distribution.
  • Akamai: HTTP/3 GA; varies by product.
  • Google Cloud CDN: HTTP/3 GA.

Some networks (mostly enterprise / school WiFi) block UDP egress to port 443, preventing HTTP/3. Clients fall back to HTTP/2 over TCP automatically. The CDN serves both; the client picks.

(e) Operational implications

  • UDP path differs from TCP path. Edge load balancers (XDP/eBPF) need to handle UDP+QUIC; new code paths.
  • Stateful UDP: connection IDs are server-assigned; the load balancer needs to track connection-ID-to-server mapping (otherwise packets from a migrated client land on the wrong server). Solved by encoding the server identity in the connection ID itself.
  • DDoS surface is different. UDP amplification attacks are a real concern; QUIC has specific anti-amplification guards (the server can only send 3x the bytes it has received until it has validated the client). Edge filters tuned for QUIC traffic patterns.
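  • Observability changes. tcpdump captures the UDP packets, but QUIC encrypts nearly everything, including most of the transport header, so inspecting protocol details needs a QUIC-aware decoder plus the session keys. Wireshark decodes QUIC well and can decrypt it given a TLS key log file (e.g., one written via SSLKEYLOGFILE).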
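Encoding the server identity in the connection ID lets a stateless L4 balancer route every packet of a connection, including packets arriving from a client's new address after migration. A minimal sketch of a plaintext layout, loosely in the spirit of the IETF QUIC-LB draft's unencrypted mode; the 8-byte CID, 1-byte config prefix, and 2-byte server ID are illustrative assumptions, and a production scheme would encrypt the server ID so outsiders can't map the fleet.

```python
import secrets

CID_LEN = 8          # assumption: 8-byte server-chosen connection IDs
SERVER_ID_LEN = 2    # assumption: up to 65,536 servers behind one VIP
CONFIG_ID = 0x01     # assumption: first byte lets the LB rotate encoding schemes

def mint_cid(server_id: int) -> bytes:
    """Server side: embed our ID; a random nonce makes successive CIDs differ."""
    nonce = secrets.token_bytes(CID_LEN - 1 - SERVER_ID_LEN)
    return bytes([CONFIG_ID]) + server_id.to_bytes(SERVER_ID_LEN, "big") + nonce

def route_packet(dcid: bytes, servers: dict[int, str], fallback) -> str:
    """LB side: read the server ID out of the destination CID; no per-flow table."""
    if len(dcid) == CID_LEN and dcid[0] == CONFIG_ID:
        sid = int.from_bytes(dcid[1:1 + SERVER_ID_LEN], "big")
        if sid in servers:
            return servers[sid]
    # A client's first Initial packet carries a client-chosen CID with no server ID,
    # so fall back to some other policy (hypothetical `fallback`, e.g. a hash).
    return fallback(dcid)

servers = {7: "cache-b07", 12: "cache-b12"}
cid = mint_cid(12)
# After migration the client's IP/port change but the CID doesn't, so routing holds.
print(route_packet(cid, servers, fallback=lambda d: "hash-chosen"))          # cache-b12
print(route_packet(b"\x99" * 8, servers, fallback=lambda d: "hash-chosen"))  # hash-chosen
```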

§15. Hard problems inherent to the technology

Problem 1: Purge propagation at scale

Naive: when origin updates, call a purge API broadcasting "drop key X" to all PoPs.

How it breaks: 300 PoPs × 100 boxes = 30,000 nodes per purge. A surge of 1000 purges/sec means 30M RPCs (Remote Procedure Calls)/sec on the control plane. A partition that isolates a PoP during the broadcast means that PoP never receives the purge and stays stale until the object's TTL expires. Purge-by-tag ("all 50,000 product pages for category X") requires inverted indices per PoP. Failures rhyme across content types:

  • E-commerce static assets that miss a sale-start purge show wrong prices for hours.
  • An API response cache that misses a feature-flag flip serves old response shapes, and clients crash on parsing.
  • A video segment cache that misses a takedown purge keeps serving the clip: legal exposure.

Real fix:

  • Hierarchical fanout via pub-sub. Origin → control plane → region → PoP → box. Cloudflare's "Quicksilver" KV distributes config + purge metadata globally within ~150 ms p99 by gossip/fanout.
  • Surrogate keys / tags. "Cache-Tag" (Cloudflare) or "Surrogate-Key" (Fastly): origin emits Cache-Tag: post-123 category-electronics; CDN indexes tag → key set; purge-by-tag iterates the index.
  • Soft purge as default. Mark stale; serve stale-while-revalidate; converge lazily. Hard purge only for legal takedowns and breaches.
  • Versioned URLs as the actual fix. app.abc123.js. New version = new URL = no purge. The right answer for >90% of invalidation problems.
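Purge-by-tag needs an inverted index from tag to cache keys, maintained as responses are stored. A minimal single-node sketch, assuming tags arrive as a space-separated Cache-Tag / Surrogate-Key response header; the class and method names are hypothetical, and a real PoP would shard this index and apply purges arriving over the fanout bus.

```python
from collections import defaultdict

class TaggedCache:
    def __init__(self):
        self.entries: dict[str, bytes] = {}
        self.tag_index: dict[str, set[str]] = defaultdict(set)   # tag -> cache keys
        self.key_tags: dict[str, set[str]] = {}                   # cache key -> tags

    def store(self, key: str, body: bytes, tag_header: str = "") -> None:
        self._drop(key)                       # re-storing a key replaces its old tags
        tags = set(tag_header.split())
        self.entries[key] = body
        self.key_tags[key] = tags
        for tag in tags:
            self.tag_index[tag].add(key)

    def purge_tag(self, tag: str) -> int:
        """Hard-purge every key carrying `tag`; returns how many were dropped."""
        keys = self.tag_index.pop(tag, set())
        for key in list(keys):
            self._drop(key)
        return len(keys)

    def _drop(self, key: str) -> None:
        self.entries.pop(key, None)
        for tag in self.key_tags.pop(key, set()):
            keys = self.tag_index.get(tag)    # .get() doesn't create a defaultdict entry
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.tag_index[tag]

cache = TaggedCache()
cache.store("/p/123", b"...", "post-123 category-electronics")
cache.store("/p/456", b"...", "post-456 category-electronics")
cache.store("/about", b"...", "static")
print(cache.purge_tag("category-electronics"))   # 2
print(sorted(cache.entries))                     # ['/about']
```

This does a hard purge; a soft purge would mark the matching entries stale instead of deleting them.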

Problem 2: Cache stampede / thundering herd

Naive: TTL expires; next request misses; PoP fetches origin.

How it breaks: /api/trending is cached at all PoPs with TTL=60s. At second 60 it expires everywhere simultaneously: 300 PoPs × 100 boxes = 30,000 simultaneous origin fetches in the same millisecond, and origin falls over. The same failure hits a viral image, a popular API endpoint, a hot video segment, or a frequently pulled ML model.

Real fix:

  • Request coalescing at every tier. If a key is "being fetched," incoming requests attach to the in-flight promise. Nginx has proxy_cache_lock; Varnish has it built in; Fastly's VCL does it natively.
  • Origin shield as coalescing point. Each PoP collapses its requests to one; shield collapses N_pops requests to one. Origin sees a single fetch per region.
  • Stale-while-revalidate. Cache-Control: max-age=60, stale-while-revalidate=3600. The first request after expiry is served the stale copy and triggers a single async revalidation.
  • Jittered TTLs. Random ±5% so expirations don't align.

Effect: even at 10M rps, origin sees single-digit RPS for hot keys because everything collapses through the hierarchy.
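Stale-while-revalidate and jitter both reduce to a small freshness check per cached entry. A minimal sketch, assuming each entry knows its age and the directive values parsed from Cache-Control; the function names and the ±5% default (taken from the bullet above) are illustrative, not any CDN's code.

```python
import random
from enum import Enum

class Freshness(Enum):
    FRESH = "serve from cache"
    STALE_REVALIDATE = "serve stale, trigger one async revalidation"
    EXPIRED = "miss: fetch (coalesced) from upstream"

def jittered_ttl(max_age: float, jitter: float = 0.05, rng=random) -> float:
    """Spread expirations: max_age * (1 ± jitter), so copies don't expire together."""
    return max_age * (1 + rng.uniform(-jitter, jitter))

def classify(age: float, ttl: float, stale_while_revalidate: float) -> Freshness:
    if age < ttl:
        return Freshness.FRESH
    if age < ttl + stale_while_revalidate:
        return Freshness.STALE_REVALIDATE
    return Freshness.EXPIRED

# Cache-Control: max-age=60, stale-while-revalidate=3600
ttl = jittered_ttl(60)                 # somewhere in [57, 63]
print(classify(30, ttl, 3600).name)    # FRESH
print(classify(120, ttl, 3600).name)   # STALE_REVALIDATE
print(classify(4000, ttl, 3600).name)  # EXPIRED
```

The STALE_REVALIDATE case is where the single async refresh starts; routing that refresh through request coalescing keeps it to one fetch per key.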

Problem 3: Cold start at a new PoP

Naive: bring up the new PoP; let it warm naturally.

How it breaks: New São Paulo PoP opens; BGP routes Brazilian traffic to it; for the first hour hit rate is ~0%. Local origin shield doesn't exist yet, so most misses go to origin — 100x traffic spike during warm-up. Same shape whether the workload is images, video, API, or downloads.

Real fix:

  • Staged BGP advertisement — longer AS path initially so the PoP gets ~5% of traffic; ramp up.
  • Cache pre-warming — replay top-1000 requests for the region from a production log tap before opening; brings hit rate to ~50% at launch.
  • Tiered cache makes cold PoPs cheap — cold L1 hitting a warm L2 mid-tier is fine.
  • Steal-from-neighbors — some CDNs replicate keys from neighbor PoPs via gossip.
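Pre-warming starts as log arithmetic: pick the most-requested keys and estimate what share of traffic they cover. A minimal sketch, assuming the log tap yields one request path per line; the top-1000 default comes from the bullet above, while the function name and the coverage estimate (share of logged requests, not a predicted hit rate) are our assumptions.

```python
from collections import Counter

def prewarm_set(request_log, top_n=1000):
    """Return the top_n keys and the share of logged requests they account for."""
    counts = Counter(path.strip() for path in request_log if path.strip())
    top = counts.most_common(top_n)
    covered = sum(n for _, n in top)
    total = sum(counts.values())
    return [key for key, _ in top], (covered / total if total else 0.0)

log = ["/index.html"] * 60 + ["/app.js"] * 30 + [f"/img/{i}.jpg" for i in range(10)]
keys, share = prewarm_set(log, top_n=2)
print(keys, share)   # ['/index.html', '/app.js'] 0.9
```

A warming job would then request each returned key through the new PoP before it takes production traffic.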

Problem 4: Geo correctness (anycast pathology)

Naive: anycast routes to the nearest PoP. Trust BGP.

How it breaks: "Nearest" is AS-path nearest, not physically nearest. Anycast islands: a misconfigured router in some ISP sends all anycast traffic to the wrong PoP for hours because BGP doesn't notice. ECS fails when the resolver doesn't pass it (older corporate resolvers, some VPNs). Indian ISP traffic routed to Singapore for a morning because their transit has a BGP issue — 250 ms RTT instead of 15 ms; conversion drops.

Real fix:

  • RUM data feeds routing decisions. Per-ASN latency aggregated; CDN adjusts BGP advertisements (AS-path prepending) or DNS responses to push traffic right.
  • Multi-PoP healthchecks + BGP route-reflector tuning to prefer specific peers per ASN.
  • GeoDNS as a fallback steering layer — different anycast IPs for different regions.
  • Honest acknowledgement: at scale you live with ~0.5-2% of users on a sub-optimal PoP. Measure, alert, don't promise zero.

Problem 5: DDoS absorption

Naive: detect attack, rate-limit at firewall, block source IPs.

How it breaks: a 5 Tbps L3/L4 flood (SYN floods, UDP floods) saturates any single origin's wire in seconds, and you can't block what you can't receive. L7 (Layer 7) floods look like real traffic; per-IP rate limits get evaded by million-IP botnets.

Real fix:

  • Anycast itself is the first absorption layer. 5 Tbps from a worldwide botnet, spread across ~300 PoPs, averages ~17 Gbps per PoP: within capacity. Passively diluted.
  • L3/L4 scrubbing at edge. XDP/eBPF programs drop garbage (malformed SYNs, ICMP floods, unrelated UDP) at line rate before any TCP socket exists.
  • Challenge-response for L7. JS challenge ("I'm Under Attack Mode"), TLS fingerprinting, behavioral analysis.
  • WAF rules + ML-based anomaly detection (Cloudflare Bot Management, Akamai Bot Manager Premier).
  • Origin IP secrecy. Real origin IP never published; all traffic ingresses through the CDN.

DDoS economics: the attacker must outbid the defender's wire capacity. A CDN with 280 Tbps of edge capacity forces the attacker to field a 280 Tbps botnet: possible in theory, never seen in practice.

Problem 6: Cache-key sensitivity and cache poisoning

Naive: include all request headers in the cache key to avoid Vary mistakes.

How it breaks: two opposite failures. Key on too little, and poisoning becomes possible: the attacker sends Host: evil.com with URL /app.js; if the cache key omits or mis-normalizes Host while the origin reflects it, the attacker's variant is cached and served to the next legitimate request for /app.js. That is web cache poisoning; PortSwigger has a whole library of variants. Key on too much (arbitrary headers), and the cache fragments and the hit rate collapses. Both failures are independent of content type: any CDN-fronted endpoint is vulnerable.

Real fix:

  • Normalize the cache key before hashing: strip headers not in an allowlist, lowercase host, canonicalize query params.
  • Vary header allowlist — only Vary: Accept-Encoding and known-safe variants. Vary: User-Agent normalized to UA classes.
  • Web cache poisoning audits (e.g., PortSwigger's Param Miner or Burp's cache deception scanner).
  • Never trust user-controlled headers in the cache key unless they are explicitly allowlisted and content-bearing.
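Normalization before hashing can be sketched in a few lines. A minimal sketch under stated assumptions: the dropped tracking parameters, the Accept-Encoding and User-Agent buckets, and the function names are all hypothetical; real CDNs expose these choices as configuration.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption for illustration: tracking-only params that never change the response.
DROPPED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def encoding_class(accept_encoding: str) -> str:
    # Collapse the many Accept-Encoding spellings to the variants we store
    # (q-values are ignored in this sketch).
    tokens = {t.split(";")[0].strip().lower() for t in accept_encoding.split(",")}
    return "br" if "br" in tokens else "gzip" if "gzip" in tokens else "identity"

def ua_class(user_agent: str) -> str:
    # Bucket User-Agent so Vary: User-Agent doesn't create one variant per UA string.
    return "mobile" if "Mobile" in user_agent else "desktop"

def cache_key(url: str, headers: dict[str, str]) -> str:
    parts = urlsplit(url)
    hdrs = {k.lower(): v for k, v in headers.items()}
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                   if k not in DROPPED_PARAMS)
    fields = [
        (parts.hostname or "").lower(),     # lowercased host
        parts.path or "/",
        urlencode(query),                   # canonical parameter order
        "enc=" + encoding_class(hdrs.get("accept-encoding", "")),
        "ua=" + ua_class(hdrs.get("user-agent", "")),
        # Every other header (Cookie, X-Forwarded-Host, ...) is deliberately ignored.
    ]
    return hashlib.sha256("\n".join(fields).encode()).hexdigest()

a = cache_key("https://Example.com/app.js?v=2&utm_source=x",
              {"Accept-Encoding": "gzip, br", "X-Forwarded-Host": "evil.com"})
b = cache_key("https://example.com/app.js?v=2", {"accept-encoding": "br;q=1.0, gzip"})
print(a == b)   # True
```

Ignoring X-Forwarded-Host in the key is only safe if the same header is also stripped before the request is forwarded to origin; otherwise the key hides exactly the input the origin reflects.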

§16. Web cache poisoning attacks in depth

Problem 6 in §15 mentioned web cache poisoning at a high level. The attack class is interesting enough — and dangerous enough — to deserve a deeper treatment. PortSwigger researcher James Kettle popularized the class with a series of papers starting 2018; multiple major CVEs (Common Vulnerabilities and Exposures) have followed.

(a) The general pattern

The attacker crafts a request whose response can be controlled — typically by exploiting a header that the origin reflects into the response but the CDN doesn't include in the cache key. The poisoned response is cached and served to subsequent legitimate users.

Concretely:

1. Attacker observes that `/welcome` reflects an X-Forwarded-Host header
   into the rendered HTML (e.g., the origin generates absolute URLs
   like https://X-Forwarded-Host-value/...).
2. Attacker sends:
     GET /welcome HTTP/1.1
     Host: example.com
     X-Forwarded-Host: evil.com
3. Origin renders the page with evil.com baked into all link URLs and
   asset paths.
4. CDN caches the response under the URL key. X-Forwarded-Host is NOT
   in the cache key (the CDN is unaware it's content-bearing).
5. Next legitimate user requests /welcome.
6. CDN serves the poisoned cached response.
7. User's browser loads scripts from evil.com.
8. Attacker has script execution in legitimate user's context.

The general formula: unkeyed input → keyed output. Any input that affects the response but isn't in the cache key is a poisoning vector.
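The "unkeyed input → keyed output" formula fits in a toy model. A minimal sketch, assuming a fake origin that reflects X-Forwarded-Host into the page and an edge cache keyed on Host plus path; both are hypothetical stand-ins for the steps above, and the `allowlist` argument shows the header-stripping fix.

```python
def origin(path: str, headers: dict[str, str]) -> str:
    # Vulnerable origin: trusts X-Forwarded-Host when building absolute URLs.
    host = headers.get("X-Forwarded-Host", headers["Host"])
    return f'<script src="https://{host}/static/app.js"></script>'

class EdgeCache:
    def __init__(self, allowlist=None):
        self.store: dict[tuple[str, str], str] = {}
        self.allowlist = allowlist   # None = forward every header (the bug)

    def get(self, path: str, headers: dict[str, str]) -> str:
        key = (headers["Host"], path)            # X-Forwarded-Host is NOT keyed
        if key not in self.store:
            fwd = headers if self.allowlist is None else {
                k: v for k, v in headers.items() if k in self.allowlist}
            self.store[key] = origin(path, fwd)
        return self.store[key]

def victim_view(cache: EdgeCache) -> str:
    cache.get("/welcome", {"Host": "example.com", "X-Forwarded-Host": "evil.com"})  # attacker
    return cache.get("/welcome", {"Host": "example.com"})                          # next user

print(victim_view(EdgeCache()))
# <script src="https://evil.com/static/app.js"></script>
print(victim_view(EdgeCache(allowlist={"Host"})))
# <script src="https://example.com/static/app.js"></script>
```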

(b) Common variants

  • X-Forwarded-Host reflected in absolute URLs. Origin uses X-Forwarded-Host to construct canonical URLs in HTML / redirects. Attacker sets it to a hostile domain.
  • X-Forwarded-Scheme reflected in absolute URLs. The attacker flips the scheme from HTTPS to HTTP in the generated links, a downgrade attack.
  • X-Rewrite-URL or X-Original-URL. Some frameworks parse these to determine the "real" URL. Attacker rewrites to a different path; response cached under the original URL but reflecting the rewritten path's content.
  • Referer header reflected in CSRF tokens or canonical URLs. Long shot, but seen.
  • Cookie reflected without being in cache key. If the cookie is included in the body (e.g., to display the username) but not in the cache key, the body of one user's response gets cached for everyone.
  • Cache-key normalization mismatch between layers. If the CDN strips a query param but the origin uses it, the origin response (varying by param) gets cached under a single key (without the param). Different param values get different responses, only one of which is cached for everyone.

(c) The unkeyed X-Forwarded-Host CVE pattern

A standard CDN config has Host in the cache key but not X-Forwarded-Host. When an origin uses X-Forwarded-Host for canonical URL construction, the attacker exploits the mismatch.

The fix: strip headers not in an allowlist before forwarding to origin. If origin shouldn't be making decisions based on X-Forwarded-Host (and shouldn't trust it from the client), don't pass it at all.

This CVE pattern has hit:

  • Multiple production sites' search-result and listing pages.
  • Single-page-app routes that compute API base URLs from request headers.
  • Documentation sites that render absolute links based on the host.

(d) Defenses

  • Header allowlist at CDN. Only pass headers the origin needs; strip everything else. Cache key includes only declared content-bearing headers.
  • Normalize the cache key after stripping. Cache key derives from URL + allowlisted headers, normalized canonically.
  • Origin doesn't trust headers. Origin should never use X-Forwarded-Host or similar attacker-controllable inputs for canonical URL construction. If origin needs to know its public hostname, the CDN should inject it as a server-managed header (set by CDN config, not passed through).
  • Vary correctly when origin response varies. If origin's response really does depend on a header, set Vary: Header-Name and include it in the cache key explicitly.
  • WAF rules for known poisoning patterns. Block X-Forwarded-Host with hostile domains, headers that don't match expected format, etc.
  • Cache poisoning DoS variant. Attacker sends weird User-Agent values to force cache fragmentation and exhaust the cache. Defense: bucket UA before keying; cap distinct variants per URL.
  • Cache deception (the inverse). Attacker tricks CDN into caching a private response (e.g., user's profile page) by requesting /profile/.css — origin returns user's profile HTML but CDN sees .css and caches it as public. Future requests for /profile/.css get the cached private data. Defense: strict URL path validation; never cache a response whose content type doesn't match its URL extension.
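The content-type check from the cache-deception defense is a small predicate evaluated at cache-store time. A minimal sketch, assuming Python's `mimetypes` table as the extension-to-type map; the function names are hypothetical, and CDNs ship this as a configurable feature (Cloudflare's is called Cache Deception Armor) rather than code like this.

```python
import mimetypes

def url_extension(url_path: str) -> str:
    # Last dot in the final path segment, so "/profile/.css" yields ".css".
    name = url_path.rsplit("/", 1)[-1]
    return "." + name.rsplit(".", 1)[1].lower() if "." in name else ""

def safe_to_cache(url_path: str, content_type: str, cache_control: str = "") -> bool:
    """Refuse to cache when the URL's extension promises a different type than the body."""
    directives = {d.split("=")[0].strip().lower() for d in cache_control.split(",") if d.strip()}
    if directives & {"private", "no-store"}:
        return False                  # origin said this response must not be shared
    ext = url_extension(url_path)
    if not ext:
        return True                   # no extension claim to contradict; other rules still apply
    expected, _ = mimetypes.guess_type("file" + ext)
    actual = content_type.split(";")[0].strip().lower()
    return expected is not None and expected == actual

print(safe_to_cache("/profile/.css", "text/html; charset=utf-8"))  # False: HTML posing as CSS
print(safe_to_cache("/static/site.css", "text/css"))               # True
```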

(e) Detection

Synthetic monitoring that checks for poisoning:

  1. Periodically issue a normal request; capture the response.
  2. Issue a request with a hostile header (X-Forwarded-Host: synthetic-test-only.com); capture the response.
  3. Issue the normal request again; compare it to the first.
  4. If the hostile-influenced response is served on the second normal request, a poisoning vector exists. Alert.

Most CDN-aware security tools (PortSwigger's Param Miner, Burp's cache deception scanner, OWASP testing guides) automate this.


§17. Security at edge

Beyond cache poisoning, the edge is the place where most of an internet-facing security stack runs. CDNs have evolved to incorporate DDoS mitigation, WAF, bot management, and TLS / certificate operations as first-class features.

(a) DDoS L3/L4 — volumetric absorption via anycast

§2 and §15 mentioned this: anycast is the first absorption layer. 5 Tbps from a worldwide botnet, spread across 300 PoPs, averages about 17 Gbps per PoP, within any modern PoP's wire capacity. The CDN doesn't have to filter most of the attack; it just has more wire than the attacker.

Beyond passive dilution:

  • XDP/eBPF programs at each PoP drop garbage packets at line rate: malformed SYNs, ICMP floods, unrelated UDP, SYN floods. XDP programs run in the kernel before sockets are created; ~100 Mpps (million packets per second) drop rate per server.
  • SYN cookies to handle SYN floods without allocating state per half-open connection.
  • Per-source-IP rate limits, optionally per-ASN, applied at L4 in kernel.
  • Geographic / ASN-based filtering: if an attack is sourced almost entirely from one ASN, drop traffic from that ASN at edge.
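Per-source rate limiting is usually a token bucket per key. A minimal user-space sketch for illustration only: the text above places these limits in the kernel (e.g., in eBPF maps), and the rate, burst, and key choice here are arbitrary assumptions. Keying by ASN instead of IP is just a different `key` argument.

```python
import time

class TokenBucketLimiter:
    """Allow `rate` requests/s per key with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.buckets: dict[str, tuple[float, float]] = {}   # key -> (tokens, last_ts)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill since last seen
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False

t = [0.0]                                                       # fake clock for the demo
limiter = TokenBucketLimiter(rate=10, burst=5, clock=lambda: t[0])
print(sum(limiter.allow("198.51.100.7") for _ in range(20)))   # 5: burst, then blocked
t[0] += 1.0                                                     # one second later
print(sum(limiter.allow("198.51.100.7") for _ in range(20)))   # 5: refilled up to the burst cap
```

The bucket table grows without bound; a real implementation would cap it (an LRU or a fixed-size map).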

Cloudflare publicly disclosed mitigating a 71M rps HTTP DDoS in February 2023. That attack was an L7 request flood rather than a volumetric L3/L4 one; it was mitigated automatically at the edge without disrupting customer traffic.

(b) DDoS L7 — application-layer floods

Layer-7 attacks look like real HTTP requests. Per-IP rate limits get evaded by botnets with millions of IPs. The defense stack:

  • WAF rules for known attack signatures (OWASP top 10 — SQL injection, XSS, etc.).
  • Rate limiting by path, by source, by fingerprint, by combinations.
  • Challenge pages: JS challenge ("solve this challenge to prove you're a browser"), CAPTCHA, Cloudflare Turnstile, hCaptcha. Reduce bot traffic by ~99% with minimal user friction.
  • Behavior analysis: track request patterns per source — sites visited, time-of-day, header distribution. Anomalies (one IP requesting only /login 1000 times) trigger blocks.
  • ML-based detection: Cloudflare Bot Management, Akamai Bot Manager Premier, Fastly Next-Gen WAF. Trained on the CDN's global traffic; identifies emerging bot patterns.
  • "I'm Under Attack" mode: aggressive challenge for everyone. Sacrifices some user experience for total bot rejection during active attacks.

(c) Bot management

Bot management has become a substantial category. Major techniques:

  • TLS fingerprinting (JA3 / JA4 / JA4+): hash of the TLS ClientHello (cipher suites, extensions, ALPN values). Different clients (Chrome, Firefox, curl, Python requests, Go net/http) produce different fingerprints. JA3 was the original; JA4 (Foxio's update from 2023) is more robust against minor library changes. Used to identify automation tools posing as browsers.
  • HTTP/2 fingerprinting: similar to TLS but on the HTTP/2 SETTINGS frame and pseudo-header order. Reveals impersonation.
  • Behavior analysis: mouse movement, scroll patterns, click timing. Bots scroll mechanically; humans don't. Cloudflare's Bot Management uses on-page beacons.
  • Reputation databases: known-bot IP lists (datacenter ranges, Tor exit nodes, residential proxy services).
  • Challenge cascades: simple challenges (JS computation) for likely-legit clients; harder challenges (CAPTCHA) for suspicious.
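JA3 itself is plain string-building over fields parsed from the ClientHello. A minimal sketch, assuming the ClientHello has already been parsed into integers (no TLS parsing here); it follows the published JA3 recipe of decimal values joined by '-', fields joined by ',', GREASE values (RFC 8701) removed, then MD5. JA4 uses a different format and is not shown.

```python
import hashlib

def is_grease(v: int) -> bool:
    # GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa.
    return (v & 0x0F0F) == 0x0A0A and (v >> 8) == (v & 0xFF)

def ja3(version: int, ciphers, extensions, curves, point_formats) -> tuple[str, str]:
    """Return (ja3_string, ja3_md5) for an already-parsed ClientHello."""
    def field(values):
        return "-".join(str(v) for v in values if not is_grease(v))
    s = ",".join([str(version), field(ciphers), field(extensions),
                  field(curves), field(point_formats)])
    # MD5 here is a fingerprint, not a security primitive.
    return s, hashlib.md5(s.encode(), usedforsecurity=False).hexdigest()

s, digest = ja3(771, [0x0A0A, 4865, 4866, 49195], [0x1A1A, 0, 23, 65281], [0x2A2A, 29, 23], [0])
print(s)   # 771,4865-4866-49195,0-23-65281,29-23,0
```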

The arms race: bot writers use real browsers (via Playwright, Puppeteer), residential proxies (rotating real consumer IPs), and human solvers for CAPTCHAs. Detection moves up the behavior stack.

(d) WAF (Web Application Firewall) rules

A WAF inspects HTTP requests and blocks ones that match attack signatures. OWASP top 10 mitigations:

  • A01: Broken Access Control — WAF can enforce URL access rules.
  • A02: Cryptographic Failures — TLS enforcement at edge.
  • A03: Injection — SQL injection patterns (UNION SELECT, ' OR '1'='1), command injection, OS injection. WAF rules match patterns in path, query, body, headers.
  • A04: Insecure Design — WAF can enforce specific schemas (e.g., parameter must match expected regex).
  • A05: Security Misconfiguration — out of WAF scope but related (e.g., strip Server header).
  • A06: Vulnerable Components — virtual patching: WAF rule blocks specific exploit signatures for known CVEs before origin is patched.
  • A07: Auth Failures — rate limit on login endpoints.
  • A08: Software and Data Integrity — content-type and signature checks.
  • A09: Logging and Monitoring — WAF logs feed SIEM (Security Information and Event Management).
  • A10: SSRF (Server-Side Request Forgery) — block requests that contain internal IP ranges in path/body.

Cloudflare's "Managed Rules" (formerly OWASP Core Rule Set integration) and Fastly Next-Gen WAF (formerly Signal Sciences) are the dominant managed WAF offerings. Each ships rule packs updated continuously.

WAF rules have false positives. Tuning involves: paranoia levels (low / medium / high false-positive rate), per-route exemptions, learning mode (log but don't block initially), gradual enforcement rollout.

(e) TLS termination and certificate management

The CDN terminates TLS at the edge for almost all customer hostnames. This means:

  • The CDN holds the private key for the customer's domain. Trust matters; CDNs have aggressive operational security around key storage (HSM — Hardware Security Module, key sealing, audit logs).
  • Certificate provisioning is automated. Customer points the domain at CDN nameservers; CDN provisions a Let's Encrypt or DigiCert certificate; ACME (Automated Certificate Management Environment, RFC 8555) handles validation and rotation.
  • SNI (Server Name Indication) lets the CDN host thousands of customer hostnames on a single anycast IP. The TLS handshake's SNI extension tells the CDN which cert to present.
  • TLS 1.3 is now the default; TLS 1.2 is supported for compatibility; TLS 1.0/1.1 are deprecated.
  • OCSP stapling (Online Certificate Status Protocol) is on by default — CDN fetches OCSP responses from the CA and serves them stapled to the TLS handshake. Saves the client an OCSP round trip.
  • Certificate Transparency logging is mandatory; all issued certs are public.

For customers who don't want to give the CDN their private key, options exist:

  • Cloudflare Keyless SSL: the CDN does the bulk of the TLS handshake, but private-key operations happen at the customer's own key server. Adds ~30-50 ms to the handshake, but the CDN never sees the key.
  • Encrypted Client Hello (ECH): an emerging standard that encrypts the SNI itself, hiding which hostname the client is requesting.

Cert rotation: typically every 90 days for Let's Encrypt, automated. CDN handles renewal transparently.


§18. Failure mode walkthrough

Single cache box crashes mid-fetch. Box B17 kernel-panics mid-fetch. Other boxes are unaffected; consistent hashing remaps B17's vnodes, so each neighbor sees ~1/(N-1) extra load. Health checks drop B17 from the L4 LB ring in under a second. The in-flight origin request is orphaned; the client times out and retries; the retry hits a different box, which misses and re-fetches. B17 reboots, rebuilds its index from on-disk segments (~1 s per TB), and rejoins. Durability point: on-disk segments survive the crash; the in-RAM index is reconstructible from segment metadata.

Shield crashes during a popular asset's cold fetch. Shield runs as a small HA (High Availability) cluster (3-5 nodes per region). L2 detects connection failure, retries to a sibling shield node, which re-fetches from origin. Another request-collapse cycle dedupes within the new node. Gap ~2 s; some L2 requests see elevated latency or 503 if retries fail. Durability point: origin is the durability layer; shield is a cache.

Control-plane death (purge propagation halts). Quicksilver-equivalent loses leadership during a major purge campaign. Purges queue locally at the previous leader; Raft elects a new leader; outstanding purges replay from the durable log. Gap ~5-30 s; some PoPs serve stale longer than the SLO. The data plane is unaware of the control-plane outage, so the request-path serving SLO is unaffected. This separation is load-bearing. Durability point: Raft-replicated purge log.

Network partition splits a region. Submarine cable cut isolates a continent from the control plane and origin. PoPs continue serving cached content. Misses to origin fail open (stale-if-error) or fail closed (503). Purges queue locally and deliver when connectivity restores. Bounded staleness SLO violated during partition. Durability point: TTLs — content eventually expires and re-fetches once partition heals.

Permanent PoP loss. Earthquake destroys a PoP. BGP withdraws routes; traffic flows to the next-best PoP; affected users see ~20-50 ms more latency but service continues. New PoP provisioned over months, brought online with the staged rollout from Problem 3. Durability point: PoPs are stateless w.r.t. origin truth; cache state is reconstructable.

BGP misroute (the 2008 YouTube-Pakistan incident type). A misconfigured ISP announces a BGP route claiming the CDN's anycast IPs; the bad route propagates; traffic blackholes or routes to the misadvertiser. RPKI (Resource Public Key Infrastructure) origin validation rejects unauthorized announcements at peers that enforce it (increasingly deployed). The NOC (Network Operations Center) calls the misadvertiser's upstream and demands withdrawal: painful, manual, hours. Durability point: the cache architecture provides no help here; mitigation is at the routing layer.
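A sketch of the origin-validation check an RPKI-enforcing peer runs, following the Valid / Invalid / NotFound outcomes of RFC 6811. The prefix `192.0.2.0/24` and ASNs 64500 / 64511 are documentation values standing in for a hypothetical CDN and a misadvertiser. A more-specific announcement from the wrong AS (the shape of the 2008 incident) comes out Invalid on both counts. Origin validation does not stop an attacker who forges the legitimate origin ASN in the AS path; that requires path validation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class ROA:
    prefix: Network   # address space the holder authorized
    max_length: int   # longest prefix length that may be announced
    origin_asn: int   # AS allowed to originate it (AS 0 = "never route")


def validate(route: str, origin_asn: int, roas: Iterable[ROA]) -> str:
    net = ipaddress.ip_network(route)
    covering = [r for r in roas
                if r.prefix.version == net.version and net.subnet_of(r.prefix)]
    if not covering:
        return "NotFound"  # no ROA covers this space; most routers still accept it
    for r in covering:
        if r.origin_asn != 0 and r.origin_asn == origin_asn \
                and net.prefixlen <= r.max_length:
            return "Valid"
    return "Invalid"       # covered, but wrong origin AS or too specific: drop


# Hypothetical CDN anycast prefix, authorized only as a /24 from AS64500.
roas = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
```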

Partial cache corruption. Bad SSD firmware causes silent corruption. Checksums on every cache object: verify before returning; on mismatch treat as miss, re-fetch, evict the offending segment. Per-box monitoring of mismatch rate triggers depooling. Cache is self-healing. Durability point: origin is the source of truth.
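A minimal sketch of verify-on-read, assuming a SHA-256 digest stored per object and a plain dict standing in for on-disk segments (a real engine would more likely use a cheaper checksum such as CRC32C and evict the whole segment). `VerifyingCache` is a hypothetical name. A mismatch is counted and treated as a miss, so a corrupted object is replaced from origin instead of being served; the counter is what per-box depooling would watch.

```python
import hashlib
from typing import Callable, Dict, Tuple


class VerifyingCache:
    def __init__(self, fetch_origin: Callable[[str], bytes]):
        self._fetch_origin = fetch_origin
        self._store: Dict[str, Tuple[bytes, bytes]] = {}  # key -> (digest, body)
        self.mismatches = 0  # per-box corruption counter for monitoring

    def put(self, key: str, body: bytes) -> None:
        self._store[key] = (hashlib.sha256(body).digest(), body)

    def get(self, key: str) -> bytes:
        entry = self._store.get(key)
        if entry is not None:
            digest, body = entry
            if hashlib.sha256(body).digest() == digest:
                return body  # verified hit
            # Silent corruption: never serve it; evict and fall through to a miss.
            self.mismatches += 1
            del self._store[key]
        body = self._fetch_origin(key)  # origin is the source of truth
        self.put(key, body)
        return body
```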


§19. Failure modes not covered by the baseline

The walkthrough in §18 covers the canonical failures. Some that don't fit there but are common enough to know:

(a) Origin DDoS via cache stampede on a deploy

Pattern: site deploys a new version. The deploy purges all cached pages. Within seconds, the entire global PoP fleet has no cache for the homepage and other popular pages. Traffic continues at normal levels — but all of it is misses. Origin sees ~10,000x normal request rate.

Real example: a popular news site deploys at 9 AM EST during morning traffic. The deploy purged the homepage and top-50 article pages by tag. Within 5 seconds, 1M requests hit origin (versus ~100 in a normal 5-second window). The origin database fell over; the site was down for 20 minutes.

Mitigations:
  • Stagger purges. Don't purge everything at once. Group purges into batches spread over 60+ seconds so cache fills happen gradually.
  • Use versioned URLs, not purges. New deploys produce new asset URLs; old cached responses are still served until the HTML referencing them updates.
  • Cache pre-warming. Before purging, fetch the new responses to populate the cache; then purge the old ones (or simply let their TTL run out). Origin sees the warming traffic from one source, not a global stampede.
  • stale-while-revalidate. After a purge, serve stale and refresh in the background. Origin sees coalesced revalidations, not raw misses.
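A minimal sketch of how an edge might combine the RFC 5861 `stale-while-revalidate` and `stale-if-error` directives, assuming the purge marks objects stale (a soft purge) rather than deleting them, since a hard-deleted object has nothing stale to serve. The header parser is deliberately simplified and `serve_decision` is a hypothetical name. The same logic is what lets a partitioned PoP fail open instead of returning 503s.

```python
def parse_cache_control(header: str) -> dict:
    """Parse 'max-age=60, stale-if-error=600' into {'max-age': 60, ...}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = int(value) if value.isdigit() else True
    return directives


def serve_decision(header: str, age: float, origin_ok: bool) -> str:
    d = parse_cache_control(header)
    max_age = d.get("s-maxage", d.get("max-age", 0))  # s-maxage wins in shared caches
    if age < max_age:
        return "HIT"                   # still fresh
    stale_for = age - max_age
    if stale_for <= d.get("stale-while-revalidate", 0):
        return "STALE, REVALIDATE"     # serve now; one background refresh upstream
    if not origin_ok and stale_for <= d.get("stale-if-error", 0):
        return "STALE, ORIGIN DOWN"    # fail open instead of a 503
    return "MISS"                      # must go upstream (and may 503)
```

With `Cache-Control: max-age=60, stale-while-revalidate=30, stale-if-error=86400`, a 75 s-old object is served stale while one background refresh runs, and a 200 s-old object is served only if origin is failing.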

(b) CDN provider outage

CDN providers are themselves single points of failure. Notable incidents:
  • Cloudflare, June 21, 2022: a bad BGP configuration change took down 19 PoPs for ~30 minutes. Discord, Shopify, and many others were affected.
  • Cloudflare, July 2, 2019: a bad WAF rule (a regex with catastrophic backtracking) consumed 100% CPU on every Cloudflare server worldwide for ~30 minutes; sites behind Cloudflare returned 502 errors.
  • Fastly, June 8, 2021: a customer's valid configuration (VCL) push triggered a latent service bug and took down the data plane globally for ~49 minutes. BBC, Reddit, NYT, Amazon, and gov.uk all went dark.
  • AWS CloudFront: occasional regional outages; less impactful per incident, but more frequent than Cloudflare's and Fastly's global ones.
  • Akamai, July 22, 2021: a DNS service outage took down many enterprise customers for ~1 hour.

Mitigation: multi-CDN (covered in §20). Cost: complexity and 2x egress contracts. Benefit: a 49-minute outage doesn't take you down.

(c) DNS resolver issues

CDNs depend heavily on DNS:
  • Authoritative DNS at the CDN: when Cloudflare's authoritative DNS fails, DNS resolution fails for every Cloudflare-fronted site. Cloudflare had this in October 2019 (Quad9 cascading issue).
  • Resolver cache poisoning at ISPs: a misbehaving ISP resolver caches a wrong answer, and users get sent to the wrong PoP (or to an address that isn't a PoP at all) for hours until the cached record expires.
  • DNS over HTTPS / DNS over TLS (DoH / DoT) changing visibility: when clients use DoH, the CDN's ECS visibility drops and routing becomes less accurate.
  • Long DNS TTLs become a liability during outages: a 24-hour TTL on a CNAME means failover can take up to 24 hours to propagate. CDN customer DNS records typically have very short TTLs (~60 s).

Mitigation: short TTLs on customer DNS records; monitor authoritative DNS uptime as a separate SLO from CDN data plane; have a DNS failover plan (Route 53 health checks, NS1 traffic management).

(d) IPv6 edge cases

IPv6 deployment is complete enough to matter (~40% of US traffic, ~50% of Indian traffic, varies by region) but not uniform:
  • Some PoPs have IPv6 transit issues unrelated to IPv4.
  • IPv6-only customer origins can be unreachable from older IPv4-only PoPs.
  • ECS support over IPv6 is sometimes worse than over IPv4 (some resolvers strip ECS on AAAA queries).
  • Anycast pathology can differ between v4 and v6: the same client may take a different path on v6 than on v4.

Mitigations: dual-stack everywhere; per-AFI (Address Family Identifier) monitoring; some CDN customers explicitly disable IPv6 if reliability matters more than the modest latency gain.

(e) Long-tail content overwhelming a small PoP

A new PoP comes online; immediately a content scraper / archive bot decides to walk the entire site, hitting cold long-tail URLs. The PoP's small cache fills with one-hit-wonder URLs, evicting the popular set, hit rate plummets to 30%, origin load spikes.

Mitigations:
  • W-TinyLFU admission filter (§4(c)): exactly the pattern it defends against.
  • Rate limit per source IP / ASN at the edge, before cache lookup.
  • Identify and bypass known crawlers if they don't represent normal user traffic.
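A simplified sketch of the TinyLFU admission test, assuming a 4-row count-min sketch and periodic halving for aging; the full W-TinyLFU design also puts a small LRU window in front and a doorkeeper Bloom filter before the sketch, both omitted here. Class names and sizes are illustrative. The point is the `admit` rule: a one-hit-wonder crawled URL cannot displace an object that real users keep requesting.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory; estimates never undercount."""

    def __init__(self, width=4096, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        digest = hashlib.blake2b(key.encode(), digest_size=8 * self.depth).digest()
        for d in range(self.depth):
            yield d, int.from_bytes(digest[8 * d:8 * d + 8], "big") % self.width

    def add(self, key):
        for d, i in self._cells(key):
            self.rows[d][i] += 1

    def estimate(self, key):
        return min(self.rows[d][i] for d, i in self._cells(key))

    def halve(self):
        for row in self.rows:
            for i in range(self.width):
                row[i] >>= 1


class TinyLFUAdmission:
    def __init__(self, sample_size=100_000):
        self.sketch = CountMinSketch()
        self.sample_size = sample_size
        self.seen = 0

    def record(self, key):
        # Called on every request, hit or miss.
        self.sketch.add(key)
        self.seen += 1
        if self.seen >= self.sample_size:
            self.sketch.halve()  # aging: old popularity fades
            self.seen //= 2

    def admit(self, candidate, victim):
        # Admit only if the newcomer looks more popular than the object the
        # eviction policy would throw out to make room for it.
        return self.sketch.estimate(candidate) > self.sketch.estimate(victim)
```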

(f) Cache hierarchy inversion

Sometimes L2 / shield becomes the bottleneck because all L1 misses converge there. If L1 hit rate drops (cache key fragmentation, deploy purge), L2 sees a sudden burst it can't handle. L2 starts dropping requests; L1 starts proxying directly to shield; shield falls over; origin starts seeing all the load.

Mitigations: per-tier capacity provisioning; circuit breakers between tiers; observability on per-tier hit rates so degradation is visible early.
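A minimal sketch of a per-tier circuit breaker, assuming consecutive-failure counting and a single half-open probe; `CircuitBreaker`, the threshold, and the cooldown are illustrative. While the breaker is open, L1 stops forwarding misses to a struggling L2 and instead serves stale content or a fast error, which keeps the overload from cascading down to shield and origin.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, threshold: int = 5, cooldown_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.probing = False

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                    # closed: forward to the next tier
        if not self.probing and self.clock() - self.opened_at >= self.cooldown_s:
            self.probing = True            # half-open: let exactly one probe through
            return True
        return False                       # open: serve stale or fail fast locally

    def record(self, ok: bool) -> None:
        self.probing = False
        if ok:
            self.failures = 0
            self.opened_at = None          # success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open and restart the cooldown
```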

(g) Slow-loris and slow-read attacks

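Slow-loris: the attacker opens many connections and sends request bytes very slowly to keep each one alive, exhausting the CDN's connection pool. Slow-read is the mirror image: the attacker reads responses very slowly, pinning connections and send buffers. Modern CDN edges defend with per-source connection rate limits, header-read and idle-connection timeouts, and HTTP/2 / HTTP/3 (multiplexed streams are less vulnerable than dedicated TCP connections).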
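A minimal per-source token bucket for the connection rate limits mentioned above, assuming one bucket per source IP (a real edge might key by /24, /64, or ASN instead); the class name and the rate and burst values are illustrative. Each new connection spends a token; tokens refill at `rate` per second up to `burst`.

```python
import time
from typing import Callable, Dict, Tuple


class PerSourceTokenBucket:
    """Allow `rate` new connections/s per source, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        # source -> (tokens left, time of last update). Unbounded here; a real
        # edge would cap this table (e.g. LRU) so attackers can't exhaust memory.
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, source: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(source, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1.0:
            self._buckets[source] = (tokens - 1.0, now)
            return True
        self._buckets[source] = (tokens, now)
        return False
```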


§20. Multi-CDN strategies

Single-CDN outages happen (§19). For a high-availability site, the answer is multi-CDN: contracts with two or more CDN providers, traffic shifted between them via DNS or some routing layer.

(a) When multi-CDN is worth the complexity

Cost: two egress contracts (total spend typically lands between 1x and 2x single-vendor spend: each byte is paid for once, but split volume earns weaker tier discounts), engineering complexity, surrogate-key purging on multiple providers, configuration drift risk.

Benefit: provider-outage resilience, ability to negotiate (your CDN knows you can shift to competitor), regional performance optimization (some CDNs are better in some regions), regulatory (China requires a domestic CDN; sometimes Russia, Iran, etc.).

Worth it when:
  • Site availability SLO is high (4 nines or better) and you've measured that CDN outages can violate it.
  • Egress is large enough that multi-vendor pricing drives down per-GB rates more than the complexity costs.
  • You have a global footprint with regional performance variance.
  • You need vendor independence (regulatory, business risk).

Not worth it when:
  • Site uptime requirements are relaxed (small SaaS, internal apps).
  • Traffic is small (single-vendor minimums make multi-vendor uneconomic).
  • You don't have the engineering capacity to maintain two configurations.

(b) DNS-based steering — NS1, Cedexis, AWS Route 53

The classic multi-CDN front-end: a "traffic management" DNS service decides per query which CDN to route the user to. The user resolves cdn.example.com; the traffic manager returns the CDN-A or CDN-B CNAME based on:

  • Per-CDN health (active healthchecks; route away from unhealthy).
  • Per-region latency (RUM data; route to the CDN with better p50 latency in this region).
  • Cost weighting (route 80% to cheaper CDN; 20% to expensive for warm-keeping).
  • Per-customer-tier (paying customers get the better CDN).
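A minimal sketch of the per-query decision, assuming the health map and weights have already been computed from the health-check, RUM-latency, and cost inputs listed above; `answer_cname` and the `*.example.net` CNAME targets are hypothetical. Unhealthy CDNs are dropped, and the answer is a weighted random pick among the rest.

```python
import random
from typing import Dict


def answer_cname(weights: Dict[str, float], healthy: Dict[str, bool],
                 rng: random.Random) -> str:
    """Choose the CNAME target to return for one DNS query."""
    candidates = {c: w for c, w in weights.items() if w > 0 and healthy.get(c, False)}
    if not candidates:
        # Every check is failing: fail open to the configured mix rather than
        # answering nothing (a monitoring outage looks exactly like this).
        candidates = {c: w for c, w in weights.items() if w > 0}
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Hypothetical targets; 80/20 matches the cost-weighting example above.
weights = {"cdn-a.example.net": 80, "cdn-b.example.net": 20}
```

Weighted random choice per query keeps the aggregate split close to 80/20 across many resolvers, even though each resolver then caches its single answer for the TTL.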

Vendors:
  • NS1 (now part of IBM): rich traffic management; per-query decisions.
  • Cedexis (Citrix Intelligent Traffic Management): RUM-driven; large dataset.
  • AWS Route 53: simple weighted / latency-based / health-check routing.
  • Cloudflare Load Balancing: their own multi-origin LB; can balance across non-Cloudflare backends.

DNS-based steering has limits: DNS TTL is the bound on failover speed. A 60 s TTL means up to 60 s of users sent to a dead CDN after the manager detects failure. Some resolvers cache longer than TTL (especially older corporate resolvers).

(c) HTTP-level steering — client-side fallback

A second pattern: the user always hits CDN-A first (DNS points there). If CDN-A is down, client-side logic in the page or app retries the request against CDN-B. Implementations:

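  • HTML-level fallback: an onerror handler on <img> or <script> swaps the URL to CDN-B when the CDN-A load fails. Note that <picture> and <img srcset> pick one candidate by viewport, pixel density, or format; they do not retry the other candidates on failure. Works for subresources but not for the HTML page itself.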
  • Service worker intercepts fetch failures and retries on alternate domain. Works for everything but requires the service worker to be already installed (cold visits don't have it).
  • Client-side smart routing libraries that probe and choose. Used by some video players.

These work for parts of a site but not the whole thing (the initial HTML must come from somewhere).

(d) "If CDN A fails, route to CDN B" auto-failover

The most common production pattern combines DNS + healthchecks:

1. Traffic manager (NS1, Cedexis) healthchecks CDN-A and CDN-B
   from multiple vantage points worldwide.
2. Normal: 80% to CDN-A, 20% to CDN-B (or 100% to CDN-A, 0% to B
   with B kept warm by synthetic load).
3. CDN-A fails healthcheck globally.
4. Traffic manager flips: 100% to CDN-B.
5. DNS TTL controls how fast clients see the change (60 s typical).
6. CDN-B serves; possibly degraded (its cache wasn't warm for all
   of CDN-A's URLs, so origin sees increased load).
7. CDN-A recovers; traffic manager flips back gradually.
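Steps 3 and 7 can be sketched as two small functions, assuming a simple majority quorum across probe locations and a linear 30-minute ramp back to an 80/20 split; `globally_down`, `ramp_back`, and every parameter value are illustrative, not NS1 or Cedexis behavior.

```python
from typing import Dict, List


def globally_down(probe_ok: List[bool], quorum: float = 0.5) -> bool:
    # Step 3: declare CDN-A down only when most vantage points agree, so one
    # flaky probe location cannot trigger a global flip.
    failures = sum(1 for ok in probe_ok if not ok)
    return failures / len(probe_ok) > quorum


def ramp_back(minutes_since_recovery: float, normal_a: float = 80.0,
              ramp_minutes: float = 30.0) -> Dict[str, float]:
    # Step 7: return CDN-A's share to normal gradually, so its partly cold
    # caches refill without sending a miss stampede to origin.
    frac = max(0.0, min(1.0, minutes_since_recovery / ramp_minutes))
    share_a = normal_a * frac
    return {"cdn-a": share_a, "cdn-b": 100.0 - share_a}
```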

(e) Operating concerns

  • Configuration drift. Two CDNs need the same WAF rules, cache settings, surrogate keys, redirects. Use Infrastructure-as-Code; deploy to both from a single source.
  • Purge fanout. Surrogate-key purges must hit both CDNs. Custom tooling typically.
  • Observability. Need a unified view of traffic across both CDNs; per-CDN dashboards alone leave gaps.
  • Cost reporting. Two contracts; reconcile usage across both monthly.

(f) Vendor lock-in mitigation

Even without active multi-CDN, having a tested fallback CDN gives:
  • Leverage in contract negotiation.
  • A validated alternative if the primary CDN raises prices unsustainably.
  • A continuity plan if the primary CDN goes out of business (rare but real; smaller CDNs have).

Many enterprises run "warm standby": all configuration on CDN-B; ~1% of traffic going there to keep it warm; ready to flip to 100% in minutes.


§21. Why not just nginx in front of origin

The most common pushback. The answer is layered.

Step 1: "Just put nginx in front of origin, cache there."

Why it doesn't work:

  1. No anycast, no global presence. A single nginx in us-east-1 is still 150 ms from Tokyo. Caching without proximity solves only half the problem.
  2. Scaling by adding regional nginx: how does a user reach the right one? DNS-based steering requires GeoIP infrastructure (you've started building a CDN); anycast IPs require BGP advertisement (definitely building a CDN).
  3. DDoS absorption impossible. A single nginx absorbs maybe 50-100 Gbps; modern attacks exceed 1 Tbps.
  4. Per-region cache state independent. Five nginx instances warm independently; origin sees 5x cold-cache fetches; without a mid-tier shield every miss hits origin — thundering herd unbounded.
  5. No edge TLS termination optimization. Sydney users eat ~300 ms just for TLS setup.
  6. No global traffic engineering. Can't shift load between regions; CDN providers have global SDN (Software-Defined Networking) layers for that.
  7. Operational primitives don't exist. Centralized purge, surrogate-key tagging, real-time analytics across PoPs, RUM-driven routing — you'd have to build them all.

Step 2: a concrete failure. A 100M-user site runs multi-region nginx; a celebrity tweets a link at 8 PM UTC. 5M concurrent users land via DNS-based steering across 5 nginx instances. The homepage is cached (100% hit), but the link carries unique tracking params, so every click is a distinct cache key → 5M cache misses → 5M origin fetches. Origin is 100 servers with ~50,000 rps of total capacity: dead in 5 seconds. A competitor that bought a CDN has 300 PoPs absorbing the same traffic: ~17k rps per PoP, ~340 rps per box, and origin sees ~10 rps after shield coalescing. Their site stays up.

You haven't built a CDN. You've built a fragile single-region cache that breaks the moment something goes viral. Nginx is a component in a CDN, not a substitute — the architecture (PoPs + anycast + tiered cache + shield + control plane) is what nginx fronts at each layer.
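The miss storm in the example above comes from every tracking-parameter variant becoming its own cache key. A minimal sketch of cache-key normalization, assuming the origin ignores the listed parameters and does not care about query-parameter order (both must be verified per site); the parameter list and `cache_key` are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative list; each site must confirm its origin ignores these.
TRACKING_PARAMS = {"fbclid", "gclid", "mc_cid", "mc_eid"}
TRACKING_PREFIXES = ("utm_",)


def cache_key(url: str) -> str:
    parts = urlsplit(url)
    kept = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # ?a=1&b=2 and ?b=2&a=1 share one cache entry
    # Lowercase scheme and host; drop the fragment (never sent to the server).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(kept), ""))
```

With this key, every tracking-tagged variant of the viral link maps to one cache entry instead of one entry per click.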


§22. Scaling axes

Type 1 — uniform expansion (more users, same per-user behavior)

1x   ->  10 PoPs, 1 region, 1M rps
2x   ->  20 PoPs, 2 regions, 2M rps          ← add capacity uniformly
10x  ->  50 PoPs, 5 regions, 10M rps         ← add origin shield per region
100x ->  300 PoPs, all continents, 100M rps  ← RUM-driven routing, gossip-based
                                              purge, multi-tier cache hierarchy

Inflection points:

  • ~5M rps: per-region origin starts cracking. Add origin shield.
  • ~20M rps: per-box cache starts churning. Add mid-tier cache between PoP and shield.
  • ~50M rps: BGP convergence and anycast pathology dominate tail latency. Move from pure anycast to anycast + RUM-driven DNS steering hybrid.
  • ~100M rps: control plane purge fanout is the bottleneck. Hierarchical pub-sub, gossip-based propagation, surrogate-key tagging.

Type 2 — hotspot intensification (one URL going viral)

Baseline:   homepage.html at 100 rps globally
Viral:      same URL at 10M rps globally for 1 hour, then back to normal

Fundamentally different problem. More PoPs don't help if everyone is hitting one URL.

Per-PoP rps for the viral URL:
  10M rps / 300 PoPs = ~33k rps per PoP

Within a PoP:
  hash(URL) -> one box. That ONE BOX takes 33k rps.
  Normal box ceiling: ~50-100k rps. → fine, barely.

But: if cache misses (TTL expires, cold), every PoP's hot box
     simultaneously asks the shield. Shield sees 300 simultaneous
     requests, collapses to a single origin fetch.
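The shield-side collapse described above is usually called request collapsing (or coalescing). A minimal thread-based sketch, assuming an in-process fetch function; `RequestCollapser` is a hypothetical name. The first miss for a key becomes the leader and fetches upstream; concurrent misses for the same key wait for the leader's result instead of issuing their own fetch.

```python
import threading
from typing import Callable, Dict, Optional


class _InFlight:
    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class RequestCollapser:
    """Concurrent misses for one key share a single upstream fetch."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight: Dict[str, _InFlight] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _InFlight()
                self._inflight[key] = call
        if is_leader:
            try:
                call.body = self._fetch(key)   # the only upstream request
            except Exception as exc:
                call.error = exc               # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]    # the next miss starts a fresh fetch
                call.done.set()
        else:
            call.done.wait()                   # followers block on the leader
        if call.error is not None:
            raise call.error
        return call.body
```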

Inflection points for Type 2:

  • Sharding the hot URL across multiple boxes in a PoP. Replicated keying: hash(URL || r), with r drawn uniformly from [0, K), maps the URL to up to K boxes. Each request draws a fresh r, spreading the hot URL across K replicas at the cost of storing it K times. K=10 turns 33k rps/box into ~3.3k rps/box.
  • Mid-tier coalescing for very hot keys. Push coalescing all the way down to per-box above a threshold.
  • CPU saturation, not network, at extreme rps on tiny objects. Fix: TLS offload to hardware NICs (Mellanox ConnectX or equivalent).
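A sketch of the replicated-keying idea from the first bullet, assuming the set of hot URLs is already known (for example from a request-rate sketch) and using plain modulo hashing over a fixed box list for brevity, where a real PoP would use its consistent-hash ring; `pick_box` is hypothetical and K=10 matches the example above.

```python
import hashlib
import random
from typing import List, Optional, Set


def _hash(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def pick_box(url: str, boxes: List[str], hot_urls: Set[str], k: int = 10,
             rng: Optional[random.Random] = None) -> str:
    rng = rng or random
    if url in hot_urls:
        r = rng.randrange(k)                      # fresh replica index per request
        return boxes[_hash(f"{url}||{r}") % len(boxes)]
    return boxes[_hash(url) % len(boxes)]         # normal case: one owner box
```

Cold URLs keep exactly one owner box; the hot URL lands on at most K boxes (fewer if two replica indexes hash to the same box).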

Type 1 and Type 2 have completely different fixes. Type 1 scales topologically (more boxes). Type 2 scales by replicating the hot key locally. Saying "add more PoPs" to Type 2 misses the problem: it only divides the hot URL's rate by the PoP count, every PoP still funnels its share onto one box, and each added PoP is one more cold cache that misses to the shield whenever the object expires.


§23. Decision matrix — CDN vs adjacent technologies

The CDN isn't always the right answer. Depends on workload shape.

Dimension | CDN / edge cache | Object storage (public read) | In-app cache (Redis) | API gateway (Kong)
--- | --- | --- | --- | ---
Geographic distribution | Global (300+ PoPs) | Per-region buckets | Per-region cluster | Per-region
Cache invalidation | <150 ms global | None (direct read) | Application-controlled | Application-controlled
Hit-rate optimization | Tuned engine + admission | N/A | Application-tuned | Limited (per-route TTL)
DDoS absorption | Built-in, planetary | Limited | None | Per-instance, limited
Programmability at edge | Workers / Functions | None | None | Plugin model
Read latency | 5-30 ms anywhere | 50-100 ms (regional) | 0.1-1 ms in-region | 5-20 ms regional
Origin offload ratio | 95-99.9% | 0% (direct read) | 90%+ | 50-90%
Cost model | $ per GB or contract | $ per GB stored + egress | Fixed infra | $ per request

Decision thresholds:

  • Use a CDN when: public-internet traffic with global users; cacheable content (static, GET APIs, video, downloads); need DDoS protection without engineering it; origin egress > a few TB/month; sustained edge RPS exceeds what one region's origin can serve.
  • Use object storage with public read when: single-region users (B2B SaaS in one country); <100 GB/month total egress; no DDoS profile; direct S3/GCS public buckets are simpler and cheaper.
  • Use in-app cache when: private per-user session data; sub-ms latency; cache key is internal (not URL-derived); data isn't HTTP-shaped.
  • Use an API gateway when: routing, auth, rate-limiting are primary; caching is a side benefit; per-route TTL granularity matters more than global distribution.

A mature stack has all four: CDN for public traffic + object storage as canonical store + in-app cache for hot internal state + API gateway for auth/routing. They compose; they don't substitute.


Static asset hosting — the classic CDN workload. Images, JS, CSS, fonts, HTML. Long TTL (days to forever, with content-hash URLs). Origin typically S3. Hit rate ~99.9% in steady state. Every major site does this — LinkedIn, Pinterest, Etsy, Shopify storefronts. Cloudflare for indie web, CloudFront for AWS-native, Akamai for enterprise tail.

API response caching — Stripe and GitHub are the canonical examples. Public-ish data (rate cards, GitHub Pages, package metadata) at TTLs of seconds to minutes. Cache keys include API version + path + query params; auth headers are stripped. Stripe's docs are entirely CDN-served; GitHub Pages is fronted by Fastly; package registries (npm via Cloudflare, PyPI via Fastly) spike to millions of requests per minute on a popular release.

Video streaming — Netflix Open Connect at the extreme. Video files are split into ~10 s segments at multiple bitrates; segment files are the cacheable units; adaptive-bitrate clients pull variable quality. YouTube uses Google Global Cache (GGC) appliances inside ISPs. Twitch (live) uses a hybrid of multiple commercial CDNs. A few thousand titles account for 90% of viewing at any moment.

Gaming patch distribution — Steam ships 100 GB+ patches via own infrastructure + CDN. Epic delivers Fortnite updates to >100M users via Cloudflare + their own peer-assist layer. Pure Type-2 scaling from §22: split the patch into many CDN-cacheable chunks, parallel download, optional P2P (peer-to-peer) assist between clients on the same network.

Software downloads — Linux ISOs (Ubuntu via mirror network + Cloudflare), Docker registry pull-through (Hub via Cloudflare, ECR via CloudFront), apt/yum/Homebrew. Public Docker registries serve billions of pulls per day; CDN hit rate ~99% on popular images.

DDoS protection (the inverse use case) — some customers buy CDN service primarily for DDoS absorption, not caching. Cloudflare's core business is arguably this — small sites with low traffic but high attack profile (gaming forums, journalists in repressive regions). Caching is incidental; the value is wire capacity. Project Galileo is the most public example.

Edge compute / personalization at edge — CDN runs code per request to customize the response. Workers serving Stripe Press, Compute@Edge hosting parts of the NYT site, Lambda@Edge resizing images for Shopify storefronts. Cache key extends to "user bucket" (logged-in vs anonymous, A/B cohort, geo) so each bucket is cached but per-user state isn't.
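A minimal sketch of a bucketed edge cache key, assuming a first-party visitor ID cookie is available at the edge and a single two-arm experiment; `cohort`, `bucketed_cache_key`, and the experiment name are hypothetical. Cohort assignment hashes the visitor ID, so it is stable across requests and PoPs without shared state, and the key carries only the bucket, never the ID.

```python
import hashlib


def cohort(visitor_id: str, experiment: str, buckets: int = 2) -> int:
    # Deterministic: the same visitor lands in the same cohort on every
    # request and at every PoP, with no shared state.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def bucketed_cache_key(path: str, visitor_id: str, logged_in: bool,
                       country: str) -> str:
    # Only the bucket goes into the key, so everyone in the same
    # (login state, cohort, country) bucket shares one cached copy.
    state = "auth" if logged_in else "anon"
    ab = cohort(visitor_id, "homepage-redesign")
    return f"{path}|{state}|ab={ab}|geo={country}"
```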

ML model and data artifact distribution — ML model files (10 MB - 10 GB) pulled by inference workers worldwide fit "cacheable immutable blob" perfectly. HuggingFace hosts models on CloudFront. Cloudflare R2 + Workers serve LLM (Large Language Model) weights and embeddings at edge for some customers.

The point: same primitive, very different consumption. A static-image CDN at 99.99% hit rate looks operationally different from a video CDN at 99% hit rate on petabyte working sets, which differs from a DDoS-focused CDN at 30% hit rate on uncacheable traffic. Knowing which shape your workload is determines provider, tier, cache-key strategy, and TTL discipline.


§25. Real-world implementations with numbers

  • Cloudflare: ~20% of web traffic. 280 Tbps capacity. ~300 PoPs, 120+ countries. Sustained ~50M rps; peak crossing 1 billion req/min in DDoS events. Largest HTTP DDoS absorbed: 71M rps in February 2023, then ~201M rps in the HTTP/2 Rapid Reset attack later that year.
  • Akamai: ~30% of web traffic. ~4,200 PoPs, often single-rack inside ISPs. Served 250+ Tbps during major NFL Thursday Night Football events. Enterprise streaming and gaming dominate.
  • Netflix Open Connect: ~17,000 cache appliances in ISPs. ~200 Tbps peak during popular releases. ~95% of streaming from inside the user's own ISP. Each appliance: ~250 TB SSD, 100 Gbps NICs, optimized for sequential 4K reads. World's largest CDN by bytes-moved-per-day, but invisible because it serves one customer.
  • Fastly: ~70 superPOPs with petabyte-class storage. VCL programmability. June 8, 2021 outage — one customer's VCL push hit a latent bug; downed data plane globally for ~49 minutes; took down BBC, Reddit, NYT, Amazon, gov.uk. Recovered in <1 hour. Classic study in CDN blast radius. Purge SLO: <150 ms global p99.
  • CloudFront: ~450 edge locations + ~13 regional edge caches. Theoretical 600+ Tbps. Tightly coupled to AWS origins: data transfer from AWS origins (S3, ALB, EC2) to CloudFront is not charged. Powers Twitch catalog, much of Amazon retail's public surface, Netflix's metadata APIs.
  • BunnyCDN: ~120 PoPs, ~$0.01/GB pricing — ~1/8 of CloudFront's list. Indie game patches, blog hosting, small SaaS.
  • Google Global Cache (GGC): in-ISP appliances for YouTube, Drive, Play Store. YouTube delivers >1 Pbps globally during peak; majority via in-ISP layer.

Latency budget for a typical cache-hit request:

DNS resolve (cached at resolver)        :   2-10 ms
TCP handshake to anycast IP             :  10-15 ms (1 RTT)
TLS 1.3 1-RTT handshake (or 0-RTT)      :  10-15 ms (5 ms if resumed)
HTTP/2 request frame                    :  <1 ms
Edge cache lookup (RAM hit)             :  <1 ms
Cache response framing + send           :   1-2 ms
Network back to client                  :  5-15 ms
----------------------------------------------------------
Total p99 cache hit                     :  30-50 ms

On miss (add):
PoP -> L2 mid-tier                      :  1-3 ms
L2 -> shield                            :  5-20 ms
Shield -> origin                        :  50-200 ms (geography)
----------------------------------------------------------
Total p99 cache miss                    : 100-300 ms

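The cache-hit budget is dominated by connection setup: the TCP and TLS handshakes together cost ~2 RTTs. Cutting that requires session resumption (free, widely deployed), TLS 1.3 0-RTT, or QUIC, which folds the transport and TLS handshakes into a single RTT. Beyond that, the next bottleneck is mobile network jitter, which the CDN cannot control.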


§26. Specialized CDNs

The mainstream CDN landscape (Cloudflare, Akamai, Fastly, CloudFront) handles general web + API + image + video. But several workloads have specialized variants worth knowing about.

(a) Gaming patch and asset distribution

Gaming patches are huge (100 GB for AAA titles), released simultaneously to tens of millions of users in narrow windows ("launch day"). A pure CDN approach works but is expensive; the gaming industry has built hybrids.

  • Steam Content Server Network: Valve's own CDN; ~100+ PoPs globally; ~150 Tbps peak. Content is delivered over plain HTTP (the SteamPipe content system), so ISP caches and third-party CDNs can cache it; Local Network Game Transfers additionally let a client copy game files from another Steam client on the same LAN. ISP-friendly because LAN transfers minimize WAN traffic.
  • Epic Games (Fortnite): Cloudflare CDN for distribution; their Easy Anti-Cheat infrastructure for in-game services. Patches use chunked CDN distribution.
  • Battle.net (Blizzard): in-house CDN + Akamai + their own peer-to-peer client (the "Battle.net update agent" used to torrent patches, since reduced as bandwidth costs dropped).
  • Riot Games: uses Akamai + their own infrastructure; recently moved more to public cloud.
  • Microsoft (Xbox + Game Pass): Xbox Live distribution via Azure CDN + their own in-ISP appliances analogous to Netflix Open Connect.

Pattern: immutable chunks + parallel downloads + peer-assist when LAN peers have what you need. Pure BitTorrent dropped out of favor (NAT complexity, anti-piracy concerns) but the structural ideas persist.
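A sketch of the immutable-chunk pattern, assuming fixed-size chunks named by their SHA-256 digest under a hypothetical `/chunks/` path; the function names are illustrative. Content-addressed URLs never change meaning, so they can be cached with very long TTLs; a patch downloads only chunks whose digests are new; and a chunk from a LAN peer is trusted only if it hashes to its manifest name. Fixed-size chunking has a known weakness: an insertion shifts every later boundary, which content-defined chunking (rolling-hash boundaries) avoids; that is not shown here.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 1 << 20  # 1 MiB; an illustrative choice


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Dict]:
    manifest = []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append({"offset": off, "size": len(chunk),
                         "url": f"/chunks/{digest}"})  # immutable, cache forever
    return manifest


def chunks_to_fetch(old: List[Dict], new: List[Dict]) -> List[Dict]:
    # A patch only needs chunks the client doesn't already have.
    have = {c["url"] for c in old}
    return [c for c in new if c["url"] not in have]


def verify(entry: Dict, data: bytes) -> bool:
    # Works for CDN and peer downloads alike: the name is the checksum.
    return entry["url"].rsplit("/", 1)[1] == hashlib.sha256(data).hexdigest()
```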

(b) Software downloads — Linux distros, Docker registries

Linux distribution mirrors have been a thing since the 90s. Today:

  • Ubuntu / Debian / Fedora mirrors: traditional volunteer-run mirror networks (universities, companies) plus CDN augmentation. Ubuntu uses Cloudflare for releases.ubuntu.com. Debian has hundreds of mirrors; tools such as netselect-apt (picks a fast mirror on the client) and mirrorbits (a server-side redirector that sends each client to a nearby mirror) spread the load.
  • Docker Hub: served via Cloudflare (Docker Inc. as Cloudflare customer); aggressive caching at edge since images are immutable by digest.
  • AWS ECR (Elastic Container Registry): served via CloudFront with regional edge caches.
  • GitHub Container Registry: served via Azure / Fastly.
  • Homebrew (macOS package manager): bottles (prebuilt binaries) are served from GitHub Packages (ghcr.io); source archives are fetched from each formula's upstream URL, often on GitHub.
  • PyPI / npm / Maven Central: served via Fastly (PyPI), Cloudflare (npm), CloudFront (Maven Central). Single-package downloads can spike to millions per minute on a popular release (e.g., new React major version).

Pull-through caches deserve mention: a corporate environment might run a registry proxy (e.g., Sonatype Nexus, JFrog Artifactory) that caches from upstream. The proxy is itself a CDN for the corporate network — same primitive, smaller scale.

(c) Live streaming — Twitch, YouTube Live, LL-HLS

Live streaming is the latency-sensitive sibling of VOD streaming. Twitch and YouTube Live target ~3-5 s end-to-end latency from streamer to viewer; sports streaming targets ~10-20 s.

The protocol evolution:

  • RTMP (Real-Time Messaging Protocol): Flash-era protocol; still used for ingest from streamers to CDN, but not for delivery.
  • HLS / DASH with 2-10 s segments: 30+ s of end-to-end latency. Too slow for live.
  • LL-HLS (Low-Latency HTTP Live Streaming): Apple's spec; chunked transfer of partial segments; ~2-5 s latency. CDN handles partial-segment caching.
  • LL-DASH: equivalent for DASH.
  • WebRTC delivery: sub-second latency; some platforms use it for interactive live (Twitch's beta low-latency mode, YouTube's "ultra-low latency" mode). Doesn't use traditional CDN caching — uses SFUs (Selective Forwarding Units) in a media-server architecture.
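A back-of-envelope model of why segment length dominates live latency, assuming the player starts three units behind the live edge (the HLS specification says a client should not start closer than three target durations from the end of the playlist) and a fixed ~2 s for ingest, transcode, and delivery. Applying the same three-unit hold-back to LL-HLS parts is an assumption for comparison, and `live_latency_s` is a hypothetical name.

```python
def live_latency_s(unit_s: float, hold_back_units: float = 3.0,
                   pipeline_s: float = 2.0) -> float:
    """Rough streamer-to-viewer latency for segment-based live delivery.

    unit_s: segment duration (classic HLS/DASH) or part duration (LL-HLS).
    hold_back_units: how many units behind the live edge the player starts.
    pipeline_s: ingest + transcode + packaging + CDN delivery (assumed constant).
    """
    return hold_back_units * unit_s + pipeline_s


for label, unit in [("HLS, 10 s segments", 10.0), ("HLS, 6 s segments", 6.0),
                    ("LL-HLS, 0.5 s parts", 0.5)]:
    print(f"{label:22s} ~{live_latency_s(unit):.1f} s")
```

Under these assumptions, 10 s segments land around 32 s and sub-second parts around 3.5 s, which is the gap LL-HLS and LL-DASH exist to close.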

Twitch's architecture: ingest via RTMP at ingest PoPs; transcoded to multiple bitrates; segmented; distributed via their own CDN (formerly heavily Akamai, now significant in-house). Recent additions of LL-HLS.

YouTube Live: similar pipeline; uses Google's global infrastructure; LL-HLS and ultra-low-latency WebRTC modes.

(d) DNS-only / steering-only CDNs

NS1 and Cedexis were mentioned in §20. They don't cache HTTP at all — they just steer. Their role in the CDN ecosystem:

  • Multi-CDN front end: pick which CDN to route to per query.
  • GeoDNS for legacy architectures (one origin per region; DNS picks the nearest).
  • Failover orchestration: monitor backends; route around failures.

NS1's pricing model: per query + per traffic management feature. For sites with 100M queries/month and multi-CDN, the cost is a few thousand dollars/month — small relative to CDN egress.

(e) Private-cloud / on-prem CDNs

For workloads that can't use public CDN (regulatory, data residency, ultra-low-latency on internal networks):

  • VMware vSAN + edge nodes: enterprise customers deploy mini-PoPs in their own datacenters.
  • Telco edge (5G MEC — Multi-access Edge Computing): mobile carriers offer edge compute and caching co-located with cell towers / base stations. AT&T, Verizon, and other carriers have such offerings.
  • Specialized CDNs for HFT (High-Frequency Trading): where shaving microseconds matters; not really "CDN" in the public sense but the same architecture (geographically-distributed cache + fast routing) for market data.

(f) Specialty: ad-tech and analytics CDNs

Ad networks and analytics platforms have specialized CDN needs:

  • Google Ad Manager / Doubleclick: serves ad creative via Google's CDN; latency budget is single-digit ms.
  • Cloudflare for ad-tech: many ad-tech companies use Cloudflare's edge to lower per-request latency.
  • Snowplow / Segment: analytics SDKs need a CDN to host the tracking JS; per-event collection often via CDN proxy to a backend.

The constraint: ad-tech requires both very low latency (auction time budgets are 100 ms) and flexible programmability (frequent A/B tests, real-time bidding logic). Edge compute fits well; raw cache plane often doesn't.


§27. Cost economics deep dive

CDN cost is opaque until you've signed a contract or three. The cost model has several axes, each with its own pricing curve.

(a) Egress cost (per-GB pricing)

The dominant cost. Pricing in the wild (list prices; enterprise discounts can be 50-90% off):

Provider | Region | List price per GB (2025) | Notes
--- | --- | --- | ---
CloudFront | US, Europe | $0.085 (first 10 TB) | Tiers down to $0.020 at high volume
CloudFront | Asia, Australia | $0.114 | Geographic premium
Cloudflare | Most regions | Bundled (Pro/Business plans) | Flat monthly price; egress unmetered
Cloudflare | Enterprise | Custom, often <$0.005/GB | Volume contracts
Fastly | US, Europe | $0.12 (first 10 TB) | Cheaper after volume
Akamai | Custom | Negotiated | Typically $0.005-0.02 at scale
BunnyCDN | All | $0.01 | Indie pricing; limited features
Google Cloud | US | $0.08 (first 10 TB) | Tiers down

The trend: egress prices have declined ~50% over the last decade. AWS announced ~free egress to the internet for customers leaving AWS in 2024 (long-overdue concession). Hyperscalers have higher list prices; specialty CDNs (Cloudflare, BunnyCDN) have lower.

Cloudflare's pricing model is unusual: most plans bundle egress (you pay a flat per-month rate for Pro / Business / Enterprise; no separate per-GB charges). This is enabled by their network economics (bandwidth-bundled peering) and is the reason many bandwidth-heavy sites move to Cloudflare even from enterprise CDNs.

(b) Request cost (per-request pricing)

Separate from bytes. Typical:

Provider | Price per 10K HTTPS requests | Notes
--- | --- | ---
CloudFront | $0.0100 | Plus egress
Cloudflare | Bundled | Most plans
Fastly | $0.0100 | Plus egress
BunnyCDN | $0.0004 | Indie

For sites with high request volume but small response bodies (API CDN), request cost can dominate egress cost. A site doing 1B requests/month at $0.01/10K = $1000/month just in request fees.
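A rough way to see when request fees overtake egress: model the monthly bill as tiered per-GB egress plus per-10K request fees. This is a minimal Python sketch, not any provider's billing logic; `Tier`, `egress_cost`, `request_cost`, the two-tier schedule, and the 2 KB response size are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    up_to_gb: float      # cumulative upper bound of this tier in GB (inf for the last tier)
    price_per_gb: float  # USD per GB billed inside this tier


def egress_cost(total_gb: float, tiers: list[Tier]) -> float:
    """Marginal tiering: each GB is billed at the price of the tier it falls into."""
    cost, lower = 0.0, 0.0
    for tier in tiers:
        if total_gb <= lower:
            break
        cost += (min(total_gb, tier.up_to_gb) - lower) * tier.price_per_gb
        lower = tier.up_to_gb
    return cost


def request_cost(requests: int, price_per_10k: float) -> float:
    """Request fees, quoted per 10,000 requests."""
    return requests / 10_000 * price_per_10k


# Hypothetical schedule: $0.085/GB for the first 10 TB, $0.080/GB beyond that.
TIERS = [Tier(10_000, 0.085), Tier(float("inf"), 0.080)]

# 1B requests/month with (hypothetical) 2 KB bodies = 2,000 GB of egress.
requests = 1_000_000_000
egress_gb = requests * 2 / 1_000_000  # KB -> GB, decimal units
print(f"egress:   ${egress_cost(egress_gb, TIERS):,.2f}")
print(f"requests: ${request_cost(requests, 0.01):,.2f}")
```

With these inputs the sketch reproduces the $1,000/month request figure and puts egress at about $170, which is why small-body API traffic is dominated by request fees.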

(c) Edge compute cost

Compute pricing:

  • Cloudflare Workers (Standard plan, which replaced the older Bundled and Unbound plans): $5/month base including 10M requests and 30M CPU-ms; beyond that, $0.30 per additional million requests and $0.02 per additional million CPU-ms.
  • Fastly Compute (formerly Compute@Edge): $50/month base + $0.50 per million requests.
  • AWS Lambda@Edge: $0.60 per million requests + $0.00005001 per GB-second of duration. Cold starts contribute to billed duration.
  • AWS CloudFront Functions: $0.10 per million invocations; very cheap, very restricted (~1 ms CPU budget).

For a site doing 10M edge-compute requests/day (~300M/month), request fees alone come to roughly $30/month on CloudFront Functions, ~$92/month on Workers ($5 + 290M × $0.30 per million, before any CPU-ms overage), and ~$180/month on Lambda@Edge before duration charges.
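Estimates like these are just base fee plus overage. A minimal sketch, assuming a hypothetical `monthly_edge_compute_fee` helper with prices passed in as parameters (take them from the current price sheet rather than hard-coding); it deliberately ignores duration and CPU-time charges.

```python
def monthly_edge_compute_fee(
    requests_per_month: float,
    base_fee: float,
    included_requests: float,
    price_per_million: float,
) -> float:
    """Base fee plus a per-million charge on requests beyond the included allowance.

    Duration / CPU-time charges are billed separately by most platforms and are ignored here.
    """
    overage = max(0.0, requests_per_month - included_requests)
    return base_fee + overage / 1_000_000 * price_per_million


monthly_requests = 10_000_000 * 30  # 10M requests/day over a 30-day month

# Pay-per-invocation pricing with no base fee ($0.10 per million).
per_invocation = monthly_edge_compute_fee(monthly_requests, 0.0, 0, 0.10)
# Subscription-style pricing: $5 base, 10M requests included, $0.30 per extra million.
subscription = monthly_edge_compute_fee(monthly_requests, 5.0, 10_000_000, 0.30)
print(f"${per_invocation:,.2f} vs ${subscription:,.2f}")
```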

(d) The "static is cheap, dynamic is expensive" math

The economic shape of CDN cost:

  • Static, immutable, content-hashed asset, no edge compute: ~$0.005-0.02/GB egress, little or no request cost (request fees are bundled on some providers), no compute. The cheapest possible.
  • Dynamic-ish, short-TTL, with header logic at edge: above + edge compute per request. Still cheap per request but adds up.
  • Personalized at edge, heavy compute: large edge-compute bills. Often comparable to running the equivalent at origin.
  • Uncacheable (Cache-Control: no-store, or a credential on every request): pure pass-through; you pay both edge bandwidth and origin compute. The worst case.

The optimization pattern: make as much cacheable as possible. Edge compute to convert dynamic to cacheable (e.g., GraphQL persisted queries, A/B bucket assignment for cacheable bucket variants) often pays for itself in offloaded origin compute.
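A minimal sketch of edge-side A/B bucket assignment, assuming hypothetical helper names and a user ID already extracted from the request: hashing each user into one of N buckets and keying the cache on the bucket (not the user) keeps an experiment cacheable.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, num_buckets: int = 2) -> int:
    """Deterministically hash a user into a bucket; same inputs always give the same bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def variant_cache_key(path: str, experiment: str, bucket: int) -> str:
    """Key on the bucket, not the user: N buckets means at most N cache entries per URL,
    whereas keying on the user would make the page effectively uncacheable."""
    return f"{path}|{experiment}={bucket}"


# Hypothetical edge handler step: read a user ID (e.g. from a cookie), pick the
# bucket, and look up the shared per-bucket variant instead of calling origin.
key = variant_cache_key("/pricing", "new-hero", assign_bucket("user-42", "new-hero"))
```

Because the bucket is a pure function of the user ID, no per-user state is needed at the edge, and every PoP agrees on which variant a user sees.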

(e) Free tiers

Major CDN free tiers:

  • Cloudflare Free: free TLS, DDoS protection, basic WAF, unlimited bandwidth. The most generous free tier; the reason many indie sites use Cloudflare.
  • CloudFront Free Tier: 1 TB egress and 10M requests per month, always free (no longer limited to the first 12 months of the AWS Free Tier).
  • Netlify, Vercel: edge serving included with hosting; limits on bandwidth per month.
  • BunnyCDN: no free tier, but $1/month minimum is close enough.
  • GitHub Pages: free hosting on GitHub's Fastly-fronted CDN.

Free tiers have limits: DDoS mitigation on Cloudflare Free is unmetered, but L7 features such as WAF rules and rate limiting are restricted; CloudFront's free allowance is capped at 1 TB per month. For a side project, Cloudflare Free + Vercel covers almost everything.

(f) Cost surprises

Common ways customers blow their CDN budget:

  • Cache key fragmentation collapses hit rate. A bad config drops hit rate from 99% to 50%, which takes the miss rate from 1% to 50%: request volume and egress at origin spike 50x. The CDN's user-facing egress barely changes; the damage lands on the origin side, where egress (typically billed by your cloud provider) and origin compute both explode.
  • Egress to other regions. Multi-region serving with cross-region replication can add unexpected egress between hyperscaler regions; some CDN providers charge premium for serving from certain geographies (Asia, South America).
  • Premium feature creep. Add-ons such as bot management ($50/month), image optimization (~$0.10/1000 transformations), and Workers ($5/month per account) accumulate. Many enterprises end up paying 5-10x their bandwidth bill in features.
  • Negotiated rate vs list rate. Until you have a contract, you pay list. An enterprise commitment ($10K/month minimum) usually unlocks a 50-80% discount.
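Cache key hygiene is the main defense against the first surprise. A minimal sketch of the kind of normalization CDNs let you configure, assuming a hypothetical `IGNORED_PARAMS` list of tracking parameters known not to affect the response; it also computes the origin-load multiplier behind the 99%-to-50% example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical: query parameters known not to affect the response body.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize_cache_key(url: str) -> str:
    """Map URL variants that return identical bytes to one cache key:
    lowercase the host, drop ignored params, and sort the remaining ones.
    (The scheme is dropped; keep it if http and https responses differ.)"""
    parts = urlsplit(url)
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    base = parts.netloc.lower() + (parts.path or "/")
    query = urlencode(params)
    return f"{base}?{query}" if query else base


def origin_load_multiplier(old_hit_rate: float, new_hit_rate: float) -> float:
    """Factor by which origin request volume grows when hit rate falls."""
    return (1 - new_hit_rate) / (1 - old_hit_rate)


a = normalize_cache_key("https://CDN.example.com/app.js?v=3&utm_source=mail")
b = normalize_cache_key("https://cdn.example.com/app.js?v=3")
assert a == b == "cdn.example.com/app.js?v=3"
print(round(origin_load_multiplier(0.99, 0.50)))  # 99% -> 50% hit rate: prints 50
```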

§28. Observability

Running a CDN integration without observability is flying blind. Modern CDN providers offer rich telemetry; using it well is half of operating a CDN-fronted service.

(a) Log streaming from edge

The basic primitive: every request at every PoP can be logged, and those logs can be streamed to your S3 / GCS / Splunk / Datadog. Implementations:

  • Cloudflare Logpush: stream HTTP request logs (and Workers logs, firewall logs, etc.) to S3 / GCS / Azure / Splunk / Datadog. Sub-minute lag from request to log.
  • Fastly Real-Time Log Streaming: HTTP / TCP / syslog endpoints; log every request as it happens. Sub-second lag possible.
  • CloudFront access logs: standard logs are delivered to S3 in batches, typically within an hour. Real-time logs (CloudFront real-time logs) stream to Kinesis Data Streams within seconds.
  • Akamai DataStream: similar streaming.

Log volume can be huge: a site with 10M rps generates ~10M log lines per second; at ~500 bytes per line that's 5 GB/s. Most customers sample (1% of requests) or aggregate at edge.
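The volume arithmetic, plus one common way to sample, as a minimal sketch: hashing the request ID instead of drawing a random number makes the keep/drop decision deterministic. The function names are hypothetical.

```python
import hashlib


def log_bytes_per_second(rps: float, bytes_per_line: float, sample_rate: float = 1.0) -> float:
    """Raw log throughput after sampling."""
    return rps * bytes_per_line * sample_rate


def keep_log_line(request_id: str, sample_rate: float) -> bool:
    """Deterministic sampling keyed on the request ID: hash it to a 64-bit integer
    and keep the line if it lands in the lowest `sample_rate` fraction of the range.
    Any component that applies the same rule to the same ID (every PoP, the shield,
    origin) makes the same decision, so sampled logs from each hop still join up."""
    h = int.from_bytes(hashlib.sha256(request_id.encode()).digest()[:8], "big")
    return h < sample_rate * 2**64


print(log_bytes_per_second(10_000_000, 500) / 1e9, "GB/s unsampled")
print(log_bytes_per_second(10_000_000, 500, 0.01) / 1e9, "GB/s at 1%")
```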

(b) The per-PoP visibility problem

Aggregate metrics across all PoPs hide PoP-specific failures. A PoP serving 1% of global traffic could be entirely failing (errors, high latency, bad cache hit rate) and the global metric would barely move.

Mitigation: per-PoP dashboards. Cloudflare's analytics is per-PoP (or "colo" in their language); Fastly's per-POP analytics is similar; CloudFront's per-edge-location analytics is more limited.

Common per-PoP anomalies:

  • Hit rate drops on one PoP: suggests a cache eviction storm or a local hardware issue.
  • p99 latency spikes on one PoP: suggests local network saturation or a hot spot.
  • Error rate climbs on one PoP: suggests a problem with that PoP's origin connectivity.

Alerting on per-PoP anomalies (when one PoP deviates from the global mean by N standard deviations) catches issues that global aggregates miss.
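The alerting rule can be sketched in a few lines of Python. This assumes a hypothetical `anomalous_pops` helper fed one metric value per PoP for a time window; production alerting would also need minimum-traffic filters and smoothing.

```python
from statistics import mean, pstdev


def anomalous_pops(metric_by_pop: dict, threshold: float = 3.0) -> list:
    """Return the PoPs whose metric (hit rate, p99, error rate...) deviates from
    the mean across all PoPs by more than `threshold` standard deviations."""
    values = list(metric_by_pop.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every PoP reports the same value: nothing stands out
    return sorted(pop for pop, v in metric_by_pop.items() if abs(v - mu) / sigma > threshold)


hit_rates = {f"pop{i:02d}": 0.95 for i in range(19)}
hit_rates["sin"] = 0.50  # one PoP with a collapsed hit rate
print(anomalous_pops(hit_rates))
```

One caveat: the outlier inflates the standard deviation it is measured against. With n PoPs a single outlier's z-score can never exceed sqrt(n - 1), so with few PoPs a robust baseline such as the median absolute deviation works better.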

(c) Real User Monitoring (RUM)

CDN-measured metrics are server-side: how long did the CDN take to serve? Real user latency includes the network path back to the client (often dominant on mobile), browser rendering time, page load time. RUM captures this:

  • Cloudflare Web Analytics: privacy-friendly RUM via a tiny JS beacon.
  • Akamai mPulse: enterprise RUM, very comprehensive.
  • Google Analytics / GA4 with Core Web Vitals: not CDN-specific but commonly used.
  • SpeedCurve, New Relic Browser, Datadog RUM: third-party RUM products.

What RUM reveals:

  • Real user p99 latency (vs server-side p99, which only includes CDN-side time).
  • Conversion impact of latency (each 100 ms increase in p99 → N% conversion drop).
  • Per-region performance variance: one region's users see much higher latency, suggesting routing issues, peering issues, or local network problems.
  • Mobile vs desktop: mobile users typically see ~3x higher p99.
  • CDN provider comparison: in multi-CDN setups, RUM data shows which CDN performs better in each region, feeding back into routing decisions.
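Per-region RUM analysis boils down to grouping beacons and taking percentiles. A minimal sketch, assuming beacons have already been collected as hypothetical `(region, page_load_ms)` pairs and that "slow" means a p99 more than twice the median regional p99 (an arbitrary illustrative threshold):

```python
from collections import defaultdict
from statistics import median, quantiles


def p99_by_region(beacons) -> dict:
    """beacons: iterable of (region, page_load_ms) pairs sent by real users' browsers.
    Returns the 99th-percentile load time per region (regions with < 2 samples skipped)."""
    by_region = defaultdict(list)
    for region, ms in beacons:
        by_region[region].append(ms)
    return {
        region: quantiles(samples, n=100)[98]  # 99 cut points; index 98 is p99
        for region, samples in by_region.items()
        if len(samples) >= 2
    }


def slow_regions(p99s: dict, factor: float = 2.0) -> list:
    """Regions whose p99 exceeds `factor` times the median regional p99."""
    baseline = median(p99s.values())
    return sorted(region for region, v in p99s.items() if v > factor * baseline)
```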

(d) Synthetic monitoring

Complement to RUM: continuously probe the site from many vantage points; alert on failures or latency degradations. Vendors:

  • Catchpoint, ThousandEyes, Pingdom, Uptrends: synthetic monitoring services with many probe locations.
  • Datadog Synthetic Monitoring: integrated with their observability stack.
  • AWS CloudWatch Synthetics: AWS-native.

What synthetic catches that RUM doesn't:

  • Outages during low traffic (RUM only fires when users hit the site).
  • Specific user-flow correctness (synthetic can simulate a complex multi-step flow; RUM is per-page).
  • Specific PoP / region health (synthetic probes target specific PoPs; RUM reflects wherever users happen to be).

(e) CDN-side metrics

Beyond logs, CDN providers expose aggregate metrics:

  • Hit rate: by host, path, region, time. The most important single metric.
  • Bandwidth out: by host, region, content-type.
  • Request count: by status code, by path.
  • Error rate: 4xx, 5xx breakdown.
  • Origin response time: how long the shield waits for origin.
  • DDoS attack metrics: traffic / requests blocked, attack types observed.
  • WAF rule hits: which rules are triggering, on which paths.

Cloudflare Analytics, Fastly's real-time and historical stats, CloudFront's CloudWatch metrics, and Akamai's reporting tools all expose these. They're typically the first dashboard a CDN operator looks at.
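Hit rate by any dimension also falls straight out of the request logs. A minimal sketch over a hypothetical log schema (dicts with a `cache_status` field plus grouping fields such as `host` or `pop`); real field names and status vocabularies differ by provider, so which statuses count as hits is an explicit choice here.

```python
from collections import Counter

# Hypothetical: statuses counted as cache hits. Providers also report values such
# as stale-served or revalidated responses; decide deliberately whether they count.
HIT_STATUSES = {"HIT"}


def hit_rate_by(records, key: str) -> dict:
    """Fraction of requests served from cache, grouped by `key` (e.g. 'host' or 'pop')."""
    hits, totals = Counter(), Counter()
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["cache_status"] in HIT_STATUSES:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}
```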

(f) Tracing across CDN and origin

A request that crosses CDN → shield → origin should be traceable end-to-end. Patterns:

  • X-Request-ID header propagation: CDN injects a request ID; origin logs include it; client gets it in response (optionally). Cross-correlation possible.
  • W3C Trace Context (traceparent header): standardized distributed tracing context. CDN propagates from client to origin (or generates if absent).
  • Cloudflare's cf-ray header: every Cloudflare request has a unique ray ID; logs include it; customer can correlate with their origin logs.

For complex incidents (slow request, intermittent error), tracing across CDN + origin + downstream services is essential. Without trace propagation, the CDN logs and origin logs are independent and you must guess the correlation.

(g) The "shadow traffic" pattern

Some operators send a copy of production traffic to a shadow origin to validate changes:

  • A CDN Worker forks the request and sends it to both the real origin and the shadow origin.
  • It returns the real origin's response to the user.
  • It compares the shadow's response asynchronously and alerts on differences.

Used for canary deployments of new origin versions, regression testing, A/B comparisons. Cheap to implement at edge.
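The fork-and-compare flow can be sketched with Python's asyncio. This is an illustration, not edge-runtime code: `serve_with_shadow` and the `real` / `shadow` fetch functions are hypothetical, and on an actual edge platform the background comparison would be handed to the runtime's keep-alive mechanism (Cloudflare Workers expose `ctx.waitUntil()` for this).

```python
import asyncio
from typing import Awaitable, Callable

Fetch = Callable[[str], Awaitable[str]]
OnDiff = Callable[[str, str, str], None]

_background: set = set()  # strong refs so fire-and-forget tasks are not garbage-collected


async def serve_with_shadow(path: str, real: Fetch, shadow: Fetch, on_diff: OnDiff) -> str:
    """Fork the request to both origins, return only the real origin's response,
    and compare the shadow's response in the background."""
    shadow_task = asyncio.create_task(shadow(path))
    try:
        response = await real(path)
    except BaseException:
        shadow_task.cancel()  # no real response to compare against
        raise

    async def compare() -> None:
        try:
            shadow_response = await shadow_task
        except Exception as exc:  # shadow failures are reported, never surfaced to users
            on_diff(path, response, f"shadow error: {exc!r}")
            return
        if shadow_response != response:
            on_diff(path, response, shadow_response)

    task = asyncio.create_task(compare())  # does not delay the user's response
    _background.add(task)
    task.add_done_callback(_background.discard)
    return response
```

The user-facing latency is the real origin's alone; a slow or failing shadow only shows up in the `on_diff` reports.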


§29. Summary

A CDN is a planet-scale distributed read-through cache fronted by anycast routing: each PoP runs a log-structured object store with W-TinyLFU admission, sharded by consistent hashing within the PoP, tiered through a regional mid-cache into an origin shield that collapses concurrent misses, so origin can see single-digit RPS for a hot object even when the edge serves 10M rps. The technology contract is hit-rate-driven origin offload + topological proximity + DDoS dilution; durability, strong consistency, schema awareness, and per-user privacy isolation must be layered above. Whether the workload is static assets, API responses, video segments, gaming patches, software downloads, or ML model files, the same primitive applies; the differences live in cache-key hygiene, TTL discipline, and which corner of the design space you choose (third-party shared, private in-ISP, programmable edge, or specialized media).