
Regulatory API rate limiting: per-tier quotas, burst tokens, and Cloudflare KV sliding-window counters

AI Analytics
Regulatory data · Cloudflare Workers · Rate limiting · API infrastructure

The Federal Regulatory Data Hub API serves three client categories with fundamentally different usage patterns: individual researchers issuing sparse point-in-time queries, compliance teams running nightly batch screens of thousands of counterparty names, and data vendors building downstream products that resell hub data. Enforcing appropriate limits for each category without a centralized Redis instance — which would be a single point of failure and a latency bottleneck at 8,000 req/s — requires rate limiting to be distributed across Cloudflare's edge.

This article covers the tier taxonomy, the per-tier quota table, the token-bucket algorithm implemented on Cloudflare KV with best-effort (non-atomic) updates, and the sliding-window request counting approach used for daily quota enforcement.

Tier taxonomy and quota table

API keys are assigned a tier at registration time. The tier determines both the burst rate limit (requests per second, token-bucket) and the daily request quota (sliding 24-hour window):

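| Tier | Clients | Burst rate (req/s) | Bucket capacity (tokens) | Daily quota (requests per 24 h) |
|---|---|---|---|---|
| free | Public / unauthenticated | 2 | 5 | 200 |
| researcher | Academic / NGO | 10 | 30 | 5,000 |
| compliance | Enterprise compliance teams | 50 | 200 | 100,000 |
| vendor | Data vendors / resellers | 200 | 1,000 | 1,000,000 |
| internal | Hub services | no limit | unlimited | unlimited |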

The internal tier bypasses all rate-limit checks via a separate service-to-service auth header. Vendor-tier clients are issued per-downstream-product sub-keys so that a single vendor's multiple products do not share a quota pool — each sub-key gets an independent quota at the vendor rate.
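A minimal sketch of the quota table as typed configuration follows. The `TIERS` constant, the `TierName` union, and the use of `Infinity` to encode the internal tier's unlimited values are illustrative assumptions, not the hub's actual configuration module; `secondsToRefill` simply divides bucket capacity by refill rate.

```typescript
// Hypothetical typed form of the quota table (names and the Infinity encoding are assumptions).
type TierName = 'free' | 'researcher' | 'compliance' | 'vendor' | 'internal';

type TierConfig = {
  name: TierName;
  burstRatePerSec: number; // tokens added per second
  bucketCapacity: number;  // maximum tokens, i.e. the largest instantaneous burst
  dailyQuota: number;      // requests allowed in any sliding 24-hour window
};

const TIERS: Record<TierName, TierConfig> = {
  free:       { name: 'free',       burstRatePerSec: 2,   bucketCapacity: 5,     dailyQuota: 200 },
  researcher: { name: 'researcher', burstRatePerSec: 10,  bucketCapacity: 30,    dailyQuota: 5_000 },
  compliance: { name: 'compliance', burstRatePerSec: 50,  bucketCapacity: 200,   dailyQuota: 100_000 },
  vendor:     { name: 'vendor',     burstRatePerSec: 200, bucketCapacity: 1_000, dailyQuota: 1_000_000 },
  // The internal tier bypasses every check, so these values are never consulted.
  internal:   { name: 'internal', burstRatePerSec: Infinity, bucketCapacity: Infinity, dailyQuota: Infinity },
};

// Time for an empty bucket to refill completely: capacity / rate.
// Not meaningful for the internal tier, where it evaluates to NaN.
function secondsToRefill(tier: TierConfig): number {
  return tier.bucketCapacity / tier.burstRatePerSec;
}
```

Dividing capacity by rate shows that every limited tier's bucket refills from empty in 2.5 to 5 seconds, so the tiers differ mainly in burst size and daily volume rather than in recovery time.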

Token-bucket implementation in Cloudflare KV

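Cloudflare KV does not support atomic increment, compare-and-swap, or conditional writes: put() accepts only expiration and metadata options, reads are eventually consistent across edge locations, and KV allows at most one write per second to the same key. The token-bucket state is therefore stored as a JSON object per API key and updated on a best-effort, last-writer-wins basis. Each request reads the state, computes the refill from elapsed time, decrements one token, and writes the state back. If a write fails (for example, because the same key is being written faster than KV permits), the operation retries after a short backoff, making at most three attempts before failing open. Because two requests served by different edge locations can read the same state and each spend the same token, the burst limit is approximate; strict per-key atomicity on Cloudflare generally requires a Durable Object rather than KV.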
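The refill arithmetic can be separated from the storage layer. The sketch below is a pure function over an in-memory bucket; `takeToken` and `Bucket` are hypothetical names, and clamping negative elapsed time (to tolerate clock skew between edge locations) is an added assumption.

```typescript
// Pure token-bucket step over in-memory state (a sketch; no storage involved).
type Bucket = { tokens: number; lastRefill: number }; // lastRefill in unix ms

function takeToken(
  bucket: Bucket | null,
  nowMs: number,
  ratePerSec: number,
  capacity: number,
): { allowed: boolean; next: Bucket } {
  // No stored state (first request or expired entry): start with a full bucket.
  const prev = bucket ?? { tokens: capacity, lastRefill: nowMs };
  // Clamp negative elapsed time in case clocks disagree between edge locations.
  const elapsedSec = Math.max(0, nowMs - prev.lastRefill) / 1000;
  // Refill: elapsed_seconds * rate, capped at capacity.
  const tokens = Math.min(capacity, prev.tokens + elapsedSec * ratePerSec);
  if (tokens < 1) {
    return { allowed: false, next: { tokens, lastRefill: nowMs } };
  }
  return { allowed: true, next: { tokens: tokens - 1, lastRefill: nowMs } };
}
```

With the researcher tier's parameters (10 tokens/s, capacity 30), a client that has just drained its 30-token burst has 1.5 tokens again 150 ms later, so one more request is admitted and 0.5 tokens remain.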

// workers/src/rate-limiter.ts

export type TierConfig = {
  name:            string;   // tier name; the middleware checks for 'internal'
  burstRatePerSec: number;   // token refill rate
  bucketCapacity:  number;   // maximum tokens (largest burst)
  dailyQuota:      number;
};

type BucketState = {
  tokens:     number;   // current token count (fractional between refills)
  lastRefill: number;   // unix ms
};

const BUCKET_TTL_SECONDS = 3600; // KV entry TTL; an idle bucket expires and restarts full
const MAX_ATTEMPTS = 3;

export async function checkBurstLimit(
  kv: KVNamespace,
  apiKey: string,
  tier: TierConfig,
): Promise<{ allowed: boolean; remainingTokens: number }> {
  const kvKey = `bucket:${apiKey}`;

  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const existing = await kv.get<BucketState>(kvKey, { type: 'json' });
    const now = Date.now();

    // Refill: add (elapsed_seconds * burstRatePerSec) tokens, cap at bucketCapacity.
    // A missing entry (first request, or TTL expiry) starts as a full bucket.
    // Math.max guards against clock skew between edge locations.
    const state: BucketState = existing
      ? {
          tokens: Math.min(
            tier.bucketCapacity,
            existing.tokens + (Math.max(0, now - existing.lastRefill) / 1000) * tier.burstRatePerSec,
          ),
          lastRefill: now,
        }
      : { tokens: tier.bucketCapacity, lastRefill: now };

    if (state.tokens < 1) {
      return { allowed: false, remainingTokens: 0 };
    }

    state.tokens -= 1;

    // KV has no conditional write, so this put is last-writer-wins: a concurrent
    // request at another edge location can overwrite it. The put can also throw,
    // e.g. when writes to this key exceed KV's per-key write rate.
    if (await tryPut(kv, kvKey, state, BUCKET_TTL_SECONDS)) {
      return { allowed: true, remainingTokens: Math.floor(state.tokens) };
    }

    // Write failed: back off exponentially (10 ms, then 20 ms) before the next attempt.
    if (attempt < MAX_ATTEMPTS - 1) await sleep(10 * 2 ** attempt);
  }

  // After three failed writes, fail open (allow the request) rather than block.
  return { allowed: true, remainingTokens: 0 };
}

async function tryPut(
  kv: KVNamespace,
  key: string,
  value: BucketState,
  ttlSeconds: number,
): Promise<boolean> {
  try {
    await kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds });
    return true;
  } catch {
    return false;
  }
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(() => resolve(), ms));
}

The fail-open behavior after three consecutive failed attempts is a deliberate availability trade-off. At 8,000 req/s across the vendor tier, the probability of three consecutive failures on the same key within the retry windows is estimated at approximately 0.001%; what drives that figure is the write rate on an individual key, not aggregate traffic. Blocking on contention would degrade latency for legitimate requests, whereas failing open only lets an occasional request through without spending a token. That under-count of one token per contested window is acceptable.
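The retry-then-fail-open policy can be sketched as a storage-agnostic helper. `retryOrFailOpen`, the doubling delay schedule (10 ms, then 20 ms), and the injectable `sleep` parameter (which lets the policy be tested without real timers) are assumptions for illustration; `attempt` returns `null` when its write fails.

```typescript
// Storage-agnostic retry policy (a sketch): bounded attempts, exponential
// backoff, and fail-open once the attempts are exhausted.
type Decision = { allowed: boolean; remainingTokens: number };
type RetryOutcome = { decision: Decision; attempts: number; failedOpen: boolean };

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(() => resolve(), ms));

async function retryOrFailOpen(
  attempt: () => Promise<Decision | null>, // null = the write failed; try again
  maxAttempts = 3,
  baseDelayMs = 10,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<RetryOutcome> {
  for (let i = 0; i < maxAttempts; i++) {
    const decision = await attempt();
    if (decision !== null) return { decision, attempts: i + 1, failedOpen: false };
    // Wait 10 ms, 20 ms, ... between attempts; no sleep after the final one.
    if (i < maxAttempts - 1) await sleep(baseDelayMs * Math.pow(2, i));
  }
  // Storage stayed unavailable: admit the request without charging a token.
  return { decision: { allowed: true, remainingTokens: 0 }, attempts: maxAttempts, failedOpen: true };
}
```

Skipping the sleep after the last attempt matters at the edge: a delay that cannot be followed by another try only adds latency to the request being failed open.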

Sliding-window daily quota

Daily quota enforcement uses a sliding 24-hour window rather than a calendar-day reset, which would allow a client to double-consume quota by sending requests just before and just after midnight. The sliding window is implemented by storing per-minute request counts in KV with a 25-hour TTL:
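The per-minute bucketing scheme can be illustrated without KV, using a `Map` in place of one key per minute. `SlidingWindowCounter` is a hypothetical in-memory stand-in: deleting buckets whose start time has left the window plays the role of the KV TTL.

```typescript
// In-memory sketch of a sliding 24-hour window built from 1-minute buckets.
const WINDOW_SECONDS = 24 * 60 * 60;
const BUCKET_SIZE_SECONDS = 60;

class SlidingWindowCounter {
  // bucket start time (unix seconds) -> requests recorded in that minute
  private readonly buckets = new Map<number, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records the request and returns true if the trailing 24 hours are under the limit.
  tryRecord(nowSec: number): boolean {
    const windowStart = nowSec - WINDOW_SECONDS;
    let used = 0;
    for (const [ts, count] of this.buckets) {
      if (ts < windowStart) this.buckets.delete(ts); // aged out, as a KV TTL would expire it
      else used += count;
    }
    if (used >= this.limit) return false;
    const bucket = Math.floor(nowSec / BUCKET_SIZE_SECONDS) * BUCKET_SIZE_SECONDS;
    this.buckets.set(bucket, (this.buckets.get(bucket) ?? 0) + 1);
    return true;
  }
}
```

A client that spends a 200-request free-tier quota at 23:59 gets nothing back at 00:01, unlike with a calendar-day reset. Because a bucket leaves the window as soon as its start time does, requests made late in a minute are forgotten up to 59 seconds early; that is the precision cost of 1-minute granularity.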

// workers/src/daily-quota.ts

const WINDOW_SECONDS      = 24 * 60 * 60;             // 24 hours
const BUCKET_SIZE_SECONDS = 60;                       // 1-minute granularity
const BUCKET_TTL_SECONDS  = WINDOW_SECONDS + 60 * 60; // 25 hours: one hour of slack past the window

export async function checkDailyQuota(
  kv: KVNamespace,
  apiKey: string,
  dailyQuota: number,
): Promise<{ allowed: boolean; usedToday: number }> {
  const nowSec      = Math.floor(Date.now() / 1000);
  const windowStart = nowSec - WINDOW_SECONDS;

  // Current minute bucket key
  const currentBucketTs  = Math.floor(nowSec / BUCKET_SIZE_SECONDS) * BUCKET_SIZE_SECONDS;
  const currentBucketKey = `quota:${apiKey}:${currentBucketTs}`;

  // Sum the per-minute buckets (up to 1,440) from the past 24 hours. list() returns
  // each key's metadata, so the counts come back without a separate get() per key;
  // the cursor pages through the results 200 keys at a time.
  const prefix = `quota:${apiKey}:`;
  let usedToday = 0;
  let cursor: string | undefined;

  do {
    const listed = await kv.list({ prefix, cursor, limit: 200 });
    for (const { name, metadata } of listed.keys) {
      const ts = parseInt(name.slice(prefix.length), 10);
      if (ts >= windowStart && ts <= nowSec) {
        usedToday += (metadata as { count?: number } | null)?.count ?? 0;
      }
    }
    cursor = listed.list_complete ? undefined : listed.cursor;
  } while (cursor);

  if (usedToday >= dailyQuota) {
    return { allowed: false, usedToday };
  }

  // Increment the current minute bucket. This read-modify-write is not atomic:
  // concurrent increments can be lost, so usage may be under-counted.
  // The stored value is an empty string, so read it with the default 'text' type;
  // a 'json' read would fail to parse ''.
  const current   = await kv.getWithMetadata(currentBucketKey);
  const prevCount = (current.metadata as { count?: number } | null)?.count ?? 0;
  await kv.put(currentBucketKey, '', {
    expirationTtl: BUCKET_TTL_SECONDS,
    metadata: { count: prevCount + 1 },
  });

  return { allowed: true, usedToday: usedToday + 1 };
}

The sliding-window scan reads up to 1,440 keys per request (one per minute over 24 hours). At the compliance and vendor tiers, where daily quotas are large, most requests are nowhere near the quota boundary. The implementation therefore caches the rolling count in a short-lived KV entry (quota-summary:{apiKey}, 60-second TTL) and checks requests against that cached count while it is more than 10% below the daily quota. The full 1,440-key scan runs only when the entry has expired or the cached count is within 10% of the quota, so the common case avoids it. The listing above omits this caching layer.
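The summary-cache decision can be sketched as a pure function. This assumes the cached entry stores the count together with the time it was computed; `QuotaSummary`, `needsFullScan`, and `NEAR_LIMIT_FRACTION` are illustrative names, and the exact comparison is an assumption about how the 10% rule is applied.

```typescript
// Decide whether a request can be checked against the cached rolling count
// or needs the full per-minute scan (a sketch of the 10% rule).
type QuotaSummary = { count: number; cachedAtMs: number };

const SUMMARY_TTL_MS = 60_000;    // mirrors the 60-second TTL on quota-summary:{apiKey}
const NEAR_LIMIT_FRACTION = 0.1;  // "within 10% of the daily quota"

function needsFullScan(summary: QuotaSummary | null, dailyQuota: number, nowMs: number): boolean {
  if (summary === null) return true;                              // nothing cached yet
  if (nowMs - summary.cachedAtMs >= SUMMARY_TTL_MS) return true;  // stale; KV would have expired it
  // Near the limit, the cached value is too coarse to trust.
  return summary.count >= dailyQuota * (1 - NEAR_LIMIT_FRACTION);
}
```

For the compliance tier (100,000 requests), requests are checked against the cache until the cached count reaches 90,000. The margin presumably exists to absorb the up to 60 seconds of traffic a cached count can miss.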

Workers middleware integration

Both checks run in a single middleware function before the request reaches the D1 query layer. The middleware attaches quota headers to the response for client visibility:

// workers/src/middleware/rate-limit.ts

import { checkBurstLimit } from '../rate-limiter';
import { checkDailyQuota } from '../daily-quota';
// extractApiKey, resolveTier, and the Env type are defined elsewhere in the Worker (not shown).

const QUOTA_WINDOW_SECONDS = 24 * 60 * 60; // same 24-hour window as daily-quota.ts

export async function rateLimitMiddleware(
  request: Request,
  env: Env,
  ctx: ExecutionContext,
  next: () => Promise<Response>,
): Promise<Response> {
  const apiKey = extractApiKey(request);
  if (!apiKey) return new Response('Unauthorized', { status: 401 });

  const tier = await resolveTier(env.KV, apiKey);
  if (!tier) return new Response('Invalid API key', { status: 403 });

  // Internal tier bypasses all limits
  if (tier.name === 'internal') return next();

  // Burst check first (one KV read plus one write)
  const burst = await checkBurstLimit(env.KV, apiKey, tier);
  if (!burst.allowed) {
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After':           '1',
        'X-RateLimit-Remaining': '0',
        'X-RateLimit-Reset':     String(Math.ceil(Date.now() / 1000) + 1),
      },
    });
  }

  // Daily quota check (may page through KV list results); awaited before next(), not run in parallel
  const quota = await checkDailyQuota(env.KV, apiKey, tier.dailyQuota);
  if (!quota.allowed) {
    // Retry-After and X-Quota-Reset are conservative upper bounds: in a sliding
    // window, capacity frees up as the oldest minute buckets age out.
    return new Response('Daily Quota Exceeded', {
      status: 429,
      headers: {
        'Retry-After':   String(QUOTA_WINDOW_SECONDS),
        'X-Quota-Used':  String(quota.usedToday),
        'X-Quota-Limit': String(tier.dailyQuota),
        'X-Quota-Reset': String(Math.ceil(Date.now() / 1000) + QUOTA_WINDOW_SECONDS),
      },
    });
  }

  // next() may return a fetch() response, whose headers are immutable;
  // copy it into a new Response before appending the quota headers.
  const upstream = await next();
  const response = new Response(upstream.body, upstream);
  response.headers.set('X-RateLimit-Remaining', String(burst.remainingTokens));
  response.headers.set('X-Quota-Used',          String(quota.usedToday));
  response.headers.set('X-Quota-Limit',         String(tier.dailyQuota));

  return response;
}

The middleware adds approximately 3–8 ms of p99 latency for burst checking (one KV read plus one write) and 12–20 ms for daily quota checking when a full key-list scan is needed. Nearly all of that is time spent waiting on KV I/O, which does not count toward the Cloudflare Workers CPU-time limit, so the overhead does not threaten that limit at current traffic levels.

Related writing

Regulatory data query layer describes the D1 fan-out and result-merging logic that the rate-limiting middleware sits in front of.

Voidly probe commissioning covers a different kind of access control: the pseudonymous credential issuance that limits how many probes a single operator can commission, analogous to per-client quota enforcement.