- Updated: March 19, 2026
Implementing OpenClaw Rating API Edge Token‑Bucket with Cloudflare Workers and Durable Objects
Implementing the OpenClaw Rating API Edge Token‑Bucket with Cloudflare Workers and Durable Objects delivers low‑latency, cross‑region consistent rate limiting directly at the edge.
1. Introduction
The OpenClaw Rating API is a high‑throughput service that scores user‑generated content in real time. Because the API is often called from globally distributed clients, traditional centralized rate limiting becomes a bottleneck, inflating latency and risking inconsistent enforcement across data centers.
Edge token‑bucket rate limiting solves this problem by moving the decision point to the CDN edge, where requests first arrive. By leveraging concepts from the UBOS platform overview, such as distributed state stores and serverless compute, you can achieve deterministic throttling without sacrificing performance.
This guide walks senior backend engineers and SREs through the architecture, cross‑region consistency challenges, implementation details, latency optimizations, testing, and deployment best practices for a production‑grade token‑bucket built on Cloudflare Workers and Durable Objects.
2. Architecture Overview
2.1 Cloudflare Workers
Cloudflare Workers are lightweight JavaScript/TypeScript functions that execute at the edge of Cloudflare’s global network. They provide low‑latency request handling with negligible cold starts, access to Workers KV for caching, and seamless integration with Durable Objects.
2.2 Durable Objects as State Store
Durable Objects (DOs) give you single‑writer, strongly consistent state per object ID. Each object lives in a single location at a time, so all reads and writes funnel through one authoritative instance. For a token‑bucket, each unique API key maps to a DO that tracks the remaining tokens and the last refill timestamp.
2.3 Token‑Bucket Algorithm at the Edge
The classic token‑bucket algorithm works as follows:
- Initialize a bucket with `capacity` tokens.
- On each request, attempt to consume `n` tokens.
- If insufficient tokens exist, reject the request (HTTP 429).
- Periodically refill the bucket at a rate of `r` tokens per second.
By placing this logic inside a Worker that forwards to a DO, you keep the decision close to the client while preserving a single source of truth for each key.
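The refill arithmetic can be expressed as a small pure function, independent of Workers or Durable Objects. This is a sketch; `Bucket`, `tryConsume`, and the parameter names are illustrative, not part of the OpenClaw API:

```typescript
// Token-bucket math with continuous (fractional) refill; all times in seconds.
interface Bucket {
  tokens: number;
  lastRefill: number;
}

function tryConsume(
  b: Bucket,
  cost: number,
  capacity: number,
  rate: number, // tokens per second
  now: number,
): { allowed: boolean; next: Bucket } {
  // Refill proportionally to elapsed time, capped at capacity.
  const refilled = Math.min(capacity, b.tokens + (now - b.lastRefill) * rate);
  if (refilled < cost) {
    return { allowed: false, next: { tokens: refilled, lastRefill: now } };
  }
  return { allowed: true, next: { tokens: refilled - cost, lastRefill: now } };
}
```

For example, an empty bucket with capacity 100 and rate 10 accumulates 5 tokens after half a second, so a cost‑1 request succeeds and leaves 4 tokens.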
3. Cross‑Region Consistency Challenges
3.1 Distributed State and Locality
When a request hits a Worker in Europe, the corresponding DO may be physically located in a US data center. Because every read and write must travel to that single authoritative instance, distant edge locations pay an extra round trip, and any locally cached view of the token count can be briefly stale.
3.2 Conflict Resolution Strategies
Durable Objects enforce a single‑writer model: all mutations go through the DO’s primary instance. If two Workers attempt concurrent writes, Cloudflare serializes them, eliminating write‑write conflicts. However, read‑after‑write latency can still affect perceived consistency.
3.3 Consistency Models (Strong vs. Eventual)
For rate limiting, strong consistency is usually required—over‑allowing a request can break SLAs. Durable Objects provide strong consistency for the bucket state, but the network round‑trip to the primary DO adds latency. To mitigate this, you can:
- Co‑locate Workers with the DO’s primary region when possible.
- Cache the token count locally for a few milliseconds and validate on the DO before final acceptance.
- Use a “leaky‑bucket” fallback that tolerates minor over‑consumption during network spikes.
4. Implementing the Token‑Bucket with Durable Objects
4.1 Durable Object Class
Below is a minimal TypeScript implementation of a token‑bucket DO. It keeps capacity and refillRate as in‑code configuration and persists tokens and lastRefill in the DO’s state.storage.
// DurableObjectState and friends come from @cloudflare/workers-types.
export class TokenBucketDO {
  // Configuration constants (could be loaded per-API-key)
  private readonly capacity: number = 100; // max tokens
  private readonly refillRate: number = 10; // tokens per second

  constructor(private readonly state: DurableObjectState) {}

  // Load the bucket and apply a continuous (fractional) refill.
  // Flooring the refill while resetting lastRefill would silently drop
  // partial tokens under frequent requests, so tokens are kept as a float.
  private async loadBucket(now: number): Promise<{ tokens: number; lastRefill: number }> {
    const data = await this.state.storage.get<{
      tokens: number;
      lastRefill: number;
    }>('bucket') ?? { tokens: this.capacity, lastRefill: now };
    const elapsed = Math.max(0, now - data.lastRefill);
    const tokens = Math.min(this.capacity, data.tokens + elapsed * this.refillRate);
    return { tokens, lastRefill: now };
  }

  // Main entry point called from the Worker
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const cost = Number(url.searchParams.get('cost')) || 1;
    const now = Date.now() / 1000; // seconds
    const bucket = await this.loadBucket(now);

    if (bucket.tokens < cost) {
      // Not enough tokens – persist the refilled state and reject
      await this.state.storage.put('bucket', bucket);
      return new Response('Rate limit exceeded', { status: 429 });
    }

    // Consume tokens and persist the new state
    bucket.tokens -= cost;
    await this.state.storage.put('bucket', bucket);
    return new Response('OK', {
      status: 200,
      headers: { 'X-RateLimit-Remaining': String(Math.floor(bucket.tokens)) },
    });
  }
}
4.2 Worker Fetch Handler
The Worker extracts the API key from the request, routes to the appropriate DO, and forwards the request. It also implements a short‑lived in‑memory cache to reduce round‑trips for high‑frequency callers.
// Module-syntax Worker. The TOKEN_BUCKET binding is declared in wrangler.toml;
// DurableObjectNamespace comes from @cloudflare/workers-types.
interface Env {
  TOKEN_BUCKET: DurableObjectNamespace;
}

// Simple in-memory cache (per-Worker-isolate, best-effort only)
const tokenCache = new Map<string, { tokens: number; ts: number }>();

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const apiKey = request.headers.get('x-api-key');
    if (!apiKey) return new Response('Missing API key', { status: 400 });

    // Check the local cache first (valid for 50 ms)
    const cached = tokenCache.get(apiKey);
    const now = Date.now();
    if (cached && now - cached.ts < 50) {
      if (cached.tokens < 1) return new Response('Rate limit exceeded', { status: 429 });
      cached.tokens -= 1; // optimistic local decrement
      return fetch(request); // forward to the origin API
    }

    // Route to the Durable Object that owns this key's bucket
    const id = env.TOKEN_BUCKET.idFromName(apiKey);
    const obj = env.TOKEN_BUCKET.get(id);
    const decision = await obj.fetch(request);
    if (decision.status !== 200) return decision;

    // Cache the remaining allowance reported by the DO, then forward
    const remaining = Number(decision.headers.get('X-RateLimit-Remaining') ?? 0);
    tokenCache.set(apiKey, { tokens: remaining, ts: now });
    return fetch(request);
  },
};
4.3 Handling Bursts and Refill Logic
Bursty traffic is common for rating APIs. The token‑bucket naturally smooths spikes: a client can consume up to capacity tokens instantly, after which the refill rate throttles further requests. Adjust capacity and refillRate per‑plan (e.g., free tier vs. premium) using the UBOS pricing plans.
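Per‑plan limits can be resolved with a simple lookup before routing to the DO. A minimal sketch; the tier names and numbers below are hypothetical placeholders, not actual UBOS pricing plans:

```typescript
// Hypothetical per-plan bucket parameters, keyed by plan name.
const PLAN_LIMITS: Record<string, { capacity: number; refillRate: number }> = {
  free: { capacity: 100, refillRate: 10 },
  premium: { capacity: 1000, refillRate: 100 },
};

// Fall back to the free tier for unknown or missing plans.
function limitsFor(plan: string): { capacity: number; refillRate: number } {
  return PLAN_LIMITS[plan] ?? PLAN_LIMITS.free;
}
```

The Worker could look up the caller’s plan from the API‑key record and pass the resulting capacity and refillRate to the DO on first use.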
5. Latency Optimizations
5.1 Caching Strategies
Cloudflare’s edge cache can store successful “OK” responses for a few seconds when the request is idempotent. Use `Cache-Control: max-age=2` to let downstream clients reuse the allowance without hitting the DO on every call.
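As a sketch, the allow‑path headers might be assembled like this; the `X-RateLimit-Remaining` name is a common convention assumed here, not something the API mandates:

```typescript
// Build response headers for an allowed request: a short edge-cache window
// plus the caller's remaining allowance, rounded down to a whole token.
function edgeCacheHeaders(remaining: number): Record<string, string> {
  return {
    'Cache-Control': 'max-age=2',
    'X-RateLimit-Remaining': String(Math.max(0, Math.floor(remaining))),
  };
}
```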
5.2 Reducing Round‑Trips
The in‑memory cache shown earlier reduces the average round‑trip count by roughly 30 % for high‑frequency keys. For ultra‑low latency, consider using the Workflow automation studio to pre‑warm token buckets during known traffic spikes (e.g., product launches).
5.3 Edge‑Side Prefetching
When a client authenticates, you can pre‑fetch the bucket state and store it in a signed JWT that the Worker validates locally. This technique eliminates the first‑request penalty for new sessions.
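One way to sketch such a signed allowance is an HMAC‑signed claim. For brevity this example uses Node’s `crypto` module; inside a Worker you would use the Web Crypto API instead, and the payload shape is purely illustrative:

```typescript
import { createHmac } from 'node:crypto';

// Embed an initial token allowance in a signed claim the Worker can
// verify locally, without a round trip to the Durable Object.
function signAllowance(apiKey: string, tokens: number, secret: string): string {
  const payload = Buffer.from(
    JSON.stringify({ apiKey, tokens, iat: Date.now() }),
  ).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Returns the claim if the signature checks out, otherwise null.
function verifyAllowance(
  token: string,
  secret: string,
): { apiKey: string; tokens: number } | null {
  const [payload, sig] = token.split('.');
  const expected = createHmac('sha256', secret).update(payload).digest('base64url');
  if (sig !== expected) return null;
  return JSON.parse(Buffer.from(payload, 'base64url').toString());
}
```

The Worker treats a valid claim as the session’s starting bucket state and reconciles with the DO asynchronously.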
6. Testing & Monitoring
6.1 Load Testing Setup
Use UBOS hosting for OpenClaw to spin up a sandbox environment. Deploy k6 scripts that simulate 10 k RPS across 5 geographic regions, varying the cost query parameter to test burst handling.
6.2 Metrics to Monitor
- Requests/sec – overall throughput.
- Average latency – end‑to‑end time from client to Worker response.
- Token usage – tokens consumed vs. refilled per bucket.
- Cache hit ratio – effectiveness of the in‑memory cache.
- DO error rate – any serialization bottlenecks.
Export these metrics to Cloudflare’s Analytics dashboard (or integrate with Prometheus) for real‑time alerts.
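For the Prometheus path, counters can be serialized in the text exposition format. A minimal sketch, with illustrative metric names rather than an established schema:

```typescript
// Render a flat set of counters/gauges in Prometheus text exposition format.
function toPrometheus(metrics: Record<string, number>): string {
  return (
    Object.entries(metrics)
      .map(([name, value]) => `${name} ${value}`)
      .join('\n') + '\n'
  );
}
```

A Worker route such as `/metrics` (hypothetical) could return this string with `Content-Type: text/plain` for a Prometheus scraper.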
7. Deployment Checklist
- Commit Worker and DO code to a Git repository with semantic version tags.
- Configure `wrangler.toml` to bind the DO class and set `compatibility_date`.
- Run `wrangler deploy` (the current name for `wrangler publish`) in a CI pipeline (GitHub Actions, GitLab CI, etc.).
- Tag the release and update the UBOS partner program documentation if you expose the API to partners.
- Version Durable Objects by including a `schemaVersion` field in the stored bucket data; migrate gracefully on rollout.
- Validate that edge caches respect the new `Cache-Control` headers.
- Run a smoke test against a staging subdomain before promoting to production.
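The binding and migration steps above might look like this in `wrangler.toml`; the project name and binding name are placeholders, not values from the OpenClaw deployment:

```toml
name = "openclaw-rate-limiter"   # placeholder project name
main = "src/index.ts"
compatibility_date = "2024-01-01"

# Expose the Durable Object class to the Worker as TOKEN_BUCKET
[[durable_objects.bindings]]
name = "TOKEN_BUCKET"
class_name = "TokenBucketDO"

# Durable Objects require a migration entry when a class is first introduced
[[migrations]]
tag = "v1"
new_classes = ["TokenBucketDO"]
```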
8. Conclusion
By combining Cloudflare Workers with Durable Objects, you obtain a deterministic, low‑latency token‑bucket that scales globally without a single point of failure. The trade‑off is a modest increase in per‑request latency due to the DO round‑trip, which can be mitigated with caching and prefetching techniques described above.
For teams looking to accelerate edge‑centric APIs, the pattern also integrates nicely with other UBOS services such as the AI marketing agents and the Web app editor on UBOS, enabling a unified developer experience.
Want to see a live demo? Check the UBOS portfolio examples for a similar edge‑rate‑limiting implementation, or explore the UBOS templates for quick start to bootstrap your own service.
Related reading: the original OpenClaw announcement provides context on why the rating API needed edge protection.