How We Achieve Sub-100ms Latency Across 15+ Blockchains
Delivering fast, reliable blockchain data at scale requires more than just proxying requests. Here's how we engineered Tokra's infrastructure to consistently hit sub-100ms response times.
Jan 12, 2026
5 min read
When we started building Tokra, our North Star metric was clear: every API request should return in under 100ms, regardless of which blockchain you're querying or where you're querying from.
Here's how we made it happen.
Multi-Region Node Distribution
We run our own validator and archive nodes across 11 global regions: US East, US West, Europe (3 zones), Asia Pacific (4 zones), and South America (2 zones). When you make a request, our edge router automatically directs it to the closest healthy node cluster.
This geographic distribution alone cuts latency by 40-60ms for most users compared to single-region setups.
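To make that concrete, here's a minimal sketch of nearest-healthy-region selection in Python. The region names, coordinates, and health flags are invented for illustration, and a production edge router would typically rank regions by measured network latency or anycast routing rather than raw geographic distance.

```python
import math

# Illustrative region catalog; names, coordinates, and health flags are
# hypothetical, not Tokra's actual deployment map.
REGIONS = [
    {"name": "us-east", "lat": 39.0, "lon": -77.5, "healthy": True},
    {"name": "us-west", "lat": 45.6, "lon": -122.7, "healthy": True},
    {"name": "eu-west", "lat": 53.3, "lon": -6.3, "healthy": False},
    {"name": "ap-southeast", "lat": 1.35, "lon": 103.8, "healthy": True},
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_healthy_region(client_lat, client_lon):
    """Pick the closest region whose node cluster passes health checks."""
    candidates = [r for r in REGIONS if r["healthy"]] or REGIONS  # fall back if all unhealthy
    return min(candidates, key=lambda r: haversine_km(client_lat, client_lon, r["lat"], r["lon"]))
```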
Intelligent Caching Layer
About 60% of blockchain queries are repetitive: checking the same contract state, fetching block headers, or reading token balances. We built a distributed caching system that stores frequently accessed data with chain-specific TTLs:
Block data: 500ms TTL
Transaction receipts: 2-second TTL
Contract state (immutable data only): 1-hour TTL
Our cache hit rate averages 73%, which means most requests never touch an RPC node.
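Here's a simplified sketch of the per-category TTL logic in Python. The `TTLCache` class and `TTL_SECONDS` table are illustrative stand-ins; a production system would use a distributed store such as Redis rather than an in-process dict.

```python
import time

# TTLs mirror the chain-specific values listed above.
TTL_SECONDS = {
    "block": 0.5,             # block data: 500ms
    "receipt": 2.0,           # transaction receipts: 2 seconds
    "immutable_state": 3600,  # immutable contract state: 1 hour
}

class TTLCache:
    """Minimal in-process TTL cache; a stand-in for a distributed cache."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale: evict so the caller falls through to an RPC node
            return None
        return value

    def put(self, category, key, value):
        self._store[key] = (time.monotonic() + TTL_SECONDS[category], value)
```

In practice the cache key has to identify the query precisely, e.g. a hash of chain ID, RPC method, and parameters.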
Request Routing Algorithm
Not all nodes are created equal. Network congestion, sync status, and hardware variations create performance inconsistencies. Our routing layer runs health checks every 3 seconds, measuring:
Response latency (p50, p95, p99)
Error rates
Sync status
Requests are dynamically routed to the best-performing node at that moment. If a node degrades, traffic shifts within seconds.
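Here's a rough Python sketch of turning those health snapshots into a routing decision. The `NodeHealth` fields, the score weights, and the two-blocks-behind cutoff are all assumptions made for illustration; the actual scoring function isn't specified here.

```python
import statistics

class NodeHealth:
    """Hypothetical per-node snapshot collected by the 3-second health checks."""

    def __init__(self, name, latencies_ms, errors, total_requests, blocks_behind):
        self.name = name
        self.latencies_ms = latencies_ms    # recent response times (needs >= 2 samples)
        self.error_rate = errors / max(total_requests, 1)
        self.blocks_behind = blocks_behind  # sync lag vs. the chain head

    def p95(self):
        # statistics.quantiles with n=20 yields 19 cut points; index 18 is the 95th percentile
        return statistics.quantiles(self.latencies_ms, n=20)[18]

def score(node):
    """Lower is better: latency in ms, with heavy penalties for errors and sync lag."""
    return node.p95() + 1000 * node.error_rate + 50 * node.blocks_behind

def pick_node(nodes):
    # Prefer nodes that are essentially in sync; fall back to all nodes if none are.
    in_sync = [n for n in nodes if n.blocks_behind <= 2]
    return min(in_sync or nodes, key=score)
```

Because the snapshots refresh every few seconds, a degrading node's score rises quickly and traffic drains away from it, which matches the behavior described above.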
Load Balancing Strategy
We use a weighted round-robin algorithm that accounts for both geographic proximity and real-time performance. Nodes under heavy load receive fewer requests until they recover.
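For reference, here is a minimal Python implementation of smooth weighted round-robin, the variant popularized by nginx; whether this is the exact variant we'd match in production is an assumption. The node names and weights are invented, and in practice the weights would be refreshed continuously from the health checks above.

```python
class SmoothWRR:
    """Smooth weighted round-robin: spreads picks evenly in proportion to weight."""

    def __init__(self, weights):
        self.weights = dict(weights)            # node -> weight (lower when under load)
        self.current = {n: 0 for n in weights}  # running counters

    def next(self):
        total = sum(self.weights.values())
        for node, w in self.weights.items():
            self.current[node] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

    def set_weight(self, node, weight):
        """Lower a node's weight while it is under heavy load, raise it on recovery."""
        self.weights[node] = max(weight, 1)

balancer = SmoothWRR({"node-a": 5, "node-b": 3, "node-c": 1})
print([balancer.next() for _ in range(9)])
# -> ['node-a', 'node-b', 'node-a', 'node-c', 'node-a',
#     'node-b', 'node-a', 'node-b', 'node-a']
```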
The Results
Median latency: 47ms
P95 latency: 89ms
P99 latency: 142ms
Cache hit rate: 73%
Our monitoring shows that 94% of requests complete in under 100ms. The remaining 6% are typically complex archive queries that unavoidably take longer.
Building fast infrastructure is an ongoing process. We're currently testing a new prediction algorithm that prefetches likely-needed data before requests arrive. Early tests show another 15ms reduction in latency.
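The prefetcher itself isn't described in detail, but the general idea can be sketched as a simple next-request predictor: record which query tends to follow which, then warm the cache with the likely successor. Everything below is a hypothetical illustration, not the production algorithm.

```python
from collections import Counter, defaultdict

class Prefetcher:
    """Toy first-order predictor: prefetch the most common follower of each query."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # prev_key -> Counter of next_keys
        self.prev = None

    def observe(self, key):
        """Record each incoming request key in arrival order."""
        if self.prev is not None:
            self.transitions[self.prev][key] += 1
        self.prev = key

    def predict(self, key):
        """Return the most likely next request after `key`, or None if unseen."""
        followers = self.transitions.get(key)
        return followers.most_common(1)[0][0] if followers else None
```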
Article written by
Sarah Chen