Why is Redis so fast?

System-design interviews — examiners expect you to explain why you'd reach for Redis over Memcached or a relational cache. Saying "it's in-memory" is table stakes; explaining the single-threaded event loop and O(1) data structures scores points.

On-call — Redis latency spikes most often trace to one of three causes: a blocking command (KEYS *, SMEMBERS on a huge set, or a Lua script) hogging the single thread; memory pressure forcing the eviction policy to run on every write; or a single large value serialised over a slow NIC. Knowing the architecture tells you where to look first.

Real systems — Twitter used Redis sorted sets for timelines (ranked by tweet ID). GitHub uses it for rate-limiting counters. Sidekiq stores its job queue in Redis lists. Stack Overflow's tag engine is backed by sorted sets. In every case the choice was driven by a specific data structure, not just "caching".

1. RAM eliminates the seek tax

A spinning-disk random read takes ~5–10 ms. An NVMe SSD gets that down to ~100 µs. A DRAM access is ~100 ns: roughly three orders of magnitude faster than NVMe and about five faster than spinning disk. Redis keeps its entire dataset in RAM by design. There is no buffer pool, no page cache negotiation, no read-ahead. When you issue GET foo, the value is already in memory; the remaining latency is mostly the network round trip plus the syscalls that read the request from the socket and write the reply back.
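The gap between these media is easier to feel as a number. The snippet below is only arithmetic on the rounded latencies quoted above (taking the 10 ms end of the spinning-disk range); the dictionary and function names are made up for illustration.

```python
import math

# Rounded latencies quoted in the text, in seconds. Real hardware varies.
LATENCY_S = {
    "spinning disk": 10e-3,  # upper end of the ~5-10 ms range
    "NVMe SSD": 100e-6,
    "DRAM": 100e-9,
}


def orders_slower_than_dram(medium: str) -> float:
    """Return log10 of how many times slower `medium` is than DRAM."""
    return math.log10(LATENCY_S[medium] / LATENCY_S["DRAM"])


for medium in ("spinning disk", "NVMe SSD"):
    gap = orders_slower_than_dram(medium)
    print(f"{medium}: ~{gap:.1f} orders of magnitude slower than DRAM")
```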

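RDB snapshots and AOF persistence exist, but neither sits on the read path. RDB snapshots are written by a forked child process. The AOF is off by default; when enabled with the default appendfsync everysec, it is fsynced about once per second in the background. Reads never touch disk, and writes normally do not wait for it. The exceptions are appendfsync always and a disk so slow that the background fsync falls behind.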

2. Single-threaded event loop eliminates lock contention

Redis processes commands on one thread using an I/O-multiplexed event loop (built on epoll on Linux, kqueue on BSD/macOS). Every command runs to completion before the next one starts — there are no mutexes, no read-write locks, no condition variables around the data structures.

This sounds like a bottleneck, but contention is usually more expensive than serialisation. A multi-threaded cache with fine-grained locks pays cache-coherence traffic between CPU cores, lock-acquire overhead, and the occasional convoy effect. Redis avoids all of that. The throughput ceiling for a single thread on modern hardware is roughly 100 000–1 000 000 ops/sec depending on operation size — higher than most applications need from a single instance.

Redis 6 added optional I/O threading (off by default): extra threads read requests from and write replies to sockets, but command execution itself stays on a single thread. This lets socket I/O scale across cores on high-bandwidth hardware without reintroducing data-structure locking.

3. Purpose-built data structures avoid wasted work

Redis doesn't store generic blobs with a key. It ships with strings, lists, hashes, sets, sorted sets, bitmaps, HyperLogLogs, streams, and more — each tuned for a specific access pattern.

Sorted sets are backed by a skip list (for range queries) and a hash table (for O(1) point lookups). That dual structure means ZRANGEBYSCORE and ZSCORE are both fast without compromise.
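A rough sketch of that dual-index idea follows. It is not how Redis implements it: a sorted Python list with bisect stands in for the skip list (so inserts here are O(n), not O(log n)), and the class name ToySortedSet and its method names are invented for the example.

```python
import bisect


class ToySortedSet:
    """Two indexes over the same members: a dict for point lookups and a
    sorted list of (score, member) pairs for range scans.
    The sorted list is a stand-in for a skip list."""

    def __init__(self):
        self._score = {}     # member -> score, O(1) lookup
        self._ordered = []   # (score, member) pairs kept in sorted order

    def zadd(self, member, score):
        old = self._score.get(member)
        if old is not None:
            # Drop the stale pair so both indexes stay consistent.
            i = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[i]
        self._score[member] = score
        bisect.insort(self._ordered, (score, member))

    def zscore(self, member):
        """Point lookup served by the dict."""
        return self._score.get(member)

    def zrangebyscore(self, lo, hi):
        """Range scan served by the ordered index (inclusive bounds)."""
        start = bisect.bisect_left(self._ordered, (lo,))
        result = []
        for score, member in self._ordered[start:]:
            if score > hi:
                break
            result.append(member)
        return result


zset = ToySortedSet()
zset.zadd("alice", 10)
zset.zadd("bob", 20)
zset.zadd("carol", 30)
zset.zadd("alice", 25)  # update: alice moves between bob and carol
print(zset.zscore("alice"), zset.zrangebyscore(20, 30))
```

The cost of the design is that every write updates two structures; the payoff is that neither query type has to scan.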

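Lists are quicklists: a doubly-linked list whose nodes are compact listpacks (ziplists before Redis 7). Head and tail operations stay O(1), which suits queues and stacks, while per-element pointer overhead is far lower than in a plain linked list.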

Small hashes and small sorted sets use a compact listpack encoding (formerly ziplist) when they're below configurable size thresholds. Small sets get similar treatment: intset for all-integer sets, and listpack since Redis 7.2. A listpack is a contiguous byte array: no pointer chasing, excellent CPU cache locality. Once the collection grows past the threshold, Redis converts it to a full hash table or, for sorted sets, a skip list plus hash table. You get compact storage for small collections automatically.
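The promote-on-growth behaviour can be sketched in a few lines. This is a toy, not the listpack format: a flat Python list of alternating fields and values stands in for the contiguous byte array (a Python list still holds pointers), and ToySmallHash and max_compact_entries are hypothetical names, not Redis settings.

```python
class ToySmallHash:
    """Keeps fields in a flat [field, value, field, value, ...] list while
    small, then converts once to a dict after crossing a threshold."""

    def __init__(self, max_compact_entries=4):
        self.max_compact_entries = max_compact_entries  # hypothetical threshold
        self._flat = []     # compact form: linear scan, no hashing
        self._table = None  # dict, set only after promotion

    @property
    def encoding(self):
        return "compact" if self._table is None else "hashtable"

    def hset(self, field, value):
        if self._table is not None:
            self._table[field] = value
            return
        # Linear scan is cheap while the collection is tiny.
        for i in range(0, len(self._flat), 2):
            if self._flat[i] == field:
                self._flat[i + 1] = value
                return
        self._flat += [field, value]
        if len(self._flat) // 2 > self.max_compact_entries:
            # One-way promotion to the general-purpose structure.
            self._table = dict(zip(self._flat[0::2], self._flat[1::2]))
            self._flat = []

    def hget(self, field):
        if self._table is not None:
            return self._table.get(field)
        for i in range(0, len(self._flat), 2):
            if self._flat[i] == field:
                return self._flat[i + 1]
        return None


h = ToySmallHash(max_compact_entries=2)
for name in ("a", "b", "c"):
    h.hset(name, name.upper())
    print(name, h.encoding)
```

On a real server, OBJECT ENCODING on a key reports which representation is currently in use.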

Strings are Simple Dynamic Strings (SDS) — a length-prefixed byte array rather than null-terminated C strings. STRLEN is O(1) because the length is stored, not computed. Appending is amortised O(1) because SDS tracks available capacity.
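To make the two properties concrete (stored length, spare capacity), here is a minimal sketch in the spirit of SDS. It is not the real SDS header layout, the doubling policy is only illustrative, and ToySDS is an invented name.

```python
class ToySDS:
    """Byte buffer that tracks used length and allocated capacity separately."""

    def __init__(self, initial=b""):
        self._buf = bytearray(initial)  # allocated storage
        self._len = len(initial)        # bytes actually in use

    def __len__(self):
        # O(1): the length is stored, never recomputed by scanning for a NUL.
        return self._len

    @property
    def capacity(self):
        return len(self._buf)

    def append(self, data):
        needed = self._len + len(data)
        if needed > len(self._buf):
            # Over-allocate so that most later appends need no reallocation.
            new_capacity = max(needed * 2, 16)
            self._buf.extend(bytes(new_capacity - len(self._buf)))
        self._buf[self._len:needed] = data
        self._len = needed

    def value(self):
        return bytes(self._buf[:self._len])


s = ToySDS(b"foo")
s.append(b"\x00bar")  # embedded NUL is fine: the length is explicit
print(len(s), s.capacity, s.value())
```

Because the length is explicit, the buffer is binary-safe: a zero byte in the middle of a value does not truncate it.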

4. I/O multiplexing handles thousands of connections cheaply

epoll (Linux) lets the kernel notify Redis of which file descriptors are readable/writable in a single syscall, regardless of how many connections are open. The cost of idle connections is near zero. Contrast this with the old select() model, which scanned every fd on every call, or a thread-per-connection model, where 10 000 connections means 10 000 stacks sitting in memory.
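The shape of such a loop can be sketched with Python's selectors module, whose DefaultSelector picks epoll on Linux and kqueue on BSD/macOS. This is a toy under stated assumptions, not Redis's event library: the text commands, handle, and run_once are invented, socket pairs stand in for real clients, and each recv is assumed to contain whole lines.

```python
import selectors
import socket

store = {}  # touched by one thread only, so no locks are needed


def handle(line):
    """Run one toy command to completion and return its reply."""
    parts = line.decode().split()
    if len(parts) == 3 and parts[0].upper() == "SET":
        store[parts[1]] = parts[2]
        return b"OK\n"
    if len(parts) == 2 and parts[0].upper() == "GET":
        return (store.get(parts[1], "(nil)") + "\n").encode()
    return b"ERR\n"


def run_once(sel, timeout=1.0):
    """One loop iteration: ask the kernel which sockets are ready, serve them."""
    served = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if not data:  # peer closed the connection
            sel.unregister(conn)
            conn.close()
            continue
        for line in data.splitlines():
            conn.sendall(handle(line))
            served += 1
    return served


sel = selectors.DefaultSelector()
# Three in-process "clients", each paired with a server-side socket.
pairs = [socket.socketpair() for _ in range(3)]
for _client, server_side in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

pairs[0][0].sendall(b"SET foo bar\n")
served_first = run_once(sel)   # only the one active connection is reported
pairs[2][0].sendall(b"GET foo\n")
served_second = run_once(sel)
first_reply = pairs[0][0].recv(4096)
second_reply = pairs[2][0].recv(4096)
print(served_first, served_second, first_reply, second_reply)

sel.close()
for client, server_side in pairs:
    client.close()
    server_side.close()
```

The idle second connection costs nothing per iteration: the selector only returns sockets that actually have data.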

Pipelining compounds this: a client can send many commands back to back without waiting for each reply, then read all the responses together. A batch of N commands costs roughly one round trip instead of N, and the syscall overhead is amortised across the batch.
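A back-of-the-envelope model shows why this matters. It assumes each batch costs one round trip plus a fixed per-command server time, and ignores bandwidth and buffer limits; the function name and the example numbers (0.5 ms RTT, 0.01 ms per command) are illustrative, not measurements.

```python
import math


def total_time_ms(n_commands, rtt_ms, server_ms_per_cmd, batch_size=1):
    """Estimated wall-clock time to run n_commands, sent batch_size at a time."""
    round_trips = math.ceil(n_commands / batch_size)
    return round_trips * rtt_ms + n_commands * server_ms_per_cmd


one_by_one = total_time_ms(1000, rtt_ms=0.5, server_ms_per_cmd=0.01)
pipelined = total_time_ms(1000, rtt_ms=0.5, server_ms_per_cmd=0.01,
                          batch_size=1000)
print(f"one by one: {one_by_one:.1f} ms, pipelined: {pipelined:.1f} ms")
```

In this model the network, not the server, dominates the unpipelined case, which is why batching helps so much even on a fast LAN.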

5. Protocol simplicity reduces parsing overhead

RESP (Redis Serialisation Protocol) is line-oriented and length-prefixed. There's no XML/JSON parsing, no schema negotiation, no compression negotiation on the hot path. A server reading *2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n learns the argument count and each argument's byte length up front, so it can slice out the command with a handful of pointer increments and no scanning for delimiters inside values. This is boring engineering, which is exactly why it works.
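A minimal sketch of encoding and parsing that request format is below. It handles only a single, complete array of bulk strings (the shape clients use to send commands), with no partial-read handling or other RESP types; the function names are invented for the example.

```python
from typing import List


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(out)


def parse_command(buf: bytes) -> List[bytes]:
    """Parse one complete RESP array of bulk strings from buf."""
    pos = 0

    def read_line() -> bytes:
        nonlocal pos
        end = buf.index(b"\r\n", pos)
        line = buf[pos:end]
        pos = end + 2
        return line

    header = read_line()
    if header[:1] != b"*":
        raise ValueError("expected array header")
    count = int(header[1:])
    args = []
    for _ in range(count):
        size_line = read_line()
        if size_line[:1] != b"$":
            raise ValueError("expected bulk string header")
        size = int(size_line[1:])
        # The length prefix lets us slice the payload without scanning it.
        args.append(buf[pos:pos + size])
        pos += size + 2  # skip the payload and its trailing CRLF
    return args


wire = encode_command("GET", "foo")
print(wire, parse_command(wire))
```

Because payloads are sliced by length, values may contain any bytes, including CRLF.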

6. Memory allocator tuning

Redis is built with jemalloc by default on Linux. jemalloc uses multiple arenas and size-class bins to reduce fragmentation and lock contention compared with a general-purpose libc malloc. Fragmentation is reported by INFO memory as mem_fragmentation_ratio, which is the process RSS divided by the memory Redis believes it has allocated. A ratio significantly above 1.0 means Redis is holding more RSS than its own bookkeeping says it needs, usually a sign of allocator fragmentation, which can waste tens or hundreds of MB. A ratio below 1.0 suggests part of the dataset has been swapped out, which is far worse for latency.
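As an on-call aid, the ratio can be recomputed from the raw INFO memory fields. The sketch below parses the key:value text format that INFO returns; the sample numbers are made up, and the helper names are hypothetical.

```python
def parse_info(text):
    """Parse INFO-style output ('key:value' lines, '#' section headers)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(info):
    """RSS divided by the bytes Redis accounts for in its own bookkeeping."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


# Made-up sample in the shape of `INFO memory` output.
SAMPLE = """# Memory
used_memory:1000000
used_memory_rss:1500000
mem_fragmentation_ratio:1.50
"""

info = parse_info(SAMPLE)
print(f"fragmentation ratio: {fragmentation_ratio(info):.2f}")
```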

Putting it together

The speed is not one trick. It is the compounding of: no disk seek (RAM), no lock wait (single-threaded command execution), no pointer chasing for small collections (listpack), no idle-connection overhead (epoll), and no heavy serialisation (RESP). Remove any one of these and Redis is still fast. All together, they produce sub-millisecond p99 latency at hundreds of thousands of ops/sec on commodity hardware.