Everyone wants their storage system to be fast. Traditional storage systems store data on
disks, which are often the performance bottleneck. With the continuous increase in DRAM
size and the prevalence of NoSQL databases, in-memory key-value storage has grown increas-
ingly attractive. This work explores the design and implementation of high-performance
in-memory key-value stores. We assume the data fits in memory; however, we log operations
to disk to provide durability in case of power failure.
This dissertation is driven by two questions concerning in-memory key-value stores.
First, what is the highest performance we can achieve on a single machine? Modern servers
are equipped with multi-core processors, multi-channel DRAM, and new hardware and
instructions for manipulating the processor cache and DRAM. To achieve high performance on
this architecture, the designer must identify which hardware and software resources become
bottlenecks as the number of cores increases, and either eliminate them or ensure that they
are used efficiently.
Second, can we achieve high performance when the data is replicated among multiple
machines? Replication is important for providing high availability. Traditional replication
protocols write to disk synchronously to maintain protocol invariants across crashes and restarts;
these disk writes are often the limiting performance factor.
This dissertation presents two systems that explore these questions. Masstree is a high-
performance key-value store for a single multi-core server. Masstree stores all data in
memory. It uses several cache-craftiness techniques to achieve high performance, including
a balanced tree (to bound the number of tree levels across various workloads), a trie-like
concatenation of trees (to reduce cache-line accesses for workloads with long key prefixes),
and software prefetching (to hide DRAM latency). Masstree achieves millions of queries per
second on a 16-core server, more than 30x as fast as MongoDB or VoltDB.
Lazy VSR is a novel replication protocol that offers good performance by avoiding disk
writes on the critical path. Lazy VSR replies to clients before logging the request to
disk. Despite this, Lazy VSR provides strong recoverability: it can recover as long as f + 1
out of 2f + 1 replicas revive with their disks intact. The trade-off is that, if fewer than
f + 1 replicas revive with intact disks, the system may lose recently acknowledged operations. This
thesis shows that this is not a problem for applications such as file systems. Evaluation
shows that Lazy VSR achieves 50,000 1024-byte updates per second on three replicas. This
is 1.7x the performance of ZooKeeper and 3.6x the performance of MongoDB.
Thesis Supervisor: Prof. Robert Morris