/images/head.jpg

Key Value Workloads

YCSB Workload Description Example Operation breakdown A Update heavy workload session store recording recent actions read: 50%, update: 50% B read mostly workload photo tagging; add a tag is an update, but most operations are to read tags read: 95%, update: 5% C read only user profile cache, where profiles are constructed elsewhere )e.g., Hadoop) read: 100% D read latest workload user status updates; people want to read the latest read: 95%, insert 5% E short ranges threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id) scan: 95% (maxscanlength=100), insert: 5 F read-modify-write workload user database, where user records are read and modified by the user or to record user activity.

Key Value Systems

LSM-tree survey WiscKey: key/value separation HashKV: store cold keys separately. LSM-trie PebblesDB KVell Skip-Tree: put kv items directly to non-adjacent layers (skipping some intermediate layers) to reduce write amplification from level-to-level data compaction. TRIAD: separate hot keys from cold keys. hot keys are stored in memtable for as long as possible. Use multiple memtables and flush them at once to disk. Pipelined Compaction Procedure: A compaction contains three phases: read phase, merge-sort phase, and write phase.

Idea Repository

List of ideas to explore. Interaction between a filesystem and a computational SSD that does transparent compression. Problem: How to manage SSD space in this case? What API should be use? Can use SSDSim to explore the design space. Key prefix compression in k/v store Problem: in levelDB, it uses fixed-size key prefix compression. Can we do better than this? In Rockset/Foundationdb, secondary indexes are stored as <key, null> key/value pairs.

Use Prefix Operators (++i) vs. Postfix Operators (i++)

The ``C++ Primer’’ book has a very interesting recommendation to use postfix operators, only when necessary (on page 148 for the 5th version). The authors shared the following in their book. The prefix version avoids unnecessary work. It increments the value and returns the incremented version. The postfix operator must store the original value so that it can return the unincremented value as its result." For ints and pointers, the compiler can optimize away this extra work.

Crash Recovery

The following content is selected from the Database Management Systems book, chapter 18. ARIES Three main principles lie behind the ARIES recovery algorithm: Write-Ahead Logging: Any change to a database object is first recorded in the log; the record in the log must be written to stable storage before the change to the database object is written to disk. Repeating History During Redo: On restart following a crash, ARIES retraces all actions of the DBMS before the crash and brings the system back to the exact state that it was in at the time of the crash.

Secondary Index Built on top of a Key-value Store

Key-value stores such as RocksDB have been used as the storage engine in several databases (MySQL/Mongo/Rockset/FoundationDB). A database typically requires support of secondary indexes. To support secondary index, a common approach is to store the secondary index as yet another key-value pairs. Let’s take a look at the following example. Employee table userid (primary key) salary age Alice 40000 30 Bob 20000 23 Charlie 30000 30 David 20000 25 To store the user table, we can store each record using tableName,primary_key -> salary,age as key/value pairs.