Key Value Workloads
YCSB
Workload | Description | Example | Operation breakdown |
---|---|---|---|
A | update-heavy workload | session store recording recent actions | read: 50%, update: 50% |
B | read-mostly workload | photo tagging; adding a tag is an update, but most operations read tags | read: 95%, update: 5% |
C | read-only workload | user profile cache, where profiles are constructed elsewhere (e.g., Hadoop) | read: 100% |
D | read-latest workload | user status updates; people want to read the latest | read: 95%, insert: 5% |
E | short-range scans | threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id) | scan: 95% (maxscanlength=100), insert: 5% |
F | read-modify-write workload | user database, where user records are read and modified by the user or to record user activity | read: 50%, readmodifywrite: 50% |
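
The operation breakdowns in the table can be turned into a small operation sampler. The following is a minimal sketch, not YCSB's own generator: the names `YCSB_MIXES` and `sample_operations` are made up for illustration, and the fractions are copied from the table above.

```python
import random
from collections import Counter

# Operation mixes copied from the table above (fractions of all operations).
YCSB_MIXES = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "readmodifywrite": 0.50},
}


def sample_operations(workload, n, seed=0):
    """Draw n operation names according to the given workload's mix."""
    mix = YCSB_MIXES[workload]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


if __name__ == "__main__":
    # Count how often each operation appears in a sample of workload B.
    print(Counter(sample_operations("B", 10_000)))
```

This only reproduces the operation mix. Key selection (uniform, Zipfian, latest) and scan lengths are separate knobs and are not modeled here.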
RocksDB Workloads at Facebook (paper link)
UDB
- Social graph data stored in MyRocks
- Get: 75%, Put: 20%
- More than 75% of KV-pairs are Put only once.
- Most Iterator start-keys are used only once. More than 60% of Iterators have a scan length of 1, across all column families (CFs).
ZippyDB
- Distributed KV store, mapping some metadata of an object to the object address in an object storage system
- 78% Get, 13% Put, 6% Delete, and 3% Iterator
- Get: for about 80% of the KV-pairs, a Get occurs only once. A very small portion of KV-pairs have very large read counts over the 24-hour period: about 1% of the KV-pairs receive more than 100 Get requests each, and these account for about 50% of all Gets, which indicates strong locality.
- Put: about 73% of the KV-pairs are Put only once, and fewer than 0.001% are Put more than 10 times. Put does not show as clear a locality as Get does.
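
The Get locality figures above can be computed from a request trace by counting Gets per key. The following is a minimal sketch under assumptions: the trace is a list of `(operation, key)` tuples, the function name `get_locality_stats` is made up for illustration, and the denominators only include keys that received at least one Get (the paper may count differently).

```python
from collections import Counter


def get_locality_stats(trace, hot_threshold=100):
    """Summarize Get locality for a trace of (operation, key) tuples.

    Returns a tuple of:
      - fraction of read keys that received exactly one Get,
      - fraction of read keys that received more than hot_threshold Gets,
      - share of all Gets that went to those hot keys.
    """
    gets = Counter(key for op, key in trace if op == "get")
    if not gets:
        return 0.0, 0.0, 0.0
    total_gets = sum(gets.values())
    read_once = sum(1 for count in gets.values() if count == 1)
    hot_counts = [count for count in gets.values() if count > hot_threshold]
    return (
        read_once / len(gets),
        len(hot_counts) / len(gets),
        sum(hot_counts) / total_gets,
    )


if __name__ == "__main__":
    # Tiny synthetic trace: one hot key and two keys that are read once.
    trace = [("get", "a")] * 150 + [("get", "b"), ("get", "c"), ("put", "d")]
    print(get_locality_stats(trace))
```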
UP2X
- Statistics counters of user activities for AI/ML prediction and inference
- Merge (read-modify-write): 92.53%, Get: 7.46%
- Merge and Get have wide distributions of access counts. Most KV pairs are Put only once.
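
To make the read-modify-write nature of Merge concrete, here is a toy counter store. It is a sketch only: `CounterStore` is a hypothetical in-memory class, not RocksDB's API, and it applies each merge eagerly, whereas RocksDB can defer combining merge operands until a read or compaction.

```python
class CounterStore:
    """Toy in-memory store with a counter-style merge (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrite whatever is stored under key.
        self._data[key] = value

    def get(self, key, default=0):
        # Missing counters read as the default value.
        return self._data.get(key, default)

    def merge(self, key, delta):
        # Read-modify-write in one call: read the current value,
        # apply the delta, and write the result back.
        self._data[key] = self._data.get(key, 0) + delta


if __name__ == "__main__":
    store = CounterStore()
    store.merge("clicks:user42", 1)
    store.merge("clicks:user42", 2)
    print(store.get("clicks:user42"))
```

A counter workload like UP2X issues mostly merges because each user activity only needs to add to an existing value, without the client reading it first.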
UDB Request Distribution
Key Value Sizes
Memcached Workloads at Twitter (link)
The average Get ratio is close to 90%, indicating that most of the caches serve read-heavy workloads.
However, more than 35% of all Twemcache clusters are write-heavy, and more than 20% have a write ratio higher than 50%.
The majority of the cache workloads still follow a Zipfian distribution, and KV sizes are small.
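
For reference, a Zipfian key popularity can be sketched with the standard library alone. This is an illustration under assumptions: the function name `zipf_key_sampler` is made up, and the exponent 0.99 is an assumed skew parameter, not a value taken from the Twitter traces.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_key_sampler(num_keys, exponent=0.99, seed=0):
    """Return a function that draws key ranks in [0, num_keys).

    Rank 0 is the hottest key; the key at rank r is drawn with
    probability proportional to 1 / (r + 1) ** exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    cum_weights = list(accumulate(weights))  # precomputed for fast sampling
    rng = random.Random(seed)
    keys = range(num_keys)

    def draw(n):
        return rng.choices(keys, cum_weights=cum_weights, k=n)

    return draw


if __name__ == "__main__":
    draw = zipf_key_sampler(1000)
    counts = Counter(draw(20_000))
    # Show the five most frequently drawn key ranks.
    print(counts.most_common(5))
```

With a skew like this, a small set of hot keys receives a large share of the requests, which is why caching works well for such workloads.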
Note: Couldn’t find DeleteProportion in Core-properties for YCSB…