Key Value Workloads

YCSB

| Workload | Description | Example | Operation breakdown |
|---|---|---|---|
| A | Update-heavy workload | Session store recording recent actions | read: 50%, update: 50% |
| B | Read-mostly workload | Photo tagging; adding a tag is an update, but most operations read tags | read: 95%, update: 5% |
| C | Read-only workload | User profile cache, where profiles are constructed elsewhere (e.g., Hadoop) | read: 100% |
| D | Read-latest workload | User status updates; people want to read the latest | read: 95%, insert: 5% |
| E | Short ranges | Threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id) | scan: 95% (maxscanlength=100), insert: 5% |
| F | Read-modify-write workload | User database, where user records are read and modified by the user or to record user activity | read: 50%, readmodifywrite: 50% |
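
The operation breakdowns above can be turned into a small operation generator. The following is a minimal sketch, not YCSB's own code: the `YCSB_MIXES` dictionary and the `sample_operations` helper are hypothetical names introduced here, and only the proportions come from the table.

```python
import random
from collections import Counter

# Operation mixes copied from the workload table; the dictionary itself
# and its key names are illustrative, not part of YCSB.
YCSB_MIXES = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "readmodifywrite": 0.50},
}


def sample_operations(workload, n, seed=None):
    """Draw n operation names according to the given workload's mix."""
    mix = YCSB_MIXES[workload]
    rng = random.Random(seed)  # private RNG so runs are reproducible
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


if __name__ == "__main__":
    # Roughly 95% of the sampled operations should be reads for workload B.
    print(Counter(sample_operations("B", 10_000, seed=1)))
```

This only models the operation mix. Key selection (for example, the "latest" bias of workload D or the scan lengths of workload E) would need a separate key generator.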

RocksDB Workloads in Facebook paper link

UDB

  • Social graph data stored in MyRocks
  • Get: 75%, Put: 20%
  • More than 75% of KV-pairs are Put only once.
  • Most Iterator start-keys are used only once. More than 60% of Iterators have a scan length of only 1, across all column families (CFs).

ZippyDB

  • Distributed KV store, mapping some metadata of an object to the object address in an object storage system
  • Get: 78%, Put: 13%, Delete: 6%, Iterator: 3%
  • Get: for about 80% of the KV-pairs, a Get request occurs only once. A very small portion of KV-pairs have very large read counts over the 24-hour period: 1% of the KV-pairs receive more than 100 Get requests each, and these account for about 50% of all Gets, which indicates strong locality.
  • Put: about 73% of the KV-pairs are Put only once, and fewer than 0.001% are Put more than 10 times. Put does not show as clear a locality as Get does.

UP2X

  • Statistical counters of user activities, used for AI/ML prediction and inference
  • Merge (read-modify-write): 92.53%, Get: 7.46%
  • Merge and Get have wide distributions of access counts. Most KV-pairs are Put only once.

UDB Request Distribution

../UDB-requests.png

Key Value Sizes

../kvsize.png ../kvsize-distribution.png

Memcached workloads at Twitter link

The average get ratio is close to 90%, indicating that most of the caches serve read-heavy workloads.
Still, more than 35% of all Twemcache clusters are write-heavy, and more than 20% have a write ratio higher than 50%.
The majority of the cache workloads still follow a Zipfian distribution. KV sizes are small.
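
To make the Zipfian claim concrete, here is a minimal sketch of a Zipfian key sampler, where the probability of the key at popularity rank r is proportional to 1 / r^s. The class name `ZipfianKeySampler` and the exponent `s = 0.99` are assumptions chosen for illustration; they are not values taken from the Twitter study.

```python
import itertools
import random
from bisect import bisect_left


class ZipfianKeySampler:
    """Samples key indices 0..num_keys-1, where index 0 is the hottest key."""

    def __init__(self, num_keys, exponent=0.99, seed=None):
        # Weight of the key at rank r (1-based) is 1 / r**exponent.
        # The exponent is an assumed skew, not a measured one.
        weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
        self._cumulative = list(itertools.accumulate(weights))
        self._total = self._cumulative[-1]
        self._rng = random.Random(seed)

    def next_key(self):
        # Inverse-CDF sampling: find the first cumulative weight >= u.
        u = self._rng.random() * self._total
        index = bisect_left(self._cumulative, u)
        return min(index, len(self._cumulative) - 1)  # guard against rounding


if __name__ == "__main__":
    sampler = ZipfianKeySampler(num_keys=1000, seed=7)
    print([sampler.next_key() for _ in range(10)])
```

Precomputing the cumulative weights costs O(n) memory but keeps each draw at O(log n), which is usually acceptable for small synthetic key spaces.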

../twitter-kvsize.png ../twitter-operations.png

Note: I couldn't find a DeleteProportion setting in the YCSB core properties…
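
If the core properties indeed lack a delete proportion, one possible workaround is to drop Delete from a measured mix and rescale the remaining operations. The sketch below applies this to the ZippyDB mix quoted earlier. The helper `to_ycsb_proportions` is hypothetical, and mapping Put to `updateproportion` (rather than `insertproportion`) and Iterator to `scanproportion` is an assumption, not something the source states.

```python
def to_ycsb_proportions(mix, drop=("delete",)):
    """Drop unsupported operations and rescale the rest to sum to 1.

    The mapping from operation names to YCSB property names below is an
    assumed correspondence chosen for illustration.
    """
    rename = {
        "get": "readproportion",
        "put": "updateproportion",
        "iterator": "scanproportion",
    }
    kept = {op: share for op, share in mix.items() if op not in drop}
    total = sum(kept.values())
    return {rename[op]: round(share / total, 4) for op, share in kept.items()}


if __name__ == "__main__":
    # ZippyDB mix from the notes above, in percent.
    zippydb = {"get": 78, "put": 13, "delete": 6, "iterator": 3}
    for prop, value in to_ycsb_proportions(zippydb).items():
        print(f"{prop}={value}")
```

Dropping Delete changes the workload (keys are never removed), so results from such an approximation should be read with that caveat in mind.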