HDFS Namenode RPC Request Execution Time Breakdown

Xing Lin published on 2023-11-29

                                                           1            2                            3
 -----                                                enqueueTime  queueingTime     Processing/ResponseTime/HandlerTime
|50090| <- Listener --> pendingConnections <- Reader1 -----------> CallQueue <----- handler (processes and sends response)
 -----              \-> pendingConnections <- Reader2

A main listener thread accepts new connections from clients and puts each connection into the pendingConnections queue of a Reader thread. A Reader thread detects any ready connection, reads the request, and puts the call into the CallQueue. This put() operation is blocking and is accounted as enqueueTime.
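To make the enqueueTime component concrete, here is a minimal sketch of why a blocking put() can take a long time when the queue is full. It does not use Hadoop's real classes: `CallQueueTiming`, `Call`, and `measureEnqueueMillis` are hypothetical names, and a one-slot `ArrayBlockingQueue` stands in for the CallQueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class CallQueueTiming {

    /** Hypothetical stand-in for an RPC call; not Hadoop's real Call class. */
    static final class Call {
        final String name;
        Call(String name) { this.name = name; }
    }

    /**
     * Fills a one-slot queue, then measures how long the "reader" blocks in put()
     * while a slow "handler" frees the slot only after handlerDelayMs.
     */
    static long measureEnqueueMillis(long handlerDelayMs) throws InterruptedException {
        BlockingQueue<Call> callQueue = new ArrayBlockingQueue<>(1);
        callQueue.put(new Call("earlier-call")); // the queue is now full

        long start = System.nanoTime();
        Thread handler = new Thread(() -> {
            try {
                Thread.sleep(handlerDelayMs); // handler is busy with other work
                callQueue.take();             // frees one slot in the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();

        callQueue.put(new Call("new-call")); // reader blocks here until a slot frees up
        long enqueueNanos = System.nanoTime() - start;
        handler.join();
        return TimeUnit.NANOSECONDS.toMillis(enqueueNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("enqueueTime ~ " + measureEnqueueMillis(50) + " ms");
    }
}
```

In this sketch the measured time is dominated by how long the handler takes to free a slot, which is the kind of back-pressure that shows up as enqueueTime when handlers cannot keep up.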

Java ExecutorService

Xing Lin published on 2023-07-03

Recently, I was working on HDFS-17030 and used ExecutorService for multi-threaded execution. We hit a non-intuitive "bug" in how the JVM/garbage collector works. The issue is as follows. We created an ExecutorService with 1 core thread in a class. We did not set allowCoreThreadTimeOut to true (the default is false). So that core thread is kept alive even after the main thread exits! Because pool threads are non-daemon by default, the JVM process won’t exit while that thread is still running!
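The behavior can be reproduced with the standard java.util.concurrent API. The sketch below is not the HDFS-17030 code: `CoreThreadDemo` and its methods are hypothetical names. It shows two ways to avoid the hang, either calling shutdown() explicitly or letting the idle core thread time out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadDemo {

    /** Returns whether the worker of a default one-thread pool is a daemon thread. */
    static boolean workerIsDaemon() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<Boolean> task = () -> Thread.currentThread().isDaemon();
            return pool.submit(task).get();
        } finally {
            pool.shutdown(); // without this, the idle core thread keeps the JVM alive
        }
    }

    /** Builds a one-thread pool whose idle core thread exits after keepAliveMs. */
    static ThreadPoolExecutor newSelfExpiringPool(long keepAliveMs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, keepAliveMs, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // requires keepAliveMs > 0
        return pool;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker is daemon: " + workerIsDaemon());

        ThreadPoolExecutor pool = newSelfExpiringPool(200);
        pool.submit(() -> System.out.println("task ran"));
        // No shutdown() here: the core thread times out after about 200 ms of
        // idleness, so the JVM can still exit once main returns.
    }
}
```

Calling shutdown() is usually the more explicit choice when the owning class has a clear lifecycle; allowCoreThreadTimeOut(true) is a fallback when it does not.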

Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores

Xing Lin published on 2022-08-12

Read/write performance of cloud object stores. Each read operation usually incurs at least 5–10 ms of base latency, and can then read data at roughly 50–100 MB/s, so an operation needs to read at least several hundred kilobytes to achieve at least half the peak throughput for sequential reads, and multiple megabytes to approach the peak throughput. The VM types most frequently used for analytics on AWS have at least 10 Gbps network bandwidth, so they need to run 8–10 reads in parallel to fully utilize this bandwidth.
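The "several hundred kilobytes" figure can be checked with a back-of-the-envelope model. The sketch below assumes (this is my simplification, not something stated in the paper) that a read of S bytes takes latency + S / peak seconds; under that assumption, the effective throughput reaches half of the peak exactly when S = latency × peak. `ReadSizeEstimate` and its methods are hypothetical names.

```java
public class ReadSizeEstimate {

    /** Effective throughput (bytes/s) under the assumed model: time = latency + size / peak. */
    static double effectiveThroughput(double latencySec, double peakBytesPerSec, double sizeBytes) {
        return sizeBytes / (latencySec + sizeBytes / peakBytesPerSec);
    }

    /** Read size (bytes) at which the model reaches half of peak throughput: latency * peak. */
    static double halfPeakSize(double latencySec, double peakBytesPerSec) {
        return latencySec * peakBytesPerSec;
    }

    public static void main(String[] args) {
        // Low and high ends of the ranges quoted above: {latency in seconds, peak in bytes/s}.
        double[][] cases = { {0.005, 50e6}, {0.010, 100e6} };
        for (double[] c : cases) {
            double size = halfPeakSize(c[0], c[1]);
            System.out.printf("latency=%.0f ms, peak=%.0f MB/s -> half peak at %.0f KB%n",
                    c[0] * 1e3, c[1] / 1e6, size / 1e3);
        }
    }
}
```

With the quoted ranges, the half-peak read size in this model falls between roughly 250 KB and 1 MB, which is consistent with the excerpt's estimate.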

The CacheLib Caching Engine: Design and Experiences at Scale

Xing Lin published on 2022-07-29

CacheLib is a general-purpose caching engine, designed based on experiences with a range of caching use cases at Facebook, that facilitates the easy development and maintenance of caches. CacheLib was first deployed at Facebook in 2017 and today powers over 70 services including CDN, storage, and application-data caches. All of these systems process millions of queries per second, cache working sets large enough to require using both flash and DRAM for caching, and must tolerate frequent restarts due to application updates, which are common in the Facebook production environment.

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

Xing Lin published on 2022-07-16

Hundreds of thousands of customers rely on DynamoDB for its fundamental properties. In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems, including Alexa, the Amazon.com sites, and Amazon fulfillment centers, made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second, while experiencing high availability with single-digit millisecond performance. For DynamoDB customers, consistent performance at any scale is often more important than median request service times because unexpectedly high latency requests can amplify through higher layers of applications that depend on DynamoDB and lead to a bad customer experience.

Book Review: High Output Management

Xing Lin published on 2021-06-13

I finished reading Andrew Grove’s book "High Output Management" and really liked it, especially the second half. It explains a few concepts, such as how a person’s needs may change and what controls a person’s behavior. It also contains quite a bit of practical advice on how to handle issues that can come up in a workplace. I highly recommend that everyone take a look at this book.