Cassandra Data partitioning with random partitioner

Cassandra is a distributed hash table (DHT) based storage system where data is dynamically partitioned to a set of storage nodes in a cluster using a data partitioning scheme. It provides two data partition schemes called RandomPartitioner and ByteOrderedPartitioner. This blog article provides an overview of RandomPartitioner which is the commonly used partitioning scheme.

In RandomPartitioner strategy, keys are uniformly distributed across the nodes using MD5 hash algorithm. MD5 hash provides a natural way of load balancing keys to nodes. Each data item is mapped to a token by calculating MD5 hash of its key. A token is a big integer of 16 bytes length and it is represented using 2's complement notation. The hash function output range is from 0 to 2^127 – 1.  It uses consistency hashing scheme to minimize the data movement when nodes are added or removed. In consistency hashing scheme, the token space is treated as a circular space (ring) where the largest hash value (token) is followed by the smallest hash value.

Each storage node is randomly assigned with a token which also represents its position on the ring. A data item is stored on a node based on its position on the ring. The first node with the token that is next in the clock wise order from item's token. This node is also called its primary replica or coordinator for this item.

Each node stores the keys whose tokens are between its own token (including) and the predecessor node token (excluding). This region on the ring is also called token range. When a node leaves or joins, only the immediate neighbor nodes are affected.

In the below example, a hypothetical Cassandra cluster is shown with token space from token 0 to token 1023. It consists of four nodes N1,N2,N3 and N4.

An example data item which is mapped to token 125 is stored on primary node N2 and replicated on N3 and N4 using SimpleStrategy replication with replication factor (RF) 3.

No comments:

Post a Comment