Key to token with RandomPartitioner

Cassandra supports several strategies for partitioning data across the nodes in a cluster, and users can choose whichever strategy an application requires. The most commonly used strategy distributes keys evenly across the nodes using a consistent hashing scheme.

In Cassandra, data is stored in rows in column families on nodes. Each row is identified by a unique key. The nodes in a cluster form a ring, where each node is assigned a fixed position on the ring, represented by a token, and is responsible for storing a range of keys. Each row key is also mapped to a token and assigned to a node based on the token's position in the token space.
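The ring lookup described above can be sketched in a few lines of Python. This is only an illustration, not Cassandra's code: the names `RING_SIZE`, `evenly_spaced_tokens` and `owner_index` are hypothetical, the cluster is assumed to have evenly spaced node tokens, and replication is ignored, so only the first node responsible for a token is returned.

```python
import bisect

# Assumption: token space of 2**127 positions, as used by RandomPartitioner.
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count):
    """Return one token per node, spread evenly around the ring (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner_index(node_tokens, key_token):
    """Return the index of the node whose range contains key_token.

    node_tokens must be sorted in ascending order. A node with token T is
    taken to own the range (previous node's token, T], so the owner is the
    first node whose token is >= key_token. Tokens beyond the last node
    wrap around to the first node.
    """
    idx = bisect.bisect_left(node_tokens, key_token)
    return idx % len(node_tokens)


if __name__ == "__main__":
    tokens = evenly_spaced_tokens(4)
    for t in (0, 1, 2 ** 126, RING_SIZE - 1):
        print(t, "-> node", owner_index(tokens, t))
```

With four nodes, a token just above 0 lands on the second node, while a token above the last node's position wraps back to the first node.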

The default partitioner is RandomPartitioner (org.apache.cassandra.dht.RandomPartitioner), which uses the MD5 hashing algorithm to map keys to tokens. It represents a token as a non-negative big integer derived from the 128-bit MD5 digest, and the possible token space consists of tokens from 0 to 2^127-1.

A key is converted to a token by calculating the MD5 digest of the contents of the key. The resulting 128-bit digest, written in hexadecimal format, is interpreted as a signed number in 2's complement form, and the token is the absolute value of that number. Here is the Python code to map a key to a token:


gettoken.py:

import hashlib

key = "foo"

# Calculate the MD5 digest of the key's bytes and convert it to hex format
digest = hashlib.md5(key.encode("utf-8")).hexdigest()

# Interpret the 128-bit digest as a signed number in 2's complement form
token = int(digest, 16)
bits = 128
if token & (1 << (bits - 1)):
    token -= 1 << bits

# The token is the absolute value of the signed number
print(abs(token))

$ python gettoken.py
110673303387115207421586718101067225896
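For reuse, the same conversion can be wrapped in a function. The sketch below assumes Python 3 and UTF-8 encoded string keys. The name `key_to_token` is made up for this example and is not part of Cassandra or ctools. It uses `int.from_bytes` with `signed=True` to read the digest as a 2's complement number directly, instead of going through the hex string.

```python
import hashlib


def key_to_token(key):
    """Map a row key to a RandomPartitioner-style token (illustrative sketch).

    Accepts bytes, or a str which is assumed to be UTF-8 encoded.
    """
    if isinstance(key, str):
        key = key.encode("utf-8")
    # 16-byte (128-bit) MD5 digest of the key contents
    digest = hashlib.md5(key).digest()
    # Read the digest as a big-endian signed integer (2's complement)
    signed_value = int.from_bytes(digest, byteorder="big", signed=True)
    # The token is the absolute value of that signed integer
    return abs(signed_value)


if __name__ == "__main__":
    print(key_to_token("foo"))
```

For the key "foo" this should print the same token as the script above.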

You can check out the code in ctools on GitHub.
