Cassandra Data Replication

In a distributed system like Cassandra, data replication provides high availability and durability. Cassandra replicates the rows of a column family onto multiple endpoints according to the replication strategy associated with its keyspace. The endpoints that store a row are called the replicas, or natural endpoints, for that row. The number of replicas and their locations are determined by the replication factor and the replication strategy: the replication strategy controls how replicas are chosen, and the replication factor sets the number of replicas for a key. The replication strategy is defined when the keyspace is created, and how the replication factor is configured depends on the chosen strategy.

Two kinds of replication strategies are available in Cassandra. The first is SimpleStrategy, which is unaware of both racks and data centers. It is commonly used when all nodes are in a single data center. The second is NetworkTopologyStrategy, which is aware of both racks and data centers. Even with a single data center, if the nodes are spread across different racks it is better to use NetworkTopologyStrategy, because it improves fault tolerance by choosing replicas from different racks. Also, if you plan to expand the cluster to more than one data center in the future, choosing NetworkTopologyStrategy from the beginning avoids a data migration.

The data partitioner determines the coordinator node for each key. The coordinator node is the first replica for a key, also called the primary replica. If the replication factor is N, a key's coordinator node replicates it to the other N-1 replicas.
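The mapping from a key to its primary replica can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token space size, the MD5-based hash, and the names token_for and primary_replica are hypothetical and do not reproduce Cassandra's actual partitioner code.

```python
import hashlib
from bisect import bisect_left

# Assumption for this sketch: tokens live in the range [0, 2**127).
RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    """Hash a key to a position on the ring (illustrative, not Cassandra's exact formula)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def primary_replica(ring, token):
    """Return the node that owns `token`.

    `ring` is a list of (token, node) pairs sorted by token. The owner is the
    first node whose token is >= the key's token, wrapping around to the first
    node when the key's token is larger than every node token.
    """
    tokens = [t for t, _ in ring]
    return ring[bisect_left(tokens, token) % len(ring)][1]


# Four hypothetical nodes spaced evenly around the ring.
ring = [(i * RING_SIZE // 4, name) for i, name in enumerate("ABCD")]

print(primary_replica(ring, token_for("user42")))
```

The same key always hashes to the same token, so every node that knows the ring layout can compute the primary replica without consulting a central directory.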

In SimpleStrategy, the successor nodes, that is, the nodes that immediately follow the coordinator node in the clockwise direction on the ring, are selected as replicas.
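The clockwise walk used by SimpleStrategy can be sketched as follows. This is a minimal simulation, not Cassandra's implementation: the function name simple_strategy_replicas, the small integer tokens, and the node names are all made up for illustration.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Pick replicas by walking the ring clockwise from the key's owner.

    `ring` is a list of (token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # The first node whose token is >= the key's token is the first replica.
    start = bisect_left(tokens, key_token) % len(ring)
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


# Hypothetical four-node ring with small tokens for readability.
ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]

print(simple_strategy_replicas(ring, 30, 3))  # ['C', 'D', 'A']
```

A key with token 30 falls to node C, and the next two nodes clockwise (D, then A after wrapping around) complete a replication factor of 3. Racks and data centers play no part in the choice.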

In NetworkTopologyStrategy, nodes from distinct available racks in each data center are chosen as replicas. The user needs to specify a replication factor per data center in a multiple data center environment. In each data center, the successor nodes to the coordinator node that are on distinct racks are chosen.
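A simplified simulation of this rack-aware selection is shown below. It is a sketch under assumptions, not Cassandra's actual algorithm: the function name nts_replicas and the topology are hypothetical, and the fallback of reusing a rack when a data center has fewer racks than its replication factor is an assumption of this sketch.

```python
from bisect import bisect_left


def nts_replicas(ring, key_token, rf_per_dc):
    """Choose replicas per data center, preferring distinct racks.

    `ring` is a list of (token, node, dc, rack) tuples sorted by token.
    `rf_per_dc` maps a data center name to its replication factor.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    # Ring order as seen when walking clockwise from the key's owner.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    result = {}
    for dc, rf in rf_per_dc.items():
        chosen, skipped, seen_racks = [], [], set()
        for _, node, node_dc, rack in walk:
            if node_dc != dc:
                continue
            if rack in seen_racks:
                skipped.append(node)  # rack already holds a replica
            else:
                seen_racks.add(rack)
                chosen.append(node)
        # Assumption: if distinct racks run out, fall back to skipped nodes.
        result[dc] = (chosen + skipped)[:rf]
    return result


# Hypothetical topology: two data centers, two racks each.
ring = [
    (0, "n1", "DC1", "r1"),
    (10, "n2", "DC2", "r1"),
    (20, "n3", "DC1", "r1"),
    (30, "n4", "DC2", "r2"),
    (40, "n5", "DC1", "r2"),
    (50, "n6", "DC2", "r1"),
]

print(nts_replicas(ring, 45, {"DC1": 2, "DC2": 2}))
# {'DC1': ['n1', 'n5'], 'DC2': ['n6', 'n4']}
```

For a key with token 45, the walk meets n1 and n3 in DC1 before n5, but n3 shares rack r1 with n1, so n5 on rack r2 is chosen instead. A purely ring-based choice would have placed both DC1 replicas on the same rack.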

The Cassandra nodetool getendpoints command, given a keyspace, a column family, and a key, can be used to retrieve the replicas of that key.
