Cassandra Heap Issues

 Cassandra Out Of Memory (OOM) exceptions can cause performance impact and reduce availability.  The OOM exception is reported when Cassandra is unable to allocate heap space. Some times the application workload alone can cause Cassandra to quickly use all the available heap space and throw OOM exceptions. Common cause for OOM is application workload combined with background jobs such as repair or Compactions can increase heap usage and eventually lead to OOM exceptions. Some of the components and background jobs which uses high heap space are

Memtables

Cassandra by default allocates 1/4th of max heap to Memtables. It can be configured cassandra.yaml to reduce the heap usage by Memtable allocations.   

# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048

When Cassandra is under severe heap pressure it will start flushing aggressively which will result in smaller SSTables and the creates high background compaction activity.

Reads

CQL queries that read many columns or rows per query can consume lot of heap space. Especially when the rows are wider (contains huge number of columns per row) or when rows contains tombstones. When rows are replicated to multiple nodes, the coordinator node needs to hold the data longer to meet the consistency constraints. When rows are spread across multiple SSTables, the results need to be merges from multiple SSTables and this increases heap space usage. Cassandra need to filter out the tombstones to read the live columns from a row. The number of tombstones scanned per query also affects the heap usage. Cassandra nodetool cfstats command provides some metrics on these factors. Following sample cfstat output collected from a live cluster shows the application is using wider rows with huge number of columns and there exist huge number of tombstones per row

                Average live cells per slice (last five minutes): 8.566489361702128
                Maximum live cells per slice (last five minutes): 1109
                Average tombstones per slice (last five minutes): 1381.0478723404256
                Maximum tombstones per slice (last five minutes): 24601

Cassandra nodetool cfhistograms command also provides some metrics on the SSTables. It displays statistics on row size, column count and average number of SSTables scanned per read.

Percentile  SSTables     Write Latency    Read Latency    Partition Size        Cell Count
                                      (micros)             (micros)             (bytes)
50%             1.00           51.01                 182.79               1331                      4
75%             1.00           61.21                 219.34               2299                      7
95%             3.00           73.46                 2816.16             42510                    179
98%             3.00           88.15                 4055.27             182785                  770
99%             3.00           105.78               4866.32             454826                  1916
Min              0.00           6.87                   9.89                  104                         0
Max             3.00           379.02               74975.55          44285675122         129557750

Concurrent Read threads

Cassandra uses by default 32 concurrent threads in read stage. These threads serve read queries. If there are more read requests are received then they go in to pending queue. Cassandra nodetool tpstats command ca be used to check the active and pending read stage tasks. If each read query consuming high amount of heap space then reducing this setting can help heap usage.

Repair 

Cassandra repair creates Merkle trees to detect inconsistencies row data among multiple replicas. By default Cassandra creates Merkle tree till depth 15. Each Merkle tree contains 32768 internal nodes and consumes lot of heap space. Repair running concurrently with application load can cause high heap usage.

Compaction

Background compactions can also cause high heap usage. By default compaction_throughput_mb_per_sec is set to 16 MB.

Other factors

Other factors that can cause high heap usage are Cassandra Hints,  Row Cache, Tombstones and  space used for Index entries. 

Workarounds

If enough system memory available, increasing the max heap size can help. Some time min heap and max heap set to same value. reducing the min heap size can help reduce the heap usage. If application is allocating bigger chunks of memory, increasing G1GC region size can help in reducing heap fragmentation and improve heap utilization. Some times reducing concurrent read threads also help in reducing heap usage which can impact read performance.