Cassandra Out Of Memory (OOM) exceptions can impact performance and reduce availability. An OOM exception is reported when Cassandra is unable to allocate heap space. Sometimes the application workload alone can cause Cassandra to quickly exhaust the available heap and throw OOM exceptions. More commonly, application workload combined with background jobs such as repair or compactions increases heap usage and eventually leads to OOM exceptions. The components and background jobs that use significant heap space include:
Cassandra by default allocates 1/4 of the max heap to Memtables. This can be configured in cassandra.yaml to reduce the heap used by Memtable allocations:
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048
When Cassandra is under severe heap pressure it starts flushing Memtables aggressively, which results in smaller SSTables and creates high background compaction activity.
CQL queries that read many columns or rows per query can consume a lot of heap space, especially when the rows are wide (contain a huge number of columns per row) or contain tombstones. When rows are replicated to multiple nodes, the coordinator node must hold the data longer to meet the consistency requirements. When a row is spread across multiple SSTables, the results must be merged from those SSTables, which increases heap usage. Cassandra must also filter out tombstones to read the live columns from a row, so the number of tombstones scanned per query affects heap usage as well. The Cassandra nodetool cfstats command provides metrics on these factors. The following sample cfstats output, collected from a live cluster, shows the application using wide rows with a huge number of columns and a huge number of tombstones per row:
Average live cells per slice (last five minutes): 8.566489361702128
Maximum live cells per slice (last five minutes): 1109
Average tombstones per slice (last five minutes): 1381.0478723404256
Maximum tombstones per slice (last five minutes): 24601
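As a rough illustration, these slice metrics can be pulled out of nodetool cfstats output with a small script. This is only a sketch: the exact label text can vary between Cassandra versions, and the 10x threshold below is an arbitrary illustrative cutoff, not a documented limit.

```python
import re

# Sample lines as emitted by `nodetool cfstats` (labels may vary by version).
CFSTATS_OUTPUT = """
Average live cells per slice (last five minutes): 8.566489361702128
Maximum live cells per slice (last five minutes): 1109
Average tombstones per slice (last five minutes): 1381.0478723404256
Maximum tombstones per slice (last five minutes): 24601
"""

def parse_slice_metrics(text):
    """Return a dict of the live-cell/tombstone per-slice metrics."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(
            r"\s*(Average|Maximum) (live cells|tombstones) per slice.*:\s*([\d.]+)",
            line,
        )
        if m:
            key = f"{m.group(1).lower()}_{m.group(2).replace(' ', '_')}"
            metrics[key] = float(m.group(3))
    return metrics

metrics = parse_slice_metrics(CFSTATS_OUTPUT)
# Flag tables where tombstones dominate live cells -- a sign of heap-heavy reads.
if metrics["average_tombstones"] > 10 * metrics["average_live_cells"]:
    print("WARNING: reads scan far more tombstones than live cells")
```

A check like this can be run across all tables to find the ones most likely to pressure the heap during reads.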
The Cassandra nodetool cfhistograms command also provides metrics on the SSTables. It displays statistics on partition size, cell count, and the number of SSTables scanned per read:
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 1.00 51.01 182.79 1331 4
75% 1.00 61.21 219.34 2299 7
95% 3.00 73.46 2816.16 42510 179
98% 3.00 88.15 4055.27 182785 770
99% 3.00 105.78 4866.32 454826 1916
Min 0.00 6.87 9.89 104 0
Max 3.00 379.02 74975.55 44285675122 129557750
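The Max row above is worth converting to friendlier units, since a single oversized partition is itself a heap hazard whenever it is read or compacted. A quick sketch (the ~100 MB comfort zone mentioned in the comment is a commonly cited rule of thumb, not a hard limit):

```python
# "Partition Size" Max from the cfhistograms output above, in bytes.
max_partition_bytes = 44285675122

gib = max_partition_bytes / 1024**3
print(f"largest partition: {gib:.1f} GiB")
# A single partition of roughly 41 GiB -- far above the commonly cited
# ~100 MB comfort zone, and a likely contributor to heap pressure.
```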
Concurrent Read threads
Cassandra by default uses 32 concurrent threads in the read stage. These threads serve read queries. If more read requests are received than the threads can handle, the requests go into a pending queue. The Cassandra nodetool tpstats command can be used to check the active and pending read-stage tasks. If each read query consumes a large amount of heap space, reducing this setting can lower heap usage.
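For example, read-stage concurrency can be lowered in cassandra.yaml (the right value depends on disks and workload; 16 here is only illustrative):

```yaml
# cassandra.yaml -- default is 32; lowering it reduces the number of
# in-flight reads holding heap at once, at some cost to read throughput.
concurrent_reads: 16
```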
Cassandra repair creates Merkle trees to detect inconsistencies in row data among multiple replicas. By default Cassandra builds each Merkle tree to a depth of 15, so each tree contains 32768 (2^15) leaf nodes and consumes a lot of heap space. Repair running concurrently with application load can cause high heap usage.
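The heap cost of repair's Merkle trees grows exponentially with depth, which a back-of-the-envelope calculation makes concrete. The per-node byte figure below is an assumed rough number for illustration only, not a measured Cassandra value:

```python
def merkle_tree_nodes(depth):
    """A binary Merkle tree of the given depth has 2**depth leaves
    and 2**depth - 1 internal nodes."""
    leaves = 2 ** depth
    internal = leaves - 1
    return leaves, internal

# Cassandra's default repair tree depth is 15.
leaves, internal = merkle_tree_nodes(15)
print(leaves)  # 32768 leaves per tree

# Rough heap estimate, assuming ~100 bytes per node (illustrative only);
# repair builds one tree per replica per token range, so these add up.
APPROX_BYTES_PER_NODE = 100
total_mb = (leaves + internal) * APPROX_BYTES_PER_NODE / 1024 / 1024
print(f"~{total_mb:.1f} MB per tree")
```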
Background compactions can also cause high heap usage. By default compaction_throughput_mb_per_sec is set to 16 MB/s.
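The throttle lives in cassandra.yaml; lowering it below the default shown here is one way to reduce compaction's transient heap and I/O footprint during peak application load, at the cost of a larger compaction backlog:

```yaml
# cassandra.yaml -- default compaction throttle; lower values ease pressure
# during peak load but let pending compactions accumulate.
compaction_throughput_mb_per_sec: 16
```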
Other factors that can cause high heap usage are Cassandra hints, the row cache, tombstones, and the space used for index entries.
If enough system memory is available, increasing the max heap size can help. Sometimes the min heap and max heap are set to the same value; reducing the min heap size can help reduce heap usage. If the application allocates bigger chunks of memory, increasing the G1GC region size can reduce heap fragmentation and improve heap utilization. Sometimes reducing the concurrent read threads also helps reduce heap usage, though it can impact read performance.
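A minimal sketch of the corresponding JVM settings in jvm.options (the values are illustrative only and should be sized to the hardware and workload):

```
# jvm.options -- illustrative values only
-Xms4G                       # min heap; setting it below -Xmx, as discussed
                             # above, lets the JVM run with a smaller heap
-Xmx8G                       # max heap; raise only if the OS has memory to spare
-XX:G1HeapRegionSize=16m     # larger G1 regions reduce fragmentation from
                             # large allocations
```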