What should I do to tune the performance of a Cassandra database?

Expert Contributor

@hardikvdesai

1 ACCEPTED SOLUTION

New Contributor

Give commit log an SSD

The simplest thing you can do that will yield a large performance boost is to give your commit log a dedicated SSD. Since Cassandra uses the commit log heavily, pointing the commitlog_directory setting in cassandra.yaml at a dedicated SSD, away from where you store SSTables (the data files), will give much better write performance.
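
As a rough way to check the advice above, the following Python sketch compares the device IDs of the commit log directory and a data directory. The two paths are placeholders, not values from this post; substitute whatever commitlog_directory and data_file_directories point to in your cassandra.yaml. It only detects the case where both directories share one filesystem, so two partitions on the same physical disk would still look "separate".

```python
import os


def on_same_device(commitlog_dir, data_dir):
    """Return True if both directories live on the same filesystem/device."""
    return os.stat(commitlog_dir).st_dev == os.stat(data_dir).st_dev


if __name__ == "__main__":
    # Hypothetical paths; replace with the values from your cassandra.yaml.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    try:
        if on_same_device(commitlog_dir, data_dir):
            print("commit log and data share a device")
        else:
            print("commit log is on a separate device")
    except FileNotFoundError as exc:
        print(f"path not found: {exc.filename}")
```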

Heap space

Cassandra has a script that automatically allocates memory to each node. The script works well in most use cases, but if you have lots of other tech running on the same machine, which is likely in HDP, you probably want to check how much memory is assigned to your Cassandra node. For Cassandra 2.2.x the recommendation is between 2 and 8 GB; for Cassandra 3+ you can extend the heap to 16 GB and boost performance. This brings up another interesting point: heap overallocation. Remember that Cassandra depends on GC for clearing up unused memtables and other data structures, so allocating too much memory will cause GC to slow down.
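
The ranges above can be written down as a small sanity check. This is only a sketch of the rule of thumb quoted in this answer, not an official sizing formula; the function name is made up, and applying the 2 GB lower bound to Cassandra 3+ is an assumption, since only the upper bound is given for that case.

```python
def heap_within_guidance(heap_gb, major_version):
    """Check a heap size (in GB) against the ranges quoted in this answer.

    Assumption: the 2 GB lower bound is reused for Cassandra 3+,
    because the answer only states the 16 GB upper bound there.
    """
    upper = 16 if major_version >= 3 else 8
    return 2 <= heap_gb <= upper


if __name__ == "__main__":
    for heap_gb, version in [(8, 2), (12, 2), (12, 3), (32, 3)]:
        verdict = "ok" if heap_within_guidance(heap_gb, version) else "outside guidance"
        print(f"Cassandra {version}.x with {heap_gb} GB heap: {verdict}")
```

If the value needs changing, the heap is typically set through MAX_HEAP_SIZE in cassandra-env.sh; check the file that ships with your version.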

Enable JNA

Ensure that the JNA (Java Native Access) library is enabled in your cluster. It allows Java to call native C methods and gives it access to native memory, which Cassandra uses for off-heap storage of many of its data structures. Check the logs for the following two messages; the latter means JNA was able to get access to native memory:

JNA link failure, one or more native method will be unavailable.
CLibrary.java (line 121) JNA mlockall successful
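
A quick way to look for those two messages is to scan the log with a few lines of Python. This is a minimal sketch: the function name is invented for illustration, the log path is whatever your installation uses, and the matched substrings are taken from the messages quoted above, which may differ between Cassandra versions.

```python
import sys


def jna_status(log_lines):
    """Classify JNA state from an iterable of Cassandra log lines.

    Later lines override earlier ones, so the most recent startup wins.
    """
    status = "unknown"
    for line in log_lines:
        if "JNA link failure" in line:
            status = "failed"
        elif "JNA mlockall successful" in line:
            status = "ok"
    return status


if __name__ == "__main__":
    # Usage: python check_jna.py <path to your Cassandra system log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            print(jna_status(fh))
```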

Memtable = offheap

Configure memtables to be stored in native memory rather than the JVM's heap. In cassandra.yaml:

memtable_allocation_type: offheap_objects

Compaction

Use the correct compaction strategy for your workload! Leveled compaction can really help READ-heavy workloads, since it guarantees that in 90% of reads you'll be able to retrieve the row you want from a single SSTable once it has been compacted to levels higher than 0. Size-tiered compaction can help deal with WRITE-burst workloads, where you expect very high-pressure peaks of writes.
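
To make the choice concrete, here is a small Python sketch that maps the two workload types above to a CQL ALTER TABLE statement. The workload labels, the function name, and the keyspace and table in the example are made up for illustration; the strategy class names are the standard Cassandra ones. It does no identifier quoting, so treat it as a starting point only.

```python
# Workload labels are hypothetical; the class names are Cassandra's own.
STRATEGIES = {
    "read-heavy": "LeveledCompactionStrategy",
    "write-burst": "SizeTieredCompactionStrategy",
}


def compaction_cql(keyspace, table, workload):
    """Build an ALTER TABLE statement for the given workload label."""
    strategy = STRATEGIES[workload]
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


if __name__ == "__main__":
    # Hypothetical keyspace and table names.
    print(compaction_cql("demo", "events", "read-heavy"))
```

The resulting statement can be run from cqlsh. Changing the strategy on an existing table triggers recompaction, so it is worth trying on a test cluster first.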

Swap

Make sure you've disabled swap. We don't want Cassandra going into swap space, because performance will degrade very rapidly. Also set /proc/sys/vm/swappiness to 1, just in case swap gets re-enabled by accident.
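
Both settings can be verified on a Linux node by reading the corresponding /proc files. The sketch below is read-only and assumes the usual /proc/swaps layout (one header line, then one line per active swap area); the function names are made up for illustration.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an integer."""
    with open(path) as fh:
        return int(fh.read().strip())


def active_swap_devices(path="/proc/swaps"):
    """Return the names of active swap areas (empty list if swap is off)."""
    with open(path) as fh:
        lines = fh.read().splitlines()
    # The first line is a column header; each remaining line is one swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    try:
        print("swappiness:", read_swappiness())
        devices = active_swap_devices()
        print("active swap:", devices if devices else "none")
    except OSError as exc:
        # /proc is Linux-specific; other systems will end up here.
        print(f"could not read swap settings: {exc}")
```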

There are whole books written about this, but these are some of the pointers off the top of my head.


3 REPLIES

New Contributor

Cassandra belongs to the NoSQL family of databases. For an explanation of why you might need a NoSQL database in the first place, see the article by Eileen McNulty on Dataconomy.

The four main challenges with Apache Cassandra and how to deal with them
