Support Questions

hardikv_desai · ‎05-06-2016

@hardikvdesai

lyuben_todorov · ‎05-18-2016

Give commit log an SSD

The simplest change that yields a big performance boost is to give your commit log a dedicated SSD. Cassandra uses the commit log heavily, so pointing the commitlog_directory setting in cassandra.yaml at a dedicated SSD, away from where you store SSTables (the data files), will give much better write performance.
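A quick way to sanity-check this is to compare the device IDs of the two directories. The sketch below is only an illustration: the two paths are assumed package defaults, not values taken from this thread, so substitute whatever your cassandra.yaml contains. Note that differing device IDs only prove the directories are on different filesystems, not necessarily on different physical disks.

```python
import os


def same_device(path_a, path_b):
    """Return True when both paths live on the same filesystem device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Assumed locations for illustration; use the values from your cassandra.yaml.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"

    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        if same_device(commitlog_dir, data_dir):
            print("commit log and data files share a device")
        else:
            print("commit log is on its own filesystem")
    else:
        print("directories not found; adjust the paths above")
```

If the two directories share a device, commit log appends are competing with SSTable reads and compaction for the same disk.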

Heap space

Cassandra ships with a script that automatically allocates memory to each node. The script does a good job in most use cases, but if you have lots of other tech running on the same machine, which is likely in HDP, you probably want to check how much memory is assigned to your Cassandra node. For Cassandra 2.2.x the recommendation is a heap between 2 and 8 GB; for Cassandra 3+ you can extend the heap to 16 GB and boost performance. This brings up another interesting point: heap overallocation. Remember that Cassandra depends on GC to clear up unused memtables and other data structures, so allocating too much memory will make GC pauses longer.
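To get a feel for what the automatic script would pick on a given box, here is a rough sketch of the calculation. It assumes the rule found in cassandra-env.sh of that era, max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8 GB)); that formula is not stated in this thread, so check it against the script shipped with your version. The function name is made up for the example.

```python
def default_max_heap_mb(system_memory_mb):
    """Approximate the automatic heap size, in MB.

    Assumed rule: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)).
    Verify against your own cassandra-env.sh before relying on it.
    """
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    return max(min(half, 1024), min(quarter, 8192))


if __name__ == "__main__":
    for ram_mb in (2048, 16384, 65536):
        print(ram_mb, "MB RAM ->", default_max_heap_mb(ram_mb), "MB heap")
```

Under that assumption a 16 GB machine gets a 4 GB heap and anything from 32 GB upwards is capped at 8 GB. On a shared HDP node the script still bases its choice on total RAM, which is why it is worth checking by hand.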

Enable JNA

Ensure that the JNA (Java Native Access) library is enabled in your cluster. It lets Java call native C methods and gives it access to native memory, which Cassandra uses for off-heap storage of many of its data structures. Check the logs for the following two messages. The first means JNA failed to link; the second means JNA was able to get access to native memory:

JNA link failure, one or more native method will be unavailable.

CLibrary.java (line 121) JNA mlockall successful
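The log check can be scripted. This is a minimal sketch that looks for the two message fragments quoted above; the log path is an assumed default, not something confirmed in this thread, and the exact wording of the messages may differ between Cassandra versions.

```python
import os

JNA_FAILURE = "JNA link failure"
JNA_SUCCESS = "JNA mlockall successful"


def jna_status(log_lines):
    """Return 'ok', 'failed' or 'unknown' based on the last JNA message seen."""
    status = "unknown"
    for line in log_lines:
        if JNA_SUCCESS in line:
            status = "ok"
        elif JNA_FAILURE in line:
            status = "failed"
    return status


if __name__ == "__main__":
    # Assumed log location for illustration; adjust for your install.
    log_path = "/var/log/cassandra/system.log"
    if os.path.exists(log_path):
        with open(log_path, errors="replace") as handle:
            print("JNA status:", jna_status(handle))
    else:
        print("log file not found; adjust log_path")
```

The last matching message wins, so a node that failed on an earlier start but succeeded on the most recent one reports "ok".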

Memtable = offheap

Configure memtables to be stored in native memory rather than on the JVM heap. In cassandra.yaml:

memtable_allocation_type: offheap_objects
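To confirm what a node is actually configured with, a small stdlib-only sketch like the one below can pull a top-level key out of cassandra.yaml. It is a naive line scanner, not a YAML parser (it ignores nested keys and would be confused by a "#" inside a quoted value), and the function name and file path are assumptions for the example.

```python
import os


def top_level_setting(yaml_text, key):
    """Return the value of an uncommented top-level 'key: value' line, or None."""
    for raw in yaml_text.splitlines():
        # Drop trailing comments; fully commented-out lines become empty.
        line = raw.split("#", 1)[0].rstrip()
        if line.startswith(key + ":"):
            value = line[len(key) + 1:].strip()
            return value or None
    return None


if __name__ == "__main__":
    # Assumed config location for illustration; adjust for your install.
    config_path = "/etc/cassandra/cassandra.yaml"
    if os.path.exists(config_path):
        with open(config_path) as handle:
            print(top_level_setting(handle.read(), "memtable_allocation_type"))
    else:
        print("config file not found; adjust config_path")
```

A result of None means the key is absent or commented out, in which case the node is running with whatever default its version ships.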

Compaction

Use the correct compaction strategy for your workload! Leveled compaction can really help read-heavy workloads, since it guarantees that in 90% of reads you'll be able to retrieve the row you want from a single SSTable once the data has been compacted to levels higher than 0. Size-tiered compaction can help deal with write-burst workloads where you expect very high peaks of write pressure.
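Compaction is set per table through CQL. The sketch below only builds the ALTER TABLE statement as a string so you can paste it into cqlsh; the keyspace and table names are hypothetical, and the helper itself is made up for the example.

```python
ALLOWED_STRATEGIES = {
    "LeveledCompactionStrategy",
    "SizeTieredCompactionStrategy",
}


def compaction_cql(keyspace, table, strategy):
    """Build an ALTER TABLE statement that switches the compaction strategy."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError("unsupported strategy: " + strategy)
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


if __name__ == "__main__":
    # Hypothetical names: a read-heavy table and a write-burst table.
    print(compaction_cql("demo_ks", "user_profiles", "LeveledCompactionStrategy"))
    print(compaction_cql("demo_ks", "sensor_events", "SizeTieredCompactionStrategy"))
```

Changing the strategy on an existing table triggers recompaction of its data, so it is best done outside peak hours.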

Swap

Make sure you've disabled swap. We don't want Cassandra going into swap space, because performance will degrade very rapidly. Also set /proc/sys/vm/swappiness to 1 just in case swap gets re-enabled by accident.
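Both settings can be verified from /proc on Linux. The following is a small sketch with helper names invented for the example; it only reads the current state and changes nothing.

```python
import os


def active_swap_devices(proc_swaps_text):
    """Return the swap devices listed in the contents of /proc/swaps."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header; the rest are active swap areas.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swappiness_ok(swappiness_text, limit=1):
    """Return True when the vm.swappiness value is at or below the limit."""
    return int(swappiness_text.strip()) <= limit


if __name__ == "__main__":
    if os.path.exists("/proc/swaps"):
        with open("/proc/swaps") as handle:
            devices = active_swap_devices(handle.read())
        print("active swap:", devices or "none")
    if os.path.exists("/proc/sys/vm/swappiness"):
        with open("/proc/sys/vm/swappiness") as handle:
            print("swappiness ok:", swappiness_ok(handle.read()))
```

An empty device list together with a swappiness of 1 or lower matches the advice above.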

There are whole books written about this, but these are some of the pointers off the top of my head.

tamuccuser · ‎07-04-2019

Cassandra belongs to the NoSQL family of databases. Eileen McNulty's article on Dataconomy explains why you might need a NoSQL database in the first place.

The four main challenges with Apache Cassandra and how to deal with them

tamuccuser · ‎07-08-2019

One more resource:

"Should you use NoSQL or SQL Db or both" by The Startup manager on Medium
