NiFi PutCassandraQL StatementCacheSize to improve write performance

Hi,

I am using NiFi with the processors below to load incoming JSON payloads from a client application into Cassandra.

HandleHttpRequest --> EvaluateJsonPath --> ReplaceText --> PutCassandraQL --> HandleHttpResponse

I am running NiFi on a three-node cluster, and Cassandra is also running on three nodes.

I am trying to keep the data in Cassandra in its native JSON format, since my incoming JSON payload is nested. In my Cassandra table, I am using a UDT to hold the array of structs, declared as list<frozen<column>>.

In the ReplaceText processor, I build the insert query like this:

insert into <table_name> JSON '{"COLUMNA":"${COLUMN_A}","COLUMNB": "${COLUMN_B}", "COLUMNC":"${COLUMN_C}"}';
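For illustration only, here is a minimal Python sketch of what that template ends up producing for one record. The helper name build_insert_json, the table name and the sample values are all hypothetical and not part of my flow. It also shows one thing I am unsure the plain template handles: a single quote inside a value has to be doubled to stay inside the CQL string literal.

```python
import json


def build_insert_json(table, row):
    """Return a CQL INSERT ... JSON statement for one row (hypothetical helper)."""
    # Serialize the record; json.dumps takes care of JSON-level escaping.
    payload = json.dumps(row)
    # CQL string literals escape a single quote by doubling it.
    payload = payload.replace("'", "''")
    return f"INSERT INTO {table} JSON '{payload}';"


stmt = build_insert_json(
    "my_keyspace.my_table",  # hypothetical table name
    {"COLUMNA": "a1", "COLUMNB": "it's b", "COLUMNC": "c1"},
)
print(stmt)
```

Because the values are embedded in the statement text, every record produces a different statement string, even though the table and columns never change.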

All incoming JSON data is inserted into the same table, and the set of attributes we insert is the same for every record; only the values differ from one insert to the next.

Current challenge:

1. We expect around 2,500 payloads/minute to hit the HandleHttpRequest endpoint.
2. So far, the maximum write throughput we have achieved is 500-550 inserts/minute. Any additional flow files sent to PutCassandraQL get queued up, which delays the inserts into Cassandra, and at times we get write timeout errors.
3. I understand that Cassandra can sustain much higher write throughput than 500-550 inserts/minute. Each insert/payload is around 100 KB in size.
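
To put these numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes every payload is the full 100 KB, that 1 MB = 1024 KB, and that 550 inserts/minute is the best rate observed; the variable names are just for illustration.

```python
# Figures taken from the list above.
payloads_per_minute = 2500   # expected load on the endpoint
payload_kb = 100             # approximate size of one payload, in KB
observed_per_minute = 550    # best write rate achieved so far

# Required insert rate per second.
required_per_second = payloads_per_minute / 60

# Required data rate, assuming 1 MB = 1024 KB.
required_mb_per_second = payloads_per_minute * payload_kb / 60 / 1024

# How many times faster the writes need to be than today.
shortfall_factor = payloads_per_minute / observed_per_minute

print(f"required inserts/s: {required_per_second:.1f}")
print(f"required MB/s:      {required_mb_per_second:.2f}")
print(f"speed-up needed:    {shortfall_factor:.1f}x")
```

So the target is roughly 42 inserts/second and about 4 MB/second across the cluster, which is around 4.5 times what I am getting now.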

Suggestions required:
Looking through the PutCassandraQL configuration, it seems the "StatementCacheSize" property can be used to improve write throughput as long as the insert query stays the same (same table and same set of columns being loaded).
What value should I set for "StatementCacheSize"? Does this parameter represent a cache size in MB/KB, or the number of insert records that will be batched?

In my case, where I am inserting the data in native JSON format, how do I leverage this property? In other words, do I have to do anything different in the ReplaceText processor to make effective use of batch inserts into the Cassandra table?

What is the maximum write throughput we can expect from a three-node NiFi cluster using the PutCassandraQL processor? Are any benchmark metrics available for reference?

Any help would be greatly appreciated.

Tx,
Vish