I'm consuming data from Kafka, parsing the JSON, and inserting into HBase. Since my data is nested JSON I have to use SplitJson twice, but I've observed that SplitJson is slowing down the performance of my entire flow, causing too much data to back up in the queues. We have a 2-node cluster and I have set the number of concurrent tasks for each of the processors to 5. How can I enhance its performance? Is using a Jolt transformation instead of SplitJson a better idea?
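For context on the Jolt question: a single JoltTransformJSON processor can often flatten a nested document into a flat array in one pass, replacing two chained SplitJson processors. Below is a minimal sketch of a Jolt `shift` spec against hypothetical input (the `device`/`readings` field names are illustrative assumptions, not my actual schema) that pulls each array element up to the top level and copies the parent field onto each record:

```json
[
  {
    "operation": "shift",
    "spec": {
      "readings": {
        "*": {
          "ts": "[&1].ts",
          "value": "[&1].value",
          "@(2,device)": "[&1].device"
        }
      }
    }
  }
]
```

For input like `{"device": "sensor-1", "readings": [{"ts": 1, "value": 10}, ...]}`, this should emit one flat array of `{ts, value, device}` objects, which downstream record-based processors can then handle without any per-element FlowFile splitting.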
Can you provide example data and what you're splitting on? Perhaps you could use record-based processors instead of the splits (or at least after the first split)?
Thanks for the reply @Matt Burgess. After reading several blogs I've come to realize that I do indeed need to go with record-based processors. I'm trying to use the PutHBaseRecord processor but can't get the composite row key I used in my earlier flow. I'll post a new question with the correct tags and a sample.