
How to improve the performance of the SplitJson processor?


I'm consuming data from Kafka, parsing the JSON, and inserting it into HBase. Because my data is nested JSON, I have to use SplitJson twice. However, I've observed that SplitJson is slowing down my entire flow and causing data to pile up in the queues. We have a 2-node cluster, and I have set Concurrent Tasks to 5 for each processor. How can I improve its performance? Would a Jolt transformation be a better idea than SplitJson?
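To see why two splits in a row can flood the queues, the sketch below counts how many separate documents a double split produces from a single message. The payload shape (`orders` containing `items`) and the `split_json` helper are hypothetical illustrations, not NiFi code. Each output stands in for one FlowFile, and every FlowFile carries its own attributes and provenance overhead.

```python
import json

# Hypothetical nested payload: one Kafka message holding a batch of orders,
# each order holding a list of items (field names are illustrative only).
message = json.dumps({
    "orders": [
        {"order_id": "o1", "items": [{"sku": "a"}, {"sku": "b"}, {"sku": "c"}]},
        {"order_id": "o2", "items": [{"sku": "d"}, {"sku": "e"}]},
    ]
})


def split_json(documents, key):
    """Mimic one SplitJson pass: every element of the array at `key`
    becomes its own document (in NiFi, its own FlowFile)."""
    out = []
    for doc in documents:
        for element in json.loads(doc)[key]:
            out.append(json.dumps(element))
    return out


first_split = split_json([message], "orders")    # one document per order
second_split = split_json(first_split, "items")  # one document per item

print(len(first_split), len(second_split))
```

For this single message, the flow creates 2 intermediate FlowFiles and then 5 final ones. With real batch sizes, the number of FlowFiles grows with (orders × items) per message, which is where the queue pressure comes from.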


Super Guru

Can you provide example data and what you're splitting on? Perhaps you could use record-based processors instead of the splits (or at least after the first split)?
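The record-based idea can be sketched as a single pass that turns the nested structure into flat rows while everything stays inside one container. In NiFi, the rows would stay in one FlowFile handled by a Record Reader/Writer pair; ForkRecord is one processor that may fit, depending on your NiFi version. The payload shape and the `flatten_orders` helper are hypothetical and only illustrate the idea; they are not how any NiFi processor is implemented.

```python
# Same hypothetical payload shape as before: orders containing items.
message = {
    "orders": [
        {"order_id": "o1", "items": [{"sku": "a"}, {"sku": "b"}, {"sku": "c"}]},
        {"order_id": "o2", "items": [{"sku": "d"}, {"sku": "e"}]},
    ]
}


def flatten_orders(payload):
    """Produce one flat row per item in a single pass, copying the parent
    order_id onto each row so the nested context is not lost."""
    rows = []
    for order in payload["orders"]:
        for item in order["items"]:
            rows.append({"order_id": order["order_id"], **item})
    return rows


rows = flatten_orders(message)  # one list of rows, not one object per row
print(rows[0])
```

The number of rows matches the double split, but they stay together as one batch. That is what lets a record writer or PutHBaseRecord handle them without creating a FlowFile per item.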

Thanks for the reply, @Matt Burgess. After reading several blogs, I realized that I do need to switch to record-based processors. I'm trying to use the PutHBaseRecord processor, but I can't get the composite row key that I used in my earlier flow. I'll post a new question with the correct tags and a sample.
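PutHBaseRecord takes the row key from a single field (its Row Identifier Field Name property), so one possible approach is to build the composite key as an extra field on each record before the put. In NiFi that might be done with UpdateRecord and a RecordPath `concat(...)` expression; this is an assumption to verify against your flow. The sketch below shows the logic only. The field names `order_id` and `sku`, the `_` separator, and the `add_row_key` helper are all hypothetical.

```python
def add_row_key(rows, fields, sep="_", key_field="row_key"):
    """Return copies of `rows` with a composite key built by joining the
    values of `fields` with `sep` and storing it under `key_field`."""
    keyed = []
    for row in rows:
        key = sep.join(str(row[f]) for f in fields)
        keyed.append({**row, key_field: key})  # leave the input rows untouched
    return keyed


rows = [{"order_id": "o1", "sku": "a"}, {"order_id": "o2", "sku": "d"}]
keyed = add_row_key(rows, ["order_id", "sku"])
print(keyed[0]["row_key"])  # o1_a
```

Once the key exists as an ordinary field, PutHBaseRecord can be pointed at `row_key` just like any single-field identifier.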