How to enhance the performance of the SplitJson processor

New Contributor

Hi,

I'm consuming data from Kafka, then parsing the JSON and inserting it into HBase. Since my data is nested JSON, I have to use SplitJson twice. However, I observed that SplitJson is slowing down my entire flow, causing too much data to build up in the queue. We have a 2-node cluster, and I have set the number of concurrent tasks for each processor to 5. How can I improve its performance? Would using a JOLT transformation instead of SplitJson be a better idea?

2 REPLIES

Re: how to enhance performance of splitjson processor.

Can you provide example data and what you're splitting on? Perhaps you could use record-based processors instead of the splits (or at least after the first split)?

Re: how to enhance performance of splitjson processor.

New Contributor

Thanks for the reply, @Matt Burgess. After reading several blogs, I realized that I do indeed have to go with record-based processors. I'm trying to use the PutHBaseRecord processor, but I can't get the composite row key that I used in my earlier flow. I'll post a new question with the correct tags and a sample.
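
To make the idea concrete, here is a rough Python sketch (not NiFi code) of what I mean by handling the nesting in one pass instead of two splits, and of building a composite row key per record. The field names (`site`, `devices`, `readings`, `ts`, `value`), the sample message, and the `|` separator are all made up for illustration, since I haven't shared my real data here.

```python
import json


def flatten(message: str, sep: str = "|"):
    """Yield one flat record per innermost element of a two-level nested JSON message.

    The field names used here are hypothetical placeholders.
    """
    doc = json.loads(message)
    for device in doc["devices"]:            # first nesting level
        for reading in device["readings"]:   # second nesting level
            yield {
                # Composite row key built from a field at each level.
                "row_key": sep.join(
                    [str(doc["site"]), str(device["id"]), str(reading["ts"])]
                ),
                "value": reading["value"],
            }


# Hypothetical sample message with two nesting levels.
sample = json.dumps({
    "site": "s1",
    "devices": [
        {"id": "d1", "readings": [{"ts": 1, "value": 10}, {"ts": 2, "value": 11}]},
        {"id": "d2", "readings": [{"ts": 1, "value": 20}]},
    ],
})

records = list(flatten(sample))
```

Each incoming message stays as one unit of work and produces a list of flat records, which is roughly what I hope to get from the record-based processors rather than thousands of tiny flow files.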