how to enhance performance of splitjson processor.

Hi,

I'm consuming data from Kafka, parsing the JSON, and inserting it into HBase. Because my data is nested JSON, I have to use SplitJson twice. However, I observed that SplitJson slows down my entire flow and causes too much data to build up in the queue. We have a 2-node cluster, and I have set the number of concurrent tasks for each processor to 5. How can I improve its performance? Is using a Jolt transformation instead of SplitJson a better idea?


Re: how to enhance performance of splitjson processor.

Can you provide example data and what you're splitting on? Perhaps you could use record-based processors instead of the splits (or at least after the first split)?
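
To illustrate the idea outside NiFi, here is a minimal Python sketch. No sample data was posted in this thread, so the payload shape (a top-level `customer` with nested `orders`, each holding `items`) and every field name below are assumptions made up for illustration. It flattens both nesting levels in one pass over a single message, which is roughly what a record-based step does inside one FlowFile instead of emitting a new FlowFile per element at each split.

```python
import json


def flatten(message: str) -> list[dict]:
    """Flatten a two-level nested JSON message into flat rows.

    Hypothetical shape: {"customer": ..., "orders": [{"order_id": ..., "items": [...]}]}
    """
    doc = json.loads(message)
    rows = []
    # Walk both nesting levels in a single pass; no intermediate per-element output.
    for order in doc.get("orders", []):
        for item in order.get("items", []):
            rows.append({
                "customer": doc.get("customer"),
                "order_id": order.get("order_id"),
                "sku": item.get("sku"),
                "qty": item.get("qty"),
            })
    return rows
```

Each returned row carries the parent fields it needs, so a downstream step can handle the whole batch as one set of records.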

Re: how to enhance performance of splitjson processor.

Thanks for the reply, @Matt Burgess. After reading several blogs, I realized that I do indeed have to use record-based processors. I'm trying to use the PutHBaseRecord processor, but I can't get the composite row key that I used in my earlier flow. I'll post a new question with the correct tags and a sample.
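
For readers with the same problem, one common approach is to compute the composite key into a single field on each record before the HBase write, since PutHBaseRecord, as far as I know, takes the row id from one named record field. The Python sketch below shows only the key-building logic. The field names, the `row_key` output field, and the `|` delimiter are assumptions, not details from the original flow.

```python
def with_row_key(record: dict, key_fields: list[str], sep: str = "|") -> dict:
    """Return a copy of the record with a composite 'row_key' field added.

    key_fields and sep are hypothetical; use whatever the earlier flow concatenated.
    """
    # Join the chosen field values in order; a missing field raises KeyError
    # rather than silently producing a malformed key.
    parts = [str(record[name]) for name in key_fields]
    return {**record, "row_key": sep.join(parts)}
```

For example, `with_row_key({"customer": "c1", "order_id": 7}, ["customer", "order_id"])` adds `"row_key": "c1|7"` and leaves the other fields unchanged.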
