NiFi merge content by attribute (Kafka to HDFS)

New Contributor

I'm trying to get messages from Kafka (ConsumeKafka 2.x) and save them to HDFS by date.

The date is in the Kafka message JSON. I managed to consume from Kafka, extract the date with EvaluateJsonPath (the value is in epoch), use UpdateAttribute to turn the epoch into a yyyy/MM/dd/HH path, and save each file under the right hour directory.
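
The conversion step looks roughly like this; the attribute names event.time and hdfs.path and the JSON path are just examples, and I'm assuming the epoch value in the message is in milliseconds:

    EvaluateJsonPath
      event.time   $.timestamp

    UpdateAttribute
      hdfs.path    ${event.time:format('yyyy/MM/dd/HH', 'UTC')}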

Everything was OK until I used MergeContent to group 10,000 messages into one file.

Since the dates differ between messages, I get a single folder for one hour and the rest of the files are saved to the root of the filesystem.


Re: NiFi merge content by attribute (Kafka to HDFS)

Rising Star

Use the date field you extracted into an attribute as the merge key, so that merging only happens for files that belong to the same date-time-hour.
In the MergeContent processor, set the 'Correlation Attribute Name' property to that date attribute; the rest will be taken care of.
You may need to increase the number of bins in the merge processor, depending on your use case.
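
A rough sketch of the MergeContent settings, assuming the hour path sits in an attribute called hdfs.path (the name is only an example):

    MergeContent
      Merge Strategy              Bin-Packing Algorithm
      Correlation Attribute Name  hdfs.path
      Minimum Number of Entries   10000
      Maximum number of Bins      32
      Attribute Strategy          Keep Only Common Attributes

With the correlation attribute set, each bin only collects FlowFiles that share the same hour value, so the attribute is identical across the bin and survives the merge; the default 'Keep Only Common Attributes' strategy drops any attribute whose value differs between the FlowFiles being merged.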

Re: NiFi merge content by attribute (Kafka to HDFS)

New Contributor

Thanks @hegdemahendra,

That is what I did. I set the number of bins to 32, but the merge processor is unable to process all the hours at once.

I get /root/yyyy/MM/dd/HH as the path, but only some of the merged files are created there; most are created in the root directory /root/.
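
For context, the HDFS side of my flow is roughly this (still assuming the hdfs.path attribute from above):

    PutHDFS
      Directory   /root/${hdfs.path}

If a merged FlowFile comes out without that attribute, the expression resolves to an empty string and the file lands directly under /root/, which is what I'm seeing.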

I had to build another flow with GetFile and a split, and run the data through the same flow again until all the files were processed.