How to create only one file by MergeContent processor on the cluster


Explorer

I need to extract data from a relational database and load it into an S3 bucket. I have a 5-node cluster and use a "GenerateTableFetch" (primary node) --> "ExecuteSQL" (all nodes) combination to read the data in parallel. I also need to merge the extracted data into a single file before loading it into S3, but my "MergeContent" processor produces multiple files in S3. Is there a way to get this done? The full flow looks like this:

"GenerateTableFetch" --> "ExecuteSQL" --> "MergeContent" --> "ConvertAvroToJSON" --> "UpdateAttribute" --> "CompressContent" --> "PutS3Object"

1 ACCEPTED SOLUTION


Re: How to create only one file by MergeContent processor on the cluster

Explorer

@Shu

Thank you - after some tweaking and tuning of the parameters you mentioned, I was able to achieve the desired results.

Alex


5 REPLIES

Re: How to create only one file by MergeContent processor on the cluster

Super Guru
@Alex M

Can you please share more details about your MergeContent processor configuration?

Refer to the community links below on how to configure the MergeContent processor:

https://community.hortonworks.com/questions/148294/nifi-problems-with-emply-queue.html?childToView=1...

https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.ht...


Re: How to create only one file by MergeContent processor on the cluster

Super Guru

@Alex M,

You need to set Minimum Group Size according to your requirements (e.g. 1 B, 1 KB, 1 MB, 1 GB, ...).

Example:

In the configuration shown below, I set Minimum Group Size to 10 MB (the minimum size for the bundle).

Suppose each of your flowfiles is 1 MB. The processor will wait until the group size reaches 10 MB and then bundle all the flowfiles into one (i.e. 10 flowfiles are merged into 1 flowfile by the MergeContent processor).

If the flowfiles don't meet the minimum group size requirement, they will wait in the queue before the MergeContent processor until the minimum group size is reached.
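The size-threshold behavior described above can be modeled in a few lines of Python. This is a hypothetical, simplified sketch, not NiFi's implementation: `Bin` and `simulate` are illustrative names, and the model ignores Minimum Number of Entries, Maximum Group Size, correlation attributes and Max Bin Age.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MB = 1024 * 1024


@dataclass
class Bin:
    """Hypothetical, simplified MergeContent bin that only checks a minimum total size."""
    min_group_size: int
    sizes: List[int] = field(default_factory=list)

    def offer(self, flowfile_size: int) -> Optional[Tuple[int, int]]:
        """Add one flowfile; return (flowfile count, total bytes) once the bin is merged."""
        self.sizes.append(flowfile_size)
        total = sum(self.sizes)
        if total < self.min_group_size:
            return None  # keep waiting for more flowfiles
        count = len(self.sizes)
        self.sizes = []  # the merged bundle leaves; start a fresh bin
        return count, total


def simulate(sizes: List[int], min_group_size: int):
    """Feed flowfile sizes through one bin; return (merged bundles, sizes still waiting)."""
    bin_ = Bin(min_group_size)
    bundles = []
    for size in sizes:
        merged = bin_.offer(size)
        if merged is not None:
            bundles.append(merged)
    return bundles, list(bin_.sizes)


# Twelve 1 MB flowfiles with Minimum Group Size = 10 MB
bundles, waiting = simulate([1 * MB] * 12, min_group_size=10 * MB)
print(bundles)       # [(10, 10485760)]
print(len(waiting))  # 2
```

The two leftover flowfiles stay queued until more data arrives, which is the waiting behavior described in the answer.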

[Screenshot: MergeContent processor configuration]

How to force a merge of flowfiles?

By specifying the Max Bin Age property:

No matter how many Flowfiles have been assigned to a given bin, that bin will be merged once the bin has existed for this amount of time.

For example, suppose I set Max Bin Age to 10 min, Minimum Group Size is 10 MB, and only 5 flowfiles totaling 5 MB are queued before the MergeContent processor.

The queue will never meet the minimum group size requirement on its own, so without a limit the flowfiles could stay queued indefinitely. To overcome this, we set Max Bin Age to 10 min: once the bin has existed for 10 minutes, the processor merges the flowfiles even though they haven't met the minimum group size requirement.
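A similarly simplified, hypothetical model shows how Max Bin Age forces a merge of an undersized bin. `AgedBin`, `offer` and `poll` are illustrative names, not NiFi APIs; timestamps are plain seconds supplied by the caller, and the bin's age is measured from its creation, matching the property description quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MB = 1024 * 1024


@dataclass
class AgedBin:
    """Hypothetical bin: merges when it reaches min_group_size OR is older than max_bin_age."""
    min_group_size: int
    max_bin_age: float  # seconds
    created_at: Optional[float] = None
    sizes: List[int] = field(default_factory=list)

    def offer(self, size: int, now: float) -> None:
        if self.created_at is None:
            self.created_at = now  # bin age counts from when the bin was created
        self.sizes.append(size)

    def poll(self, now: float) -> Optional[Tuple[int, int, str]]:
        """Return (flowfile count, total bytes, reason) if the bin should be merged now."""
        if not self.sizes:
            return None
        total = sum(self.sizes)
        if total >= self.min_group_size:
            reason = "min size reached"
        elif now - self.created_at >= self.max_bin_age:
            reason = "max bin age reached"
        else:
            return None  # still waiting on size and age
        count = len(self.sizes)
        self.sizes, self.created_at = [], None  # start a fresh bin
        return count, total, reason


# Min Group Size = 10 MB, Max Bin Age = 10 min; five 1 MB flowfiles arrive at t = 0..4 s
bin_ = AgedBin(min_group_size=10 * MB, max_bin_age=600.0)
for t in range(5):
    bin_.offer(1 * MB, now=float(t))
early = bin_.poll(now=300.0)
late = bin_.poll(now=600.0)
print(early)  # None
print(late)   # (5, 5242880, 'max bin age reached')
```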

For the other MergeContent properties, please refer to the links in my answer above.

Let me know if you have any questions!


Re: How to create only one file by MergeContent processor on the cluster

Explorer

Attached is the template.

Thanks,

Alex

Re: How to create only one file by MergeContent processor on the cluster

Super Guru

1. Change the merge thresholds (size/number of entries) and the delay (Max Bin Age).

2. You can add an EnforceOrder processor (running on the primary node only).

3. Set the FirstInFirstOutPrioritizer on all connections.
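The idea behind suggestion 2 can be sketched as follows. This is a hypothetical model that does not use NiFi at all: it assumes each flowfile carries an integer order value (NiFi's EnforceOrder reads this from a configurable attribute), releases items strictly in sequence, and holds an early arrival until the gap before it is filled. `enforce_order` is an illustrative name; wait timeouts, multiple groups and NiFi's relationships are not modeled.

```python
import heapq
from typing import Iterable, List, Tuple


def enforce_order(
    arrivals: Iterable[Tuple[int, str]], initial_order: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical sketch: release payloads strictly by their integer order value.

    Returns (released, still_waiting, skipped), where skipped holds items whose
    position had already been passed when they were examined.
    """
    expected = initial_order
    waiting: List[Tuple[int, str]] = []  # min-heap keyed on order value
    released: List[str] = []
    skipped: List[str] = []
    for order, payload in arrivals:
        heapq.heappush(waiting, (order, payload))
        # Drain the heap while its smallest order value is not in the future.
        while waiting and waiting[0][0] <= expected:
            order_value, item = heapq.heappop(waiting)
            if order_value == expected:
                released.append(item)
                expected += 1
            else:
                skipped.append(item)  # arrived too late for its position
    still_waiting = [item for _, item in sorted(waiting)]
    return released, still_waiting, skipped


released, waiting, skipped = enforce_order(
    [(2, "c"), (0, "a"), (1, "b"), (4, "e"), (1, "late")]
)
print(released)  # ['a', 'b', 'c']
print(waiting)   # ['e']
print(skipped)   # ['late']
```

A min-heap keeps the next candidate at the front, so each arrival only triggers work when it fills the gap the sequence is blocked on.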
