How do I best set MergeContent properties to control file size when tailing logs

Guru

I am tailing a log file into MergeContent. I want MergeContent to merge log entries into a large flow file to put to HDFS. I have been fiddling with the properties:

  • Merge Strategy
  • Minimum Number of Entries
  • Maximum Number of Entries
  • Maximum number of Bins

It has been pretty much trial-and-error.

How do the above properties determine MergeContent output FlowFile size, and what is the most direct way to, say, double the output file size compared to existing settings? What is the most direct way to increase size until a desired size is reached?

1 ACCEPTED SOLUTION

Super Collaborator

@gkeys

The MergeContent processor has two properties that I normally use to control the output file size:

  • Minimum Number of Entries
  • Minimum Group Size

To grow the output until it reaches a desired file size (say 1 GB): set Minimum Group Size to the size you want (e.g. 1 GB) AND set Minimum Number of Entries to 1. MergeContent will then keep merging content until the bin reaches 1 GB before it writes the result out to the next processor.

Can you clarify your other question about doubling the size compared to existing settings? If you mean doubling the size of each incoming file, that is straightforward: set Minimum Number of Entries to 2 and Minimum Group Size to 0 B.
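To make the effect of these two settings concrete, here is a minimal Python sketch of the rule described above. It is a simplified model, not NiFi's actual implementation: it assumes FlowFiles arrive one at a time, that the bin is checked after each arrival, and it ignores Max Bin Age, the maximum settings, and the number of bins. The function name `merge_sizes` is hypothetical.

```python
def merge_sizes(flowfile_sizes, min_entries=1, min_size_bytes=0):
    """Return the sizes of the merged outputs for a stream of incoming sizes.

    Simplified model: a bin is released as soon as it holds at least
    `min_entries` FlowFiles AND at least `min_size_bytes` bytes.
    Anything still in the bin at the end is left unmerged.
    """
    merged = []
    count = 0
    total = 0
    for size in flowfile_sizes:
        count += 1
        total += size
        # Both minimums must be satisfied before the bin is released.
        if count >= min_entries and total >= min_size_bytes:
            merged.append(total)
            count = 0
            total = 0
    return merged


# Size-driven merging: Minimum Number of Entries = 1, Minimum Group Size = 1000 bytes.
print(merge_sizes([400] * 5, min_entries=1, min_size_bytes=1000))

# Count-driven merging: Minimum Number of Entries = 2, Minimum Group Size = 0 bytes.
print(merge_sizes([100] * 4, min_entries=2, min_size_bytes=0))
```

In this model, raising the minimum size is the direct lever on output size, while the entry count only matters when the incoming FlowFiles are of predictable size.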

4 REPLIES 4

Expert Contributor

You should be able to use the "Maximum Group Size" property to control how big the concatenated data gets before it is transferred to the "merged" relationship. If you want the size of the concatenated data alone to dictate when merging completes, you should set "Maximum Number of Entries" to 0.
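As a rough illustration of a size cap, the following Python sketch groups incoming sizes so that no group exceeds a maximum. It is only a model under stated assumptions: FlowFiles are taken in arrival order, a FlowFile that would push the current group past the cap starts a new group, and entry counts, Max Bin Age, and the minimum settings are ignored. The name `pack_with_cap` is hypothetical and is not part of NiFi.

```python
def pack_with_cap(flowfile_sizes, max_group_size):
    """Group sizes in arrival order so that no group exceeds max_group_size.

    A single FlowFile larger than the cap ends up in a group of its own.
    """
    groups = []
    current = []
    for size in flowfile_sizes:
        # Close the current group if adding this FlowFile would exceed the cap.
        if current and sum(current) + size > max_group_size:
            groups.append(current)
            current = []
        current.append(size)
    if current:
        groups.append(current)  # the last, possibly partial, group
    return groups


# Five 400-byte FlowFiles with a 1000-byte cap.
print([sum(group) for group in pack_with_cap([400, 400, 400, 400, 400], 1000)])
```

Note that in this model the cap is an upper bound only: it limits how large a merged file can get but does not by itself force files to grow toward that size.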

Guru

(@hduraiswamy please ignore second question ... just a rephrasing of the first)

Guru

One minor thing to remember about the answer: Maximum Number of Entries must be left blank.