How to batch up FlowFiles into ORC?

Not applicable

In my NiFi project I receive messages one at a time. I convert these to Avro and, as a follow-up step, I want to bin them up into ORC files for use with Hive. To that end, I would like to use a MergeContent processor. Is that a good approach? How do I find the correct configuration?

1 ACCEPTED SOLUTION

Master Mentor

@Hellmar Becker

You may want to try merging your CSV files before sending them to the ConvertCSVToAvro processor. Try configuring your MergeContent processor as follows:

[Screenshot of the MergeContent configuration: 9559-screen-shot-2016-11-17-at-15021-pm.png]

Feed the merged output from this processor to ConvertCSVToAvro, then feed that output to the ConvertAvroToORC processor, and finally send it on to Hive.
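
To make the merge-first idea concrete, here is a minimal Python sketch of what binning single-row CSV messages into one larger CSV payload amounts to. It is an illustration only, not NiFi code: the class `CsvMergeBin`, the function `merge_messages`, and the parameters `min_entries`, `header`, and `demarcator` are hypothetical names that loosely mirror the idea of a minimum entry count, a header, and a line demarcator. The settings in the screenshot are not reproduced here.

```python
from typing import List, Optional


class CsvMergeBin:
    """Hypothetical stand-in for a merge bin: collects rows, emits one batch."""

    def __init__(self, min_entries: int, header: str = "", demarcator: str = "\n") -> None:
        self.min_entries = min_entries  # assumed: rows needed before a batch is emitted
        self.header = header            # assumed: optional CSV header written once per batch
        self.demarcator = demarcator    # assumed: separator placed between rows
        self._rows: List[str] = []

    def offer(self, row: str) -> Optional[str]:
        """Add one single-row message; return merged content once the bin is full."""
        self._rows.append(row.rstrip("\r\n"))
        if len(self._rows) >= self.min_entries:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Emit whatever is in the bin (for example when a maximum bin age expires)."""
        if not self._rows:
            return None
        parts = ([self.header] if self.header else []) + self._rows
        self._rows = []
        return self.demarcator.join(parts) + self.demarcator


def merge_messages(messages: List[str], min_entries: int, header: str = "") -> List[str]:
    """Run all messages through one bin and collect the merged batches."""
    merge_bin = CsvMergeBin(min_entries, header)
    batches = []
    for message in messages:
        merged = merge_bin.offer(message)
        if merged is not None:
            batches.append(merged)
    leftover = merge_bin.flush()  # partial final batch
    if leftover is not None:
        batches.append(leftover)
    return batches
```

Each merged batch would then be converted to Avro once and to ORC once, so the resulting ORC files hold many rows instead of one row each.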

Thanks,

Matt


4 REPLIES

Master Mentor

@Hellmar Becker

What format are your original messages in before you convert them to Avro?

Not applicable

Original format is CSV.

Not applicable

Thanks, this works great! Still, I wonder: can I force NiFi to create larger batches? I noticed that whenever the batch size exceeds the queue size, things get stuck.
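
One plausible reading of the stall described above, offered as an assumption rather than a confirmed diagnosis: a connection stops accepting FlowFiles once it reaches its back pressure object threshold (10,000 objects is the commonly cited default, but check your version), so a merge that waits for more entries than the queue can hold never sees enough of them. The toy Python model below illustrates that arithmetic only. The function `simulate` and its parameters are hypothetical and do not correspond to any NiFi API.

```python
def simulate(incoming: int, back_pressure_threshold: int, min_entries: int) -> dict:
    """Toy model: upstream fills a queue up to a threshold; a merge step removes
    min_entries items at a time, but only once that many are queued."""
    queued = 0
    merged_batches = 0
    remaining = incoming  # items upstream still wants to enqueue

    while True:
        # Upstream may only add items while the queue is below the threshold.
        accepted = min(remaining, max(back_pressure_threshold - queued, 0))
        queued += accepted
        remaining -= accepted

        if queued >= min_entries:
            # Enough entries are available: emit one merged batch.
            queued -= min_entries
            merged_batches += 1
            continue

        if accepted == 0:
            # Nothing new was accepted and no batch can be formed: no progress.
            break

    return {
        "merged_batches": merged_batches,
        "stuck_in_queue": queued,
        "blocked_upstream": remaining,
    }
```

In this model, `simulate(25000, 10000, 20000)` forms no batch at all, while `simulate(25000, 10000, 5000)` drains everything. If the same mechanism is at work in the real flow, the options usually suggested are raising the back pressure threshold on the connection feeding MergeContent, or chaining two merge stages so that each stage stays below the queue limit. Neither option is confirmed in this thread.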