How to batch up FlowFiles into ORC?

In my NiFi project I receive messages one at a time. I convert them into Avro and, as a follow-up step, I want to bin them into ORC files for use with Hive. To that end, I would like to use a MergeContent processor. Is that a good approach, and how do I find the correct configuration?

Master Mentor

@Hellmar Becker

What format are your original messages in before you convert them to Avro?

The original format is CSV.

Master Mentor (accepted solution)

@Hellmar Becker

You may want to try merging your CSV messages before sending them to the ConvertCSVToAvro processor. Try configuring your MergeContent processor as follows:

[Screenshot of the MergeContent configuration (2016-11-17); image not available]

Feed the merged output of this processor to ConvertCSVToAvro, send that output to the ConvertAvroToORC processor, and finally deliver the ORC files to Hive.
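To picture what merging before ConvertCSVToAvro buys you, here is a toy Python model of MergeContent's bin-packing with binary concatenation. It is not NiFi code: `merge_csv_messages`, `max_entries`, and `demarcator` are hypothetical names loosely mirroring the processor's Maximum Number of Entries and Demarcator properties, and it assumes every incoming message is a single headerless CSV line.

```python
from typing import Iterable, List


def merge_csv_messages(messages: Iterable[bytes], max_entries: int,
                       demarcator: bytes = b"\n") -> List[bytes]:
    """Concatenate single-record CSV messages into batches of at most max_entries.

    Hypothetical sketch of MergeContent-style binary concatenation, not the NiFi API.
    A partial final batch is still emitted, as if Max Bin Age had expired on it.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be >= 1")
    batches: List[bytes] = []
    current: List[bytes] = []
    for msg in messages:
        # Drop the record's own line ending so the demarcator is the only separator.
        current.append(msg.rstrip(b"\r\n"))
        if len(current) == max_entries:
            batches.append(demarcator.join(current))
            current = []
    if current:
        batches.append(demarcator.join(current))
    return batches


msgs = [b"1,alice\n", b"2,bob\n", b"3,carol\n"]
print(merge_csv_messages(msgs, max_entries=2))  # [b'1,alice\n2,bob', b'3,carol']
```

Each merged batch is one multi-row CSV payload, so the downstream Avro and ORC conversions run once per batch instead of once per message, which is what yields fewer, larger ORC files for Hive.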

Thanks,

Matt

Thanks, this works great! Still, I wonder: can I force NiFi to create larger batches? I've noticed that whenever the batch size exceeds the queue size, the flow gets stuck.
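One plausible reading of the "stuck" behavior can be shown with a small simulation. It assumes that "queue size" means the inbound connection's back pressure object threshold, and that FlowFiles MergeContent has pulled into a bin but not yet committed still count toward that threshold. Under those assumptions, a minimum batch size above the threshold can never be reached. The function `simulate` and its parameters are hypothetical and only loosely modeled on MergeContent's Minimum Number of Entries and Max Bin Age.

```python
from typing import List, Optional


def simulate(total_msgs: int, queue_limit: int, min_entries: int,
             max_bin_age: Optional[int], max_ticks: int = 1000) -> List[int]:
    """Return the sizes of the batches released within max_ticks.

    Hypothetical model, not NiFi code: each tick the source sends one message
    unless back pressure applies. Binned messages still count toward queue_limit
    (an assumption about uncommitted FlowFiles), so the source stalls once the
    bin holds queue_limit messages.
    """
    sent = 0
    binned = 0
    bin_age = 0
    released: List[int] = []
    for _ in range(max_ticks):
        # Source side: only send while below the back pressure threshold.
        if sent < total_msgs and binned < queue_limit:
            sent += 1
            binned += 1
        if binned:
            bin_age += 1
        # Merge side: release when the minimum is met or the bin is too old.
        too_old = max_bin_age is not None and bin_age >= max_bin_age
        if binned and (binned >= min_entries or too_old):
            released.append(binned)
            binned = 0
            bin_age = 0
    return released


print(simulate(10, queue_limit=5, min_entries=8, max_bin_age=None))  # [] (stuck)
print(simulate(10, queue_limit=5, min_entries=8, max_bin_age=20))    # [5, 5]
print(simulate(10, queue_limit=20, min_entries=4, max_bin_age=None)) # [4, 4]
```

In this model, larger batches come from keeping the minimum below the threshold (or raising the threshold), while a bin age limit guarantees that partial bins, including a leftover tail like the last two messages in the third run, are eventually flushed.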