Created 11-17-2016 06:19 PM
In my NiFi project I receive messages one at a time. I convert these to Avro and, as a follow-up step, I want to bin them up into ORC files for use with Hive. To that end, I would like to use a MergeContent processor. Is that a good approach? How do I find the correct configuration?
Created on 11-17-2016 06:53 PM - edited 08-19-2019 04:35 AM
You may want to try merging your CSV files before sending them to the ConvertCSVToAvro processor. Try configuring your MergeContent as follows:
Feed the merged output from this processor to ConvertCSVToAvro, then send that output to the ConvertAvroToORC processor, and finally on to Hive.
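To make the idea of merging CSV before conversion more concrete, here is a minimal Python sketch of what a bin-style merge does conceptually. It is not NiFi code and does not reproduce any specific processor settings: `CsvBin`, `merge_csv_stream`, and the two thresholds are hypothetical names that only loosely mirror MergeContent's "Minimum Number of Entries" and "Max Bin Age" properties. It also assumes each incoming message is a single CSV record without a header line.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CsvBin:
    """Toy stand-in for a single merge bin. Hypothetical, not a NiFi class."""
    min_entries: int        # loosely plays the role of "Minimum Number of Entries"
    max_age_seconds: float  # loosely plays the role of "Max Bin Age"
    created_at: Optional[float] = None
    records: List[str] = field(default_factory=list)

    def add(self, record: str, now: float) -> None:
        if not self.records:
            self.created_at = now  # the bin's age starts with its first record
        self.records.append(record.rstrip("\n"))

    def is_ready(self, now: float) -> bool:
        if not self.records:
            return False
        full = len(self.records) >= self.min_entries
        expired = (now - self.created_at) >= self.max_age_seconds
        return full or expired

    def flush(self) -> str:
        merged = "\n".join(self.records) + "\n"  # newline acts as the demarcator
        self.records = []
        self.created_at = None
        return merged


def merge_csv_stream(
    events: List[Tuple[float, str]], min_entries: int, max_age_seconds: float
) -> Tuple[List[str], List[str]]:
    """Return (merged_batches, records_still_waiting).

    Simplification: readiness is only checked when a record arrives,
    right after that record has been added to the bin.
    """
    bin_ = CsvBin(min_entries, max_age_seconds)
    batches: List[str] = []
    for now, record in events:
        bin_.add(record, now)
        if bin_.is_ready(now):
            batches.append(bin_.flush())
    return batches, list(bin_.records)


if __name__ == "__main__":
    # (arrival time in seconds, CSV record); the data is made up for illustration
    events = [(0, "a,1"), (1, "b,2"), (2, "c,3"), (10, "d,4"), (100, "e,5")]
    batches, waiting = merge_csv_stream(events, min_entries=3, max_age_seconds=30)
    for batch in batches:
        print(repr(batch))
    print("still waiting:", waiting)
```

Each merged batch is one larger CSV payload, so the downstream conversion steps handle a few big files instead of many single-record ones.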
Thanks,
Matt
Created 11-17-2016 06:30 PM
What format are your original messages in before you convert them to Avro?
Created 11-17-2016 06:31 PM
Original format is CSV.
Created 11-18-2016 06:55 PM
Thanks, this works great! Still I wonder: can I force NiFi to create larger batches? I noticed that whenever the batch size exceeds the queue size, things get stuck.
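One plausible explanation for the stuck flow, offered as an assumption rather than a confirmed diagnosis: if the connection feeding MergeContent applies back pressure once it holds a certain number of FlowFiles, and the merge needs more entries than that limit, the upstream processor is paused before the bin can ever fill. The toy Python simulation below illustrates that interaction. `simulate_merge`, `queue_limit`, and `min_entries` are hypothetical names, and the model ignores any time-based release such as a maximum bin age.

```python
from typing import Tuple


def simulate_merge(queue_limit: int, min_entries: int, ticks: int) -> Tuple[int, int]:
    """Return (number_of_merges, items_left_in_queue) after `ticks` steps.

    Each tick, upstream adds one item unless the queue is at its limit
    (a crude model of back pressure). A merge fires only once the queue
    holds at least `min_entries` items.
    """
    queued = 0
    merges = 0
    for _ in range(ticks):
        if queued < queue_limit:   # upstream is paused while the queue is full
            queued += 1
        if queued >= min_entries:  # enough items to build one merged batch
            queued -= min_entries
            merges += 1
    return merges, queued


if __name__ == "__main__":
    # Batch larger than the queue limit: the queue fills up and nothing merges.
    print(simulate_merge(queue_limit=10, min_entries=20, ticks=100))
    # Batch smaller than the queue limit: merges happen regularly.
    print(simulate_merge(queue_limit=10, min_entries=5, ticks=100))
```

Under this model, larger batches would require either raising the queue limit on the incoming connection above the desired batch size, or merging in two stages so that no single stage needs more entries than its queue can hold.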