How to batch up FlowFiles into ORC?
Created ‎11-17-2016 06:19 PM
In my NiFi project I receive messages one at a time. I convert them to Avro and, as a follow-up step, I want to bin them up into ORC files for use with Hive. To that end, I would like to use a MergeContent processor. Is that a good approach? How do I find the correct configuration?
Created ‎11-17-2016 06:30 PM
What format are your original messages in before you convert them to Avro?
Created ‎11-17-2016 06:31 PM
Original format is CSV.
Created on ‎11-17-2016 06:53 PM - edited ‎08-19-2019 04:35 AM
You may want to try merging your CSV files before sending them to the ConvertCSVToAvro processor. Try configuring your MergeContent as follows:
Feed the merged output from this processor to ConvertCSVToAvro, then feed that output to the ConvertAvroToORC processor, and finally send the result on to Hive.
Thanks,
Matt
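To make the merge-before-convert idea concrete, here is a minimal Python sketch of what the binning step does conceptually. It is not NiFi code: the function name `merge_bins`, the `max_entries` parameter, and the sample rows are hypothetical, and it assumes each incoming message is a single header-less CSV record joined with a newline demarcator.

```python
from typing import Iterable, List


def merge_bins(messages: Iterable[str], max_entries: int, demarcator: str = "\n") -> List[str]:
    """Group single-record CSV messages into larger merged payloads."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    merged: List[str] = []
    current: List[str] = []
    for message in messages:
        # Strip trailing line breaks so the demarcator is the only separator.
        current.append(message.rstrip("\r\n"))
        if len(current) == max_entries:
            # The bin is full: emit one merged payload and start a new bin.
            merged.append(demarcator.join(current))
            current = []
    if current:
        # Flush the partial bin, similar in spirit to a bin that times out.
        merged.append(demarcator.join(current))
    return merged


# Hypothetical sample data: five one-line CSV messages.
rows = ["1,alice", "2,bob\n", "3,carol", "4,dave", "5,erin"]
batches = merge_bins(rows, max_entries=2)
```

Each merged payload is then a multi-row CSV document, so the CSV-to-Avro and Avro-to-ORC conversions run once per batch instead of once per message.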
Created ‎11-18-2016 06:55 PM
Thanks, this works great! Still, I wonder: can I force NiFi to create larger batches? I noticed that whenever the batch size exceeds the queue size, things get stuck.
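One plausible reading of the stall described above is that the upstream connection applies back pressure before enough FlowFiles have queued up to fill a bin. The toy model below only illustrates that interaction and is not NiFi's actual scheduling logic: `simulate`, `min_entries`, `queue_limit`, and `incoming` are hypothetical names, and the model assumes no bin timeout is configured.

```python
from typing import Tuple


def simulate(min_entries: int, queue_limit: int, incoming: int) -> Tuple[int, int]:
    """Return (merged_batches, still_queued) for a toy queue-plus-merge model."""
    queued = 0
    merged_batches = 0
    for _ in range(incoming):
        if queued >= queue_limit:
            # Back pressure: the upstream producer stops, so the queue never grows.
            break
        queued += 1
        if queued >= min_entries:
            # Enough entries are waiting: merge them and empty the queue.
            merged_batches += 1
            queued = 0
    return merged_batches, queued


# Batch size below the queue limit: merges keep happening.
ok = simulate(min_entries=5, queue_limit=10, incoming=20)

# Batch size above the queue limit: the queue fills up and nothing is merged.
stuck = simulate(min_entries=15, queue_limit=10, incoming=20)
```

If this reading applies, two settings are worth checking: the back pressure threshold on the connection feeding MergeContent, which would need to be at least as large as the desired batch, and the Max Bin Age property, which lets a partially filled bin be merged after a timeout.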
