How to ingest Excel files from the local file system into HDFS using Flume?
- Labels: Apache Flume
Created 11-08-2016 06:35 AM
I am wondering whether anybody has come across a scenario where you need to import or read data from Excel into Hadoop. Is there such a thing as a Flume Excel source?
By the way, I know I can convert the Excel file to CSV and then deal with it. I am really just trying to explore Flume sources a bit further here.
Created 12-20-2016 06:42 PM
Not out of the box. You can build a custom source.
Converting to CSV is still an option.
If your source is streaming data in real time, then Flume is a reasonable option. An alternative is Apache NiFi. Assuming the data is streaming in real time and you are willing to use Flume, the target files stored in HDFS will have a structure similar to the source (no transformation in flight). Apache NiFi could help you perform some transformation in flight so that the file at the target is easier to consume, e.g. through Hive external tables. You could achieve something like that with Flume, but with coding and pain involved.
If your Excel file is static, then you should use something else, such as a MapReduce or Spark job.
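For the CSV route, a rough sketch of a Flume agent might look like the following. It assumes the Excel files have already been exported to CSV and are dropped, fully written, into a local directory. The agent name (`agent1`), the component names, the spool directory, and the HDFS URI are all placeholders for illustration, not values from this thread. It uses Flume's standard spooling directory source, memory channel, and HDFS sink.

```properties
# Hypothetical agent "agent1": local CSV files -> HDFS, no transformation in flight.
agent1.sources = spool
agent1.channels = mem
agent1.sinks = hdfsOut

# Spooling directory source: watches a local directory for new, completed files.
# The path below is a placeholder.
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/flume/incoming-csv
agent1.sources.spool.channels = mem

# In-memory channel: simple, but events are lost if the agent dies.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000
agent1.channels.mem.transactionCapacity = 1000

# HDFS sink: writes events as plain text. The namenode URI is a placeholder.
agent1.sinks.hdfsOut.type = hdfs
agent1.sinks.hdfsOut.channel = mem
agent1.sinks.hdfsOut.hdfs.path = hdfs://namenode:8020/data/excel_csv
agent1.sinks.hdfsOut.hdfs.fileType = DataStream
agent1.sinks.hdfsOut.hdfs.writeFormat = Text
# Roll a new HDFS file every 5 minutes only (size- and count-based rolling disabled).
agent1.sinks.hdfsOut.hdfs.rollInterval = 300
agent1.sinks.hdfsOut.hdfs.rollSize = 0
agent1.sinks.hdfsOut.hdfs.rollCount = 0
```

Note that the spooling directory source reads files line by line by default, which suits CSV but not a binary .xlsx file. That is why an Excel file would need either a conversion step beforehand or custom code.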
