
How to ingest Excel files from the local file system into HDFS using Flume?


I am just wondering whether anybody has come across a scenario where you need to import or read data from Excel into Hadoop. Is there such a thing as a Flume Excel source?

By the way, I know I can convert the Excel file to CSV and then deal with it. I am really just trying to explore Flume sources a bit further here.

1 ACCEPTED SOLUTION

@Suresh Bonam

Not out of the box. You would need to build a custom Flume source.

CSV is still an option.
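As a minimal sketch of that conversion step, using only the Python standard library: an .xlsx file is a ZIP archive of XML parts, so for simple sheets you can pull rows out directly. This assumes a single worksheet with inline-string and numeric cells and does not resolve shared strings; the file paths and function names are mine for illustration, and real-world spreadsheets usually warrant a library such as openpyxl instead.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet XML inside the .xlsx archive.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def xlsx_rows(path):
    """Yield each row of the first worksheet as a list of cell strings."""
    with zipfile.ZipFile(path) as zf:
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
    for row in sheet.iter(NS + "row"):
        cells = []
        for cell in row.iter(NS + "c"):
            if cell.get("t") == "inlineStr":
                # Inline string cell: the text lives under <is><t>.
                cells.append(cell.findtext(NS + "is/" + NS + "t", ""))
            else:
                # Numeric cell: the value lives under <v>. (For shared-string
                # cells this would be an index, which this sketch ignores.)
                cells.append(cell.findtext(NS + "v", ""))
        yield cells

def xlsx_to_csv(xlsx_path, csv_path):
    """Write the first worksheet out as a CSV file."""
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(xlsx_rows(xlsx_path))
```

Once the file is plain CSV, any of the usual ingest paths (Flume spooling directory, `hdfs dfs -put`, NiFi) can pick it up.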

If your source is streaming data in real time, then Flume is a reasonable option. An alternative is Apache NiFi. Assuming you are streaming in real time and are set on Flume, the files landed in HDFS will have the same structure as the source (no transformation in flight). Apache NiFi can perform transformations in flight so that the file at the target is easier to consume, e.g. via Hive external tables. You could achieve something similar with Flume, but with custom coding and pain involved.
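For the CSV route with Flume, a sketch of what the agent configuration might look like: a spooling-directory source watching a local folder of converted CSV files, feeding an HDFS sink through a memory channel. The agent name, paths, and sizing values below are placeholders, not from the thread.

```properties
# Hypothetical agent "a1"; adjust names, paths, and capacities to your cluster.
a1.sources = src
a1.channels = ch
a1.sinks = snk

# Watch a local directory for converted CSV files.
a1.sources.src.type = spooldir
a1.sources.src.spoolDir = /data/incoming/csv
a1.sources.src.channels = ch

a1.channels.ch.type = memory
a1.channels.ch.capacity = 10000

# Land the events in date-partitioned HDFS directories as plain text.
a1.sinks.snk.type = hdfs
a1.sinks.snk.hdfs.path = hdfs://namenode:8020/ingest/excel-csv/%Y-%m-%d
a1.sinks.snk.hdfs.fileType = DataStream
a1.sinks.snk.hdfs.rollInterval = 300
a1.sinks.snk.hdfs.useLocalTimeStamp = true
a1.sinks.snk.channel = ch
```

Note that the spooling-directory source expects files to be complete and immutable once dropped in `spoolDir`, which fits a convert-then-drop workflow.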

If your Excel file is static, then you should use something else, such as a MapReduce or Spark job.

