How to ingest Excel files from the local file system into HDFS using Flume?

Expert Contributor

Has anybody come across a scenario where you need to import or read data from Excel into Hadoop? Is there such a thing as a Flume Excel source?

By the way, I know I can convert the Excel file to CSV and then deal with it. I am really just trying to explore Flume sources a bit further here.

1 ACCEPTED SOLUTION

Super Guru

@Suresh Bonam

Not out of the box. You can build a custom source.

CSV is still an option.

If your source is streaming data in real time, then Flume is a reasonable option. An alternative is Apache NiFi. Assuming the data is streaming in real time and you are willing to use Flume, the target files stored in HDFS will have a similar structure to the source (no transformation in flight). Apache NiFi could help you perform some transformation in flight, so that the file at the target is easier to consume, e.g. through Hive external tables. You could achieve something similar with Flume, but with coding and pain involved.

If your Excel file is static, then you should use something else, such as a MapReduce or Spark job.

