Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Reading/Writing data from HDFS via Windows.

Contributor

Does anyone else have this requirement?

We have a set of automated processes that ingest data from all over the place (on Windows) and need a way to reliably load that data into HDFS.

What is the best way to do this? The NFS gateway (the lack of ACL support is a concern there)? HttpFS?

The cluster is Kerberized and CentOS-based, if that helps.

Any insights or tips would be very helpful.

Thanks.
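For context, the HttpFS option mentioned above would look roughly like the sketch below on a Kerberized cluster. The hostname, file names, and HDFS path are placeholders, and a valid Kerberos ticket (via kinit or a keytab) is assumed; the script only prints the command rather than executing it, since running it needs a live HttpFS service.

```shell
#!/bin/sh
# Sketch of an HttpFS upload to a Kerberized cluster.
# HTTPFS_HOST, LOCAL_FILE, and HDFS_PATH are placeholders.
HTTPFS_HOST="httpfs.example.com"   # placeholder
LOCAL_FILE="data.csv"              # placeholder
HDFS_PATH="/user/ingest/data.csv"  # placeholder

# HttpFS accepts the file body in a single request when data=true is set
# and the Content-Type is application/octet-stream (unlike plain WebHDFS,
# which first redirects the client to a datanode). The --negotiate -u :
# flags tell curl to use SPNEGO/Kerberos authentication.
CMD="curl --negotiate -u : -X PUT \
  -H 'Content-Type: application/octet-stream' \
  -T ${LOCAL_FILE} \
  'http://${HTTPFS_HOST}:14000/webhdfs/v1${HDFS_PATH}?op=CREATE&data=true'"

# Printed rather than executed here, since it needs a live cluster.
echo "$CMD"
```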

1 ACCEPTED SOLUTION

Master Collaborator

There are a few Windows Hadoop client solutions out there; I haven't tried any of them, so I'll defer to the community at large for specifics. One elegant/interesting approach I will point out is the possibility of running a Flume agent on Windows.

Tried and true method:
Off the top of my head, you can use the file transfer method of your choice to get the files from the Windows machine to a Linux machine (SFTP, Samba, etc.), and then use your favorite HDFS loading command/process to get the files into HDFS (hdfs dfs -copyFromLocal, Flume, etc.).
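The two-hop method above can be sketched as follows. The hostnames, paths, and Kerberos principal are placeholders, and the script prints the commands rather than executing them, since each step needs the corresponding live service.

```shell
#!/bin/sh
# Sketch of the two-hop load: Windows -> Linux edge node -> HDFS.
# All hostnames, paths, and the principal below are placeholders.
LANDING_DIR="/data/landing"            # placeholder local staging dir
HDFS_DIR="/user/ingest/incoming"       # placeholder HDFS target

# Hop 1: pull the files from the Windows machine onto a Linux edge node.
# Any transfer tool works here (sftp, smbclient, rsync over SSH, ...).
STEP1="sftp -r winuser@winbox.example.com:/ingest/ ${LANDING_DIR}/"

# On a Kerberized cluster, authenticate before touching HDFS.
STEP2="kinit -kt /etc/security/keytabs/ingest.keytab ingest@EXAMPLE.COM"

# Hop 2: load the staged files into HDFS.
STEP3="hdfs dfs -copyFromLocal ${LANDING_DIR}/* ${HDFS_DIR}/"

# Printed rather than executed here, since each step needs live services.
printf '%s\n' "$STEP1" "$STEP2" "$STEP3"
```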

 

