Support Questions


Reading/writing HDFS data from Windows

Contributor

Does anyone else have this requirement?

We have a set of automated processes (running on Windows) that ingest data from many sources, and we need a reliable way to load that data into HDFS.

 

What is the best way to do this? The NFS Gateway (the lack of ACL support is a concern here)? HttpFS?

 

The cluster is kerberized and CentOS-based, if that helps.
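On the HttpFS/WebHDFS route specifically: against a kerberized cluster the client authenticates via SPNEGO, and a file upload is a two-step exchange (a CREATE request that returns a redirect location, then a PUT of the actual bytes). A minimal sketch of that flow, assuming the third-party `requests` and `requests-kerberos` packages and a hypothetical HttpFS host `httpfs.example.com` (not from the original thread):

```python
import requests  # third-party; pip install requests


def webhdfs_create_url(host, path, port=14000, overwrite=False):
    """Build the first-step WebHDFS CREATE URL (HttpFS listens on 14000 by default)."""
    return (f"http://{host}:{port}/webhdfs/v1{path}"
            f"?op=CREATE&overwrite={str(overwrite).lower()}")


def upload_file(host, local_path, hdfs_path):
    """Two-step WebHDFS upload with SPNEGO (Kerberos) authentication.

    Step 1 issues CREATE with no body; plain WebHDFS answers with a 307
    redirect to a DataNode, while HttpFS proxies the write itself.
    Step 2 PUTs the file bytes to whichever location came back.
    """
    # Assumed installed: pip install requests-kerberos
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    r1 = requests.put(webhdfs_create_url(host, hdfs_path),
                      auth=auth, allow_redirects=False)
    location = r1.headers.get("Location", r1.url)
    with open(local_path, "rb") as f:
        r2 = requests.put(location, data=f, auth=auth)
    r2.raise_for_status()
```

The same code should work against direct WebHDFS by pointing `port` at the NameNode's HTTP port instead of the HttpFS port; HttpFS has the advantage of being a single gateway host to expose to the Windows side.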

 

Any insights/tips would be very helpful.

 

Thanks.

1 ACCEPTED SOLUTION

Master Collaborator

There are a few Windows Hadoop client solutions out there; I haven't tried any of them, so I'll defer to the community at large for specifics. One elegant/interesting approach worth pointing out is running a Flume agent on Windows.

 

Tried-and-true method:
Off the top of my head, use the file-transfer method of your choice (SFTP, Samba, etc.) to get the files from the Windows machine to a Linux machine, then use your favorite HDFS loading command/process (hdfs dfs -copyFromLocal, Flume, etc.) to get the files into HDFS.
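The second leg of that method (Linux landing directory into HDFS) is easy to automate around the hdfs CLI. A minimal sketch, assuming a hypothetical landing directory and that the `hdfs` command is on the PATH with a valid Kerberos ticket (e.g. from kinit or a keytab):

```python
import subprocess
from pathlib import Path


def put_command(local_file, hdfs_dir):
    """Build the hdfs CLI invocation; -f overwrites an existing target file."""
    return ["hdfs", "dfs", "-copyFromLocal", "-f", str(local_file), hdfs_dir]


def load_landing_dir(landing_dir, hdfs_dir):
    """Push every file in the landing directory into HDFS.

    Each local copy is deleted only after its upload succeeds
    (check=True raises on a non-zero hdfs exit code), so a failed
    transfer leaves the file in place for the next run.
    """
    for f in sorted(Path(landing_dir).glob("*")):
        if f.is_file():
            subprocess.run(put_command(f, hdfs_dir), check=True)
            f.unlink()
```

Run it from cron or a systemd timer on the Linux host; pairing it with an atomic rename on the SFTP/Samba side (upload to a temp name, rename when complete) avoids picking up half-transferred files.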

 

