Script to get files from HDFS to local OS "Automated process"

Master Mentor

An end user needs to get files from HDFS. The process is:

End user --> Gateway box (look for the file locally; if it is not there, talk to HDFS) --> HDFS --> copy the file to the gateway box
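
The flow above can be sketched as a small Python helper that runs on the gateway box. This is only a minimal sketch under stated assumptions: the `hdfs` client is on the PATH of the gateway box, and the cache directory layout and the names `get_file` and `fetch_from_hdfs` are hypothetical, chosen for illustration.

```python
import subprocess
from pathlib import Path


def fetch_from_hdfs(hdfs_path: str, local_path: Path) -> None:
    """Copy one file out of HDFS with the standard `hdfs dfs -get` command."""
    # Assumes the `hdfs` CLI is installed and configured on the gateway box.
    subprocess.run(["hdfs", "dfs", "-get", hdfs_path, str(local_path)], check=True)


def get_file(hdfs_path: str, cache_dir: Path, fetch=fetch_from_hdfs) -> Path:
    """Return a local copy of hdfs_path, going to HDFS only if it is missing."""
    # Hypothetical layout: mirror the HDFS path under cache_dir.
    local_path = cache_dir / hdfs_path.lstrip("/")
    if local_path.exists():
        # Already on the gateway box: serve the local copy.
        return local_path
    # Not there: create the parent directories and copy the file from HDFS.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    fetch(hdfs_path, local_path)
    return local_path
```

A call such as `get_file("/data/report.csv", Path("/var/cache/hdfs"))` (both paths are placeholders) would return the cached copy on later requests. The `fetch` parameter is there so the HDFS step can be swapped out, for example for a WebHDFS or Knox based download.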

1 ACCEPTED SOLUTION


I highly recommend Knox's shell, which provides a DSL for those operations: http://knox.apache.org/books/knox-0-6-0/user-guide.html#WebHDFS

It is a great way to interact with a cluster programmatically in a controlled and audited manner (e.g. a simpler DSL and a single secured gateway endpoint, so there is no need to open every node's port). BTW, it's a Groovy DSL, which makes it trivial to run in any Java program.
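
For readers who would rather script this outside the Groovy shell, the same gateway endpoint can also be reached with plain WebHDFS REST calls. The following is a rough Python sketch, not the Knox DSL itself. It assumes the default `gateway` path in the URL, HTTP Basic authentication, and a topology name that you supply; the host, topology, and function names are hypothetical.

```python
import base64
import shutil
import urllib.parse
import urllib.request


def knox_webhdfs_url(gateway: str, topology: str, hdfs_path: str, op: str = "OPEN") -> str:
    """Build a WebHDFS URL routed through a Knox gateway."""
    # Assumes the default Knox gateway path ("gateway"); adjust if yours differs.
    quoted = urllib.parse.quote(hdfs_path)
    return f"{gateway}/gateway/{topology}/webhdfs/v1{quoted}?op={op}"


def download(url: str, user: str, password: str, dest: str) -> None:
    """Stream the response body of a WebHDFS OPEN request into a local file."""
    # Assumes the gateway accepts HTTP Basic credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, `knox_webhdfs_url("https://knox.example.com:8443", "sandbox", "/data/report.csv")` builds the URL, and `download(url, user, password, "report.csv")` saves the file locally. TLS trust settings and error handling are left out and would need to match your gateway's configuration.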


3 REPLIES

Guru

An option here would be to put a standard caching web proxy in front of WebHDFS (or, better, in front of WebHDFS exposed through Knox) and have it ignore Cache-Control headers.