Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Script to get files from HDFS to local OS "Automated process"

Master Mentor

An end user needs to get files from HDFS. The process is:

End user --> Gateway box (look for the file locally; if it is not there, talk to HDFS) --> HDFS --> copy the file to the gateway box

1 ACCEPTED SOLUTION

I highly recommend Knox's shell, which uses a DSL for those operations: http://knox.apache.org/books/knox-0-6-0/user-guide.html#WebHDFS

It's a great way to interact with a cluster programmatically in a controlled and audited manner (e.g. a simpler DSL and a secured gateway endpoint, with no need to open every node's port). BTW, it's a Groovy DSL, which makes it trivial to run from any Java program.
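
For readers who cannot run the Groovy shell, the same gateway endpoint can also be reached over plain REST. The following is only a rough Python sketch, not the Knox shell DSL itself. It assumes the default `gateway` context path and HTTP Basic authentication; the helper name `knox_open_request`, the host, the topology name, and the credentials are all placeholders made up for illustration.

```python
import base64
import urllib.parse
import urllib.request


def knox_open_request(gateway, topology, hdfs_path, username, password):
    """Build (but do not send) a WebHDFS OPEN request routed through Knox.

    Assumes the gateway exposes WebHDFS under /gateway/<topology>/webhdfs/v1
    and accepts HTTP Basic credentials; adjust both for your deployment.
    """
    url = "{}/gateway/{}/webhdfs/v1{}?op=OPEN".format(
        gateway.rstrip("/"),
        urllib.parse.quote(topology),
        urllib.parse.quote(hdfs_path),  # keeps "/" and escapes spaces etc.
    )
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical values; nothing is sent until urlopen() is called.
req = knox_open_request(
    "https://knox.example.com:8443", "sandbox", "/tmp/README", "guest", "guest-password"
)
print(req.full_url)
```

Passing the resulting request to `urllib.request.urlopen(req)` would perform the actual download. Only the single gateway port has to be reachable from the client, which is the point made above.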

3 REPLIES

Guru

An option here would be to put a standard caching web proxy in front of WebHDFS (or, better, in front of WebHDFS through Knox) and have it ignore cache-control headers.
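
If a proxy is not available, the flow from the question (look on the gateway box first, go to HDFS only on a miss) could also be scripted directly. Below is a minimal Python sketch using the WebHDFS `OPEN` operation. The base URL, the cache directory, and the function names are assumptions for illustration; authentication and error handling are left out.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed values for illustration only; point these at your own cluster.
WEBHDFS_BASE = "http://namenode.example.com:50070/webhdfs/v1"
CACHE_DIR = Path("/tmp/hdfs-cache")


def webhdfs_open_url(hdfs_path, base=WEBHDFS_BASE, user=None):
    """Build the WebHDFS OPEN URL for an absolute HDFS path."""
    query = {"op": "OPEN"}
    if user:
        query["user.name"] = user  # simple (non-Kerberos) auth parameter
    return (base.rstrip("/") + urllib.parse.quote(hdfs_path)
            + "?" + urllib.parse.urlencode(query))


def download(url, dest):
    """Stream the response body into dest; urlopen follows the datanode redirect."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def get_file(hdfs_path, cache_dir=CACHE_DIR, fetch=download):
    """Return a local copy of hdfs_path, contacting HDFS only on a cache miss."""
    local = Path(cache_dir) / hdfs_path.lstrip("/")
    if local.exists():
        return local  # already on the gateway box
    local.parent.mkdir(parents=True, exist_ok=True)
    tmp = local.with_name(local.name + ".part")
    fetch(webhdfs_open_url(hdfs_path), tmp)  # download to a temporary name first
    tmp.replace(local)  # rename so a partial download is never served
    return local
```

Calling `get_file("/data/report.csv")` (a made-up path) would return the cached copy if one exists and otherwise download it once. Note that this sketch never refreshes a cached file, so it only suits files that do not change in HDFS.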