How do I "PUT" files correctly into WebHDFS via Knox?

Explorer

According to the WebHDFS documentation (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE), I need to send two HTTP PUT requests: one to the NameNode, and a second to the DataNode named in the response to the first. This works fine as long as I have direct access to these nodes.
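For reference, the two-step CREATE described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not an official client: the function names (`create_url`, `put_file`) are made up for this post, and authentication is simply passed through as headers because it depends on the cluster setup.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url(base_url, hdfs_path, overwrite=False):
    """Build the step-1 URL: <base>/<path>?op=CREATE&overwrite=<bool>.

    base_url is assumed to end in /webhdfs/v1.
    """
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    return f"{base_url.rstrip('/')}/{quote(hdfs_path.lstrip('/'))}?{query}"


def _put(url, body=None, headers=None):
    """Send one PUT without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def put_file(base_url, hdfs_path, data, headers=None):
    """Two-step WebHDFS CREATE: ask where to write, then send the bytes."""
    # Step 1: no file data; the server answers 307 with a Location header.
    status, location = _put(create_url(base_url, hdfs_path), headers=headers)
    if status != 307 or not location:
        raise RuntimeError(f"expected 307 with Location, got {status}")
    # Step 2: PUT the file content to the address from the Location header.
    status, _ = _put(location, body=data, headers=headers)
    if status != 201:
        raise RuntimeError(f"expected 201 Created, got {status}")
    return location
```

A call would look like `put_file("http://namenode.example.com:50070/webhdfs/v1", "/tmp/test.txt", b"hello")`, where the host name and port are placeholders. The second request only succeeds if the client can reach whatever address the first response points to.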

How does a PUT work from outside the cluster, where a firewall separates the HTTP client from everything except the single entry point, Knox? Does it work at all?

EDIT: It works now. Here is what went wrong:

To explain my mistake: I have full access to the cluster, so I sent the first request to Knox's internal IP address. Knox answered with the internal address of a data node. That works for me, since I have full access, but it would not work for others, who only see the Knox node from outside.

When Knox is addressed by its external IP address, the response to the first request also points to that external IP address.
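A quick way to spot this kind of mistake is to compare the host in the `Location` header of the first response with the host the request was sent to. The snippet below is only a sketch of that check; the function name and all addresses are made up for illustration and are not real Knox output.

```python
from urllib.parse import urlsplit


def redirect_stays_on_gateway(request_url, location):
    """Return True if the redirect target has the same host as the request.

    If this is False, the second PUT would go to a different host, which
    an outside client behind the firewall may not be able to reach.
    """
    return urlsplit(request_url).hostname == urlsplit(location).hostname


# Hypothetical addresses, for illustration only.
first_request = "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp/f?op=CREATE"
good_location = "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp/f?token=abc"
bad_location = "http://10.0.0.12:50075/webhdfs/v1/tmp/f?op=CREATE"

print(redirect_stays_on_gateway(first_request, good_location))  # True
print(redirect_stays_on_gateway(first_request, bad_location))   # False
```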

Accepted Solutions

Re: How do I "PUT" files correctly into WebHDFS via Knox?

Contributor

Apache Knox provides the same REST APIs for PUTting files into HDFS.

Please see the Apache docs for examples: http://knox.apache.org/books/knox-0-7-0/user-guide.html#WebHDFS+Examples

Note that the link takes you to examples that use Knox's Groovy scripting capabilities as well as examples that use curl to do the same things.

Replies

Re: How do I "PUT" files correctly into WebHDFS via Knox?

Mentor

This does not necessarily answer your question, but you're using very old docs; please refer to the latest stable release docs for the WebHDFS API, @Stefan Kupstaitis-Dunkler. Here are WebHDFS examples that are supposed to work.

Re: How do I "PUT" files correctly into WebHDFS via Knox?

Explorer

Thanks, @lmccay. I just noticed that it isn't a matter of the age of the docs (they didn't change much), but of how I used WebHDFS PUT. And sometimes asking the question helps solve the problem as well 😉 I edited my question to describe the problem I had.