How do I "PUT" files correctly into WebHDFS via Knox?

Contributor

According to the WebHDFS documentation (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE), I need to send two HTTP PUT requests: one to the namenode and one to the data node named in the response to the first request. This works fine as long as I have access to these nodes.
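For reference, the two-step exchange described above can be sketched in Python with only the standard library. This is a minimal sketch under assumptions, not code from the Hadoop docs: the function names, the host and port arguments, and the `overwrite` default are illustrative, plain HTTP is assumed, and authentication (for example a `user.name` parameter or Kerberos) is left out.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_request_target(hdfs_path, overwrite=False):
    """Build the request target for the first (namenode) CREATE request."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def put_file(namenode_host, namenode_port, hdfs_path, data):
    """Two-step upload: ask the namenode first, then send the data to the datanode."""
    # Step 1: PUT without a body. The namenode is expected to answer with a
    # 307 redirect whose Location header names a datanode.
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", create_request_target(hdfs_path))
    first = conn.getresponse()
    first.read()
    location = first.getheader("Location")
    conn.close()
    if first.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {first.status}")

    # Step 2: PUT the file content to the URL taken from the Location header.
    # Assumption: the datanode URL uses plain HTTP.
    target = urlsplit(location)
    conn = http.client.HTTPConnection(target.hostname, target.port)
    conn.request("PUT", target.path + "?" + target.query, body=data)
    second = conn.getresponse()
    second.read()
    conn.close()
    return second.status
```

A call might look like `put_file("namenode.example.com", 50070, "/tmp/hello.txt", b"hello")`, where the host name and port are placeholders. The client needs to reach both the namenode and whatever host the Location header names, which is exactly what breaks behind a firewall.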

How does a PUT work from outside a cluster, where everything between the HTTP client and the cluster is separated by a firewall except the one entrance point, which is Knox? Does it work at all?

EDIT: Now it works - here is what went wrong:

Just to explain what my mistake was: I have full access to the cluster, so I sent the first request to Knox's internal IP address. Knox answered with the internal address of a data node. That works for me, since I have full access, but it wouldn't work for others, who can only see the Knox node from outside.

When Knox is addressed by its external IP address, the response to the first request also contains that external IP address.
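One way to catch this mistake early is to compare the host in the Location header of the first response with the host the request was sent to. The helper below is a hypothetical sketch (the function name and the example URLs are made up for illustration); it only parses URLs and makes no network calls.

```python
from urllib.parse import urlsplit


def redirect_stays_on_gateway(gateway_url, location):
    """Return True if the Location header names the same host as the gateway URL."""
    gateway_host = urlsplit(gateway_url).hostname
    redirect_host = urlsplit(location).hostname
    return gateway_host is not None and gateway_host == redirect_host
```

If this returns False for a request sent through Knox, the second PUT would go to a host that an outside client probably cannot reach or resolve.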

1 ACCEPTED SOLUTION

Expert Contributor

Apache Knox provides the same REST APIs for PUTting files into HDFS.

Please see the Apache docs for examples: http://knox.apache.org/books/knox-0-7-0/user-guide.html#WebHDFS+Examples

Note that the link will take you to examples of using the groovy scripting capabilities of Knox as well as examples of using curl to do the same things.
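As a rough illustration of what the first request looks like when it goes through the gateway, here is a small Python sketch that builds the URL and a Basic authentication header. The URL layout (gateway host, port, gateway path, topology name, then the usual `/webhdfs/v1` path) follows the pattern used in the Knox user guide; the function names, the default port 8443, the default path `gateway`, and all example values are assumptions to be replaced with the values of your own deployment.

```python
import base64
from urllib.parse import quote


def knox_create_url(gateway_host, topology, hdfs_path, port=8443, gateway_path="gateway"):
    """Build the URL of the first CREATE request as sent to the Knox gateway."""
    return (
        f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
        f"/webhdfs/v1{quote(hdfs_path)}?op=CREATE"
    )


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}
```

The second PUT then goes to whatever URL the Location header of the first response contains, with the same credentials and the file content as the body.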


5 REPLIES

Master Mentor

This does not necessarily answer your question, but you're using very old docs; please refer to the latest stable release docs for the WebHDFS API, @Stefan Kupstaitis-Dunkler. Here are WebHDFS examples that are supposed to work.

Contributor

Thanks, @lmccay. I just noticed that it isn't a matter of the age of the docs (they didn't change much), but of how I used WebHDFS PUT. And sometimes asking the question helps solve the problem as well 😉 -> I edited my question to describe the problem that I had.

New Contributor

Hi there!
I'm in the same situation, but my cluster is also kerberized.
I don't have full access to the cluster, so if I use the datanode address from the response I get a "Could not resolve host" error.
So I'm wondering what you mean by using the external IP address.
Could you post your requests here?
Thanks a ton!

Community Manager

@Sergeii Welcome to the Cloudera Community!
As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.


Regards,

Diana Torres,
Community Moderator

