Created 01-17-2016 09:37 PM
According to the WebHDFS documentation (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE), I need to send two HTTP PUT requests: one to the namenode and one to the data node returned by the first request. This works fine as long as I have access to these nodes.
How does a PUT work from outside a cluster, where a firewall separates the HTTP client from the cluster and the only entry point is Knox? Does it work at all?
EDIT: Now it works - here is what went wrong:
Just to explain what my mistake was: I have full access to the cluster, so I sent the first request to Knox's internal IP address. Knox answered with the internal address of a data node. That works for me, since I have full access, but it wouldn't for others who only see the Knox node from outside.
When using Knox with its external IP address, the response to the first request also points to that external IP address.
Created 01-18-2016 03:44 AM
Apache Knox provides the same REST APIs for PUTting files into HDFS.
Please see the Apache docs for examples: http://knox.apache.org/books/knox-0-7-0/user-guide.html#WebHDFS+Examples
Note that the link will take you to examples of using the groovy scripting capabilities of Knox as well as examples of using curl to do the same things.
Created 01-18-2016 01:49 AM
This does not necessarily answer your question, but you're using very old docs; please refer to the latest stable release docs for the WebHDFS API, @Stefan Kupstaitis-Dunkler. Here are WebHDFS examples that are supposed to work.
Created 01-18-2016 08:02 AM
Thanks, @lmccay. I just noticed that it isn't a matter of the age of the docs (they didn't change much) but of how I used the WebHDFS PUT. And sometimes asking the question helps solve a problem as well 😉 -> I edited my question to describe the problem I had.
Created 11-07-2023 10:58 AM
Hi there!
I'm in the same situation, but my cluster is also kerberized.
I don't have full access to the cluster, so if I use the datanode address from the response I get a "Could not resolve host" error.
So I'm wondering: what do you mean by using the external IP address?
Can you post your requests here?
Thanks a ton!
Created 11-07-2023 11:59 AM
@Sergeii Welcome to the Cloudera Community!
As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.
Regards,
Diana Torres,