
Can we restrict users from accessing WebHDFS via curl without Kerberos?



Is there any way to prevent users from using the REST API to read HDFS file data, without Kerberos?

I tried it with Knox, but the problem is that users know the NameNode (NN) URI and can hit it directly to get a file's content.

So is there any way to protect it?


@Saurabh Kumar, Hadoop's HTTP servers can be configured to require Kerberos authentication via SPNEGO. After enabling that, users would be required to run kinit before curl, and they would have to pass the curl options that enable SPNEGO and that save and reuse session cookies, e.g.:
curl --negotiate -u : -b cookies.txt -c cookies.txt "http://namenode:50070/webhdfs/v1/?op=LISTSTATUS"

For details on how to configure this, please refer to the Apache documentation on HTTP Authentication.
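The client-side steps described above could be wrapped in a few small shell functions. This is only a sketch: the host and port come from the example URL, the cookie file name is the one used above, and the function names are made up for illustration.

```shell
# Sketch only: NAMENODE_HTTP and COOKIE_JAR are placeholders; adjust them
# to your cluster. The function names are hypothetical helpers.
NAMENODE_HTTP="${NAMENODE_HTTP:-http://namenode:50070}"
COOKIE_JAR="${COOKIE_JAR:-cookies.txt}"

# Build a WebHDFS URL. $1 = absolute HDFS path, $2 = operation (e.g. LISTSTATUS).
webhdfs_url() {
  printf '%s/webhdfs/v1%s?op=%s\n' "$NAMENODE_HTTP" "$1" "$2"
}

# Authenticated request: SPNEGO via --negotiate, saving and reusing the
# session cookie. Needs a valid Kerberos ticket (run kinit first).
webhdfs_get() {
  curl --silent --negotiate -u : -b "$COOKIE_JAR" -c "$COOKIE_JAR" \
    "$(webhdfs_url "$1" "$2")"
}

# Unauthenticated probe: sends no credentials and prints only the HTTP
# status code, which is handy for checking that anonymous access is refused.
webhdfs_probe_anonymous() {
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    "$(webhdfs_url "$1" "$2")"
}
```

After running kinit for your own principal, "webhdfs_get / LISTSTATUS" issues the same request as the curl line above. Once HTTP authentication is enforced, "webhdfs_probe_anonymous / LISTSTATUS" should report a rejection (typically 401) rather than 200.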

Please note that enabling HTTP authentication is a separate configuration step from enabling Kerberos security in a cluster, as discussed in the documentation on Secure Mode. This means that even after enabling Kerberos security in a cluster, the HTTP servers will not demand user authentication by default. It would still be necessary to follow the separate steps for enabling HTTP authentication.
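To see whether that separate step has been done, one option is to print the relevant properties with "hdfs getconf". The sketch below assumes the hdfs command is on the PATH and that it reads the same configuration directory as the NameNode, which may not hold on an edge node. The property names are the ones listed in the HTTP Authentication guide; the function name is just illustrative.

```shell
# Print the locally configured values of the HTTP authentication properties.
# Assumption: `hdfs` resolves to the Hadoop CLI and sees the NameNode's config.
print_http_auth_settings() {
  for key in \
    hadoop.http.filter.initializers \
    hadoop.http.authentication.type \
    hadoop.http.authentication.simple.anonymous.allowed
  do
    # If getconf fails for a key (for example because it is not set),
    # show <unset> instead of aborting.
    value=$(hdfs getconf -confKey "$key" 2>/dev/null) || value="<unset>"
    printf '%s = %s\n' "$key" "$value"
  done
}
```

For Kerberos via SPNEGO, the guide calls for hadoop.http.authentication.type to be kerberos and for hadoop.http.filter.initializers to include org.apache.hadoop.security.AuthenticationFilterInitializer. If the type is still simple, the HTTP servers are not demanding Kerberos credentials.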

If an alternate form of authentication is required for browser clients, different from Kerberos via SPNEGO, then it's possible to write a custom plugin that implements any arbitrary logic that you need. Quoting the HTTP Authentication guide:

If a custom authentication mechanism is required for the HTTP web-consoles, it is possible to implement a plugin to support the alternate authentication mechanism (refer to Hadoop hadoop-auth for details on writing an AuthenticatorHandler).

The Hadoop Auth documentation and its linked pages provide more details. The Configuration page is particularly relevant for its discussion of AltKerberos Configuration. This shows how you can require Kerberos authentication for some clients, but delegate to an alternate authentication mechanism for others (typically browsers).

AltKerberos still assumes that Kerberos is enabled. If you need completely custom logic, without any Kerberos dependency, then it probably requires looking at the AuthenticationFilterInitializer in Hadoop and using that as inspiration to write your own FilterInitializer, which injects its own custom filter.

Another potential option, depending on your exact requirements, is to enforce perimeter security via firewall rules that block access to the HTTP port, unless the packets originate from a particular set of hosts. Then, limit login access to that set of hosts to a specific set of authorized users.
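As a rough illustration of such firewall rules, the sketch below prints iptables commands rather than running them, so they can be reviewed first. It assumes a Linux NameNode host that uses iptables; the function name and the IP address in the usage note are made up, and 50070 is simply the port from the example URL earlier in this answer.

```shell
# Print iptables rules that accept traffic to an HTTP port only from the
# listed source hosts and drop everything else sent to that port.
# $1 = port, remaining arguments = allowed source addresses (hypothetical).
print_webhdfs_rules() {
  port=$1
  shift
  for host in "$@"; do
    echo "iptables -A INPUT -p tcp --dport $port -s $host -j ACCEPT"
  done
  # Rules are evaluated in order, so the DROP has to come after the ACCEPTs.
  echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}
```

For example, "print_webhdfs_rules 50070 10.0.0.5" prints one ACCEPT rule for 10.0.0.5 followed by a DROP rule for port 50070; the output would then be run as root on the NameNode host. If Knox is in use, the gateway host would need to be in the allowed list. WebHDFS also redirects file reads and writes to the DataNodes, so their HTTP port (50075 by default in Hadoop 2) likely needs the same treatment on each DataNode.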

@Chris Nauroth

Thanks for your suggestion.

But can we configure HTTP authentication without having Kerberos in our cluster?

@Saurabh Kumar, thank you for the clarification. I misinterpreted your original question. I have edited my answer to add more details about another form of authentication that is supported, named AltKerberos, and also information on how you might go about injecting your own custom authentication filter if completely custom logic is required.


Thanks @Chris Nauroth. I will read it and come back to you soon if I need any further help.

@Saurabh Kumar

How about blocking inbound access to your NN from clients?


@Rahul Pathak: It would be great if you could give me an example.

@Saurabh Kumar

I am suggesting blocking access with a firewall.


Hi @Rahul Pathak,

How do I block access in the firewall?