Is there any way to prevent users from using the REST API (WebHDFS) to read HDFS file data without Kerberos authentication?
I tried Knox, but the problem is that users know the NameNode (NN) URI, so they can hit it directly and get a file's contents.
Is there any way to protect it?
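You can enable HTTP authentication on the Hadoop HTTP servers, so that WebHDFS requests must authenticate with Kerberos via SPNEGO. A client then has to present Kerberos credentials, for example with curl:
curl --negotiate -u : -b cookies.txt -c cookies.txt http://namenode:50070/webhdfs/v1/?op=LISTSTATUS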
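Here is a minimal sketch of reading a file's contents the same way. It assumes HTTP authentication is enabled and that the user has already obtained a ticket with kinit. The principal, host, and HDFS path are hypothetical placeholders, and the webhdfs_url helper exists only for illustration. The -L flag is added because, for the OPEN operation, the NameNode redirects the client to a DataNode that serves the bytes:

```shell
#!/usr/bin/env bash
# Sketch: read an HDFS file over WebHDFS with SPNEGO (Kerberos) authentication.
# Principal, host, and path used below are hypothetical placeholders.

# Build a WebHDFS REST URL: $1 = NameNode HTTP address, $2 = absolute HDFS path, $3 = operation.
webhdfs_url() {
  printf '%s/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3"
}

# Fetch a file's contents. "--negotiate -u :" makes curl use the Kerberos ticket
# from the local credential cache (obtained with kinit). The cookie jar keeps the
# hadoop.auth cookie so later requests need not renegotiate. -L follows the
# NameNode's redirect to the DataNode that actually serves the data.
read_hdfs_file() {
  curl --negotiate -u : -L -b cookies.txt -c cookies.txt "$(webhdfs_url "$1" "$2" OPEN)"
}

# Example usage (hypothetical names):
#   kinit alice@EXAMPLE.COM
#   read_hdfs_file http://namenode:50070 /user/alice/data.txt
```

Without a valid ticket, the same request should be rejected once HTTP authentication is enforced.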
For details on configuring HTTP authentication, refer to the Apache Hadoop documentation on HTTP Authentication.
Note that enabling HTTP authentication is a separate configuration step from enabling Kerberos security in a cluster, as discussed in the documentation on Secure Mode. This means that even after you enable Kerberos security in a cluster, the HTTP servers do not demand user authentication by default. You still need to follow the separate steps for enabling HTTP authentication.
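Because the HTTP servers stay open by default, it can help to check what a cluster actually does. The following is a rough sketch that prints the relevant settings with hdfs getconf and sends one anonymous WebHDFS request. The property names are the ones described in the HTTP Authentication guide. The NameNode address is a placeholder, the verdict labels are made up for illustration, and hdfs getconf reports the local client configuration, which may differ from what the servers were started with:

```shell
#!/usr/bin/env bash
# Sketch: check whether a Hadoop HTTP endpoint demands authentication.
# The NameNode address passed as $1 (e.g. http://namenode:50070) is a placeholder.

# Map the HTTP status code of an anonymous request to an illustrative verdict.
classify_status() {
  case "$1" in
    401|403) echo "rejected" ;;    # authentication challenge or forbidden
    200)     echo "open" ;;        # anonymous request succeeded, so data is exposed
    000)     echo "unreachable" ;; # curl could not connect at all
    *)       echo "other:$1" ;;
  esac
}

# Print the HTTP authentication settings from the local Hadoop configuration.
show_http_auth_config() {
  for key in hadoop.http.filter.initializers \
             hadoop.http.authentication.type \
             hadoop.http.authentication.simple.anonymous.allowed; do
    printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key" 2>/dev/null)"
  done
}

# Send an unauthenticated LISTSTATUS (no --negotiate, no user.name) and classify the result.
probe_anonymous() {
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1/webhdfs/v1/?op=LISTSTATUS")
  classify_status "$code"
}

# Only probe when an address is given, so sourcing the file has no side effects.
if [ $# -gt 0 ]; then
  show_http_auth_config
  probe_anonymous "$1"
fi
```

Run it with the NameNode address as its only argument. An "open" verdict means anyone who can reach the port can browse the namespace without credentials.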
If browser clients need a form of authentication other than Kerberos via SPNEGO, you can write a custom plugin that implements whatever logic you need. Quoting the HTTP Authentication guide:
If a custom authentication mechanism is required for the HTTP web-consoles, it is possible to implement a plugin to support the alternate authentication mechanism (refer to Hadoop hadoop-auth for details on writing an AuthenticatorHandler).
The Hadoop Auth documentation and its linked pages provide more details. The Configuration page is particularly relevant for its discussion of AltKerberos Configuration. It shows how you can require Kerberos authentication for some clients but delegate to an alternate authentication mechanism for others (typically browsers).
AltKerberos still assumes that Kerberos is enabled. If you need completely custom logic without any Kerberos dependency, you would probably have to study Hadoop's AuthenticationFilterInitializer and use it as a model for writing your own FilterInitializer, which injects your own custom filter.
Another option, depending on your exact requirements, is to enforce perimeter security with firewall rules that block access to the HTTP ports (for WebHDFS, those of both the NameNode and the DataNodes) unless the packets originate from a specific set of hosts. Then restrict login access on those hosts to a specific set of authorized users.
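A dry-run sketch of such rules using iptables follows. The trusted addresses are hypothetical. 50070 and 50075 are the Hadoop 2.x default HTTP ports of the NameNode and DataNodes, and the DataNode port matters because a WebHDFS read gets redirected to a DataNode. The trusted list should also include the cluster's own nodes that talk over these ports, for example a Secondary or Standby NameNode that fetches checkpoints over HTTP:

```shell
#!/usr/bin/env bash
# Sketch: print iptables rules that admit Hadoop HTTP traffic only from trusted hosts.
# ALLOWED_SOURCES and HTTP_PORTS are hypothetical placeholders; adjust before use.
# This only prints the rules (a dry run); review them, then apply them as root.

ALLOWED_SOURCES="${ALLOWED_SOURCES:-10.0.1.10 10.0.1.11}"  # e.g. edge hosts plus cluster nodes
HTTP_PORTS="${HTTP_PORTS:-50070 50075}"                     # NameNode, DataNode HTTP (Hadoop 2.x defaults)

gen_rules() {
  # Unquoted expansions are intentional: each list is split on spaces.
  for port in $HTTP_PORTS; do
    # Keep local (loopback) access to the port working on the host itself.
    echo "iptables -A INPUT -i lo -p tcp --dport $port -j ACCEPT"
    # Accept connections to this port from each trusted source address...
    for src in $ALLOWED_SOURCES; do
      echo "iptables -A INPUT -p tcp --dport $port -s $src -j ACCEPT"
    done
    # ...and drop everything else aimed at the same port.
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
  done
}

gen_rules
```

The printed rules would need to be applied on every NameNode and DataNode host. Since -A appends to the end of the INPUT chain, an earlier rule that already accepts this traffic takes precedence, and the rules do not survive a reboot unless saved with your distribution's tooling.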