Created on 02-24-2017 03:41 PM - edited 09-16-2022 01:38 AM
Hadoop Auth [1] is a Java library that enables Kerberos SPNEGO authentication for HTTP requests. It enforces authentication on protected resources; after successful authentication, Hadoop Auth creates a signed HTTP cookie containing an authentication token, the username, the user principal, the authentication type, and an expiration time. This cookie is sent with all subsequent HTTP client requests to access a protected resource until it expires.
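To make the cookie behavior concrete, here is a minimal sketch of capturing and replaying the cookie with curl's cookie jar (the hostname and gateway path are the example values used later in this article; adjust them for your cluster):

# First request performs the Kerberos SPNEGO round trip and saves the
# signed hadoop.auth cookie into a cookie jar.
curl -k --negotiate -u : -c /tmp/knox.cookiejar \
  "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"

# Until the cookie expires, subsequent requests can replay it and skip
# the Kerberos negotiation entirely.
curl -k -b /tmp/knox.cookiejar \
  "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"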
Given Apache Knox's pluggable authentication providers, it is easy to set up Hadoop Auth with Apache Knox with only a few configuration changes. The purpose of this article is to describe this process in detail and with examples.
Assumptions:
Here we assume a working Hadoop cluster with Apache Knox (version 0.7.0 and up [2]) and that the cluster is Kerberized. Kerberizing the cluster is beyond the scope of this article.
Setup:
To use Hadoop Auth in Apache Knox we need to update the Knox topology. Hadoop Auth is configured as a provider, so we configure it through the provider params. Apache Knox uses the same configuration parameters as Apache Hadoop, and they can be expected to behave in a similar fashion. To update the Knox topology using Ambari, go to Knox -> Configs -> Advanced topology.
Following is an example of the HadoopAuth provider snippet in the Apache Knox topology file:
<provider>
  <role>authentication</role>
  <name>HadoopAuth</name>
  <enabled>true</enabled>
  <param>
    <name>config.prefix</name>
    <value>hadoop.auth.config</value>
  </param>
  <param>
    <name>hadoop.auth.config.signature.secret</name>
    <value>my-secret-key</value>
  </param>
  <param>
    <name>hadoop.auth.config.type</name>
    <value>kerberos</value>
  </param>
  <param>
    <name>hadoop.auth.config.simple.anonymous.allowed</name>
    <value>false</value>
  </param>
  <param>
    <name>hadoop.auth.config.token.validity</name>
    <value>1800</value>
  </param>
  <param>
    <name>hadoop.auth.config.cookie.domain</name>
    <value>ambari.apache.org</value>
  </param>
  <param>
    <name>hadoop.auth.config.cookie.path</name>
    <value>gateway/default</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.principal</name>
    <value>HTTP/c6401.ambari.apache.org@EXAMPLE.COM</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.keytab</name>
    <value>/etc/security/keytabs/spnego.service.keytab</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.name.rules</name>
    <value>DEFAULT</value>
  </param>
</provider>
Following are the parameters that need to be updated at a minimum (the values shown above are for an example cluster): hadoop.auth.config.signature.secret, hadoop.auth.config.cookie.domain, hadoop.auth.config.cookie.path, hadoop.auth.config.kerberos.principal and hadoop.auth.config.kerberos.keytab.
For details on the other properties, please refer to the Apache Knox documentation [3].
If you are using Ambari you will have to restart Knox; this is an Ambari requirement. No restart is required if the topology is updated outside of Ambari (Apache Knox reloads the topology every time the topology file's time-stamp is updated), as sketched below.
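For example, outside of Ambari an update might look like this (a sketch; /etc/knox/conf/topologies/default.xml is a typical HDP location and may differ on your installation):

# Edit the topology in place; Knox watches the file's time-stamp
vi /etc/knox/conf/topologies/default.xml
# Force a time-stamp update so Knox redeploys the topology
touch /etc/knox/conf/topologies/default.xml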
Testing:
For testing Hadoop Auth we will use the user 'guest'; we assume that no such user exists on the system.
1. Let's create a user 'guest' with the group 'users'. Note that the group 'users' was chosen because of the property 'hadoop.proxyuser.knox.groups=users' (see the core-site.xml sketch after the command below).
useradd -u 1590 -g users guest
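For reference, the proxy-user property mentioned above lives in core-site.xml on the Hadoop side. A minimal sketch follows; the wildcard hosts value is an assumption for a test setup, not a recommendation:

<!-- core-site.xml: allow the Knox service user to impersonate members of 'users' -->
<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>users</value>
</property>
<!-- hosts from which the knox user may impersonate; '*' is an example value -->
<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>*</value>
</property>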
2. Add a principal using 'kadmin.local'
kadmin.local -q "addprinc guest/c6401.ambari.apache.org"
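addprinc will prompt for a password for the new principal. You can verify the principal was created with a standard kadmin query:

kadmin.local -q "listprincs guest*"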
3. Log in using kinit
kinit guest/c6401.ambari.apache.org@EXAMPLE.COM
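It is worth confirming the ticket was granted before testing; klist prints the credential cache:

klist
# Expected output includes a line like:
# Default principal: guest/c6401.ambari.apache.org@EXAMPLE.COM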
4. Test by sending a curl request through Knox
curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
You should see output similar to:
# curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
HTTP/1.1 401 Authentication required
Date: Fri, 24 Feb 2017 14:19:25 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)

HTTP/1.1 200 OK
Date: Fri, 24 Feb 2017 14:19:25 GMT
Set-Cookie: hadoop.auth="u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI="; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Cache-Control: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Content-Type: application/json; charset=UTF-8
X-FRAME-OPTIONS: SAMEORIGIN
Server: Jetty(6.1.26.hwx)
Content-Length: 276

{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16398,"group":"hdfs","length":0,"modificationTime":1487855904191,"owner":"hdfs","pathSuffix":"entity-file-history","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}
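Note the two responses above: the first request carries no credentials, so Knox answers with 401 and a 'WWW-Authenticate: Negotiate' challenge; curl then retries with a Kerberos SPNEGO token and receives 200 together with the signed hadoop.auth cookie, followed by the JSON directory listing.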
[1] Hadoop Auth: https://hadoop.apache.org/docs/stable/hadoop-auth/index.html
[2] KNOX-25: https://issues.apache.org/jira/browse/KNOX-25
[3] Apache Knox documentation on Hadoop Auth: https://knox.apache.org/books/knox-0-11-0/user-guide.html#HadoopAuth+Authentication+Provider
Created on 02-24-2017 04:46 PM
This is a great article, Sandeep!
Created on 02-24-2017 06:33 PM
Thanks!!
Created on 02-25-2017 06:24 PM
This is a really nice feature to have, given the recent rise in security concerns. Nicely illustrated.
Created on 02-27-2017 02:01 PM
Thanks Krishna, hoping it helps folks understand the configuration a bit better!
Created on 10-10-2017 08:55 PM
I'm getting the below error on HDP 2.6.1; please see if you can help me. I'm able to browse WebHDFS without Knox.
curl -k -i --negotiate -u : https://myhost.mydomain:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
HTTP/1.1 401 Authentication required
Date: Tue, 10 Oct 2017 20:41:13 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=mydomain.com; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)

HTTP/1.1 403 org.apache.hadoop.security.authentication.client.AuthenticationException
Date: Tue, 10 Oct 2017 20:41:14 GMT
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=mydomain.com; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 314
Server: Jetty(9.2.15.v20160210)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 403 Forbidden</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /gateway/default/webhdfs/v1/tmp. Reason:
<pre>    Forbidden</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
<provider>
  <role>authentication</role>
  <name>HadoopAuth</name>
  <enabled>true</enabled>
  <param>
    <name>config.prefix</name>
    <value>hadoop.auth.config</value>
  </param>
  <param>
    <name>hadoop.auth.config.signature.secret</name>
    <value>/etc/security/http_secret</value>
  </param>
  <param>
    <name>hadoop.auth.config.type</name>
    <value>kerberos</value>
  </param>
  <param>
    <name>hadoop.auth.config.simple.anonymous.allowed</name>
    <value>false</value>
  </param>
  <param>
    <name>hadoop.auth.config.token.validity</name>
    <value>1800</value>
  </param>
  <param>
    <name>hadoop.auth.config.cookie.domain</name>
    <value>mydomain.com</value>
  </param>
  <param>
    <name>hadoop.auth.config.cookie.path</name>
    <value>gateway/default</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.principal</name>
    <value>HTTP/_HOST@TEST.COM</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.keytab</name>
    <value>/etc/security/keytabs/spnego.service.keytab</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.name.rules</name>
    <value>DEFAULT</value>
  </param>
</provider>
Created on 10-11-2017 06:40 PM
Did you log in using kinit?
Created on 10-12-2017 07:02 PM
I logged in and ran kinit, so I have a valid ticket and am able to run other HDFS commands.