Hadoop Auth [1] is a Java library that enables Kerberos SPNEGO authentication for HTTP requests. It enforces authentication on protected resources. After successful authentication, Hadoop Auth creates a signed HTTP cookie carrying an authentication token (username, user principal, authentication type, and expiration time). This cookie is used for all subsequent HTTP client requests to protected resources until it expires.
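To make the idea of a signed cookie concrete, here is a minimal Python sketch. It is not Hadoop Auth's actual signing code: the use of HMAC-SHA256 and the function names `sign` and `verify` are assumptions made for illustration. The field layout (u, p, t, e, s) follows the hadoop.auth cookie shown in the test output later in this article.

```python
import base64
import hashlib
import hmac
import time


def sign(token: str, secret: bytes) -> str:
    """Append a signature to the token string (HMAC-SHA256 is an assumption)."""
    digest = hmac.new(secret, token.encode("utf-8"), hashlib.sha256).digest()
    return f"{token}&s={base64.b64encode(digest).decode('ascii')}"


def verify(signed: str, secret: bytes, now_ms: int) -> dict:
    """Return the token fields if the signature matches and the token has not expired."""
    token, sep, _ = signed.rpartition("&s=")
    expected = sign(token, secret)
    if not sep or not hmac.compare_digest(expected.encode("utf-8"), signed.encode("utf-8")):
        raise ValueError("invalid signature")
    fields = dict(part.split("=", 1) for part in token.split("&"))
    if int(fields["e"]) < now_ms:
        raise ValueError("token expired")
    return fields


# Hypothetical values for illustration only.
secret = b"my-secret-key"
now_ms = int(time.time() * 1000)
cookie = sign(f"u=guest&p=guest@EXAMPLE.COM&t=kerberos&e={now_ms + 1800 * 1000}", secret)
print(verify(cookie, secret, now_ms)["u"])
```

A cookie signed with one secret fails verification under a different secret. A server that cannot validate the cookie has to authenticate the client again.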

Because Apache Knox has pluggable authentication providers, setting up Hadoop Auth with Apache Knox takes only a few configuration changes. The purpose of this article is to describe this process in detail, with examples.

Assumptions:

Here we assume a working, Kerberized Hadoop cluster with Apache Knox (version 0.7.0 or later [2]). Kerberizing the cluster is beyond the scope of this article.

Setup:

To use Hadoop Auth in Apache Knox, we need to update the Knox topology. Hadoop Auth is configured as a provider, so we configure it through the provider params. Apache Knox uses the same configuration parameters as Apache Hadoop, and they can be expected to behave in a similar fashion. To update the Knox topology using Ambari, go to Knox -> Configs -> Advanced topology.

The following is an example of the HadoopAuth provider snippet in the Apache Knox topology file:

               <provider>
                  <role>authentication</role>
                  <name>HadoopAuth</name>
                  <enabled>true</enabled>
                  <param>
                    <name>config.prefix</name>
                    <value>hadoop.auth.config</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.signature.secret</name>
                    <value>my-secret-key</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.type</name>
                    <value>kerberos</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.simple.anonymous.allowed</name>
                    <value>false</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.token.validity</name>
                    <value>1800</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.cookie.domain</name>
                    <value>ambari.apache.org</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.cookie.path</name>
                    <value>gateway/default</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.kerberos.principal</name>
                    <value>HTTP/c6401.ambari.apache.org@EXAMPLE.COM</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.kerberos.keytab</name>
                    <value>/etc/security/keytabs/spnego.service.keytab</value>
                  </param>
                  <param>
                    <name>hadoop.auth.config.kerberos.name.rules</name>
                    <value>DEFAULT</value>
                  </param>
               </provider>
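The role of config.prefix can be illustrated with a short Python sketch. It assumes, as I understand the provider's behavior, that params starting with the prefix have the prefix stripped and are then passed to Hadoop Auth under their usual names. The function name `effective_params` and the shortened sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the provider above, for illustration.
PROVIDER_XML = """
<provider>
  <role>authentication</role>
  <name>HadoopAuth</name>
  <enabled>true</enabled>
  <param><name>config.prefix</name><value>hadoop.auth.config</value></param>
  <param><name>hadoop.auth.config.type</name><value>kerberos</value></param>
  <param><name>hadoop.auth.config.token.validity</name><value>1800</value></param>
</provider>
"""


def effective_params(provider_xml: str) -> dict:
    """Collect provider params and strip the configured prefix (assumed behavior)."""
    provider = ET.fromstring(provider_xml)
    params = {p.findtext("name"): p.findtext("value") for p in provider.findall("param")}
    prefix = params.pop("config.prefix", "")
    dotted = prefix + "." if prefix else ""
    # Keep only the params that carry the prefix, under their unprefixed names.
    return {name[len(dotted):]: value
            for name, value in params.items() if name.startswith(dotted)}


print(effective_params(PROVIDER_XML))
```

With the sample above, the result contains the keys type and token.validity, which are the names Hadoop Auth itself uses.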

At a minimum, the following parameters need to be updated:

  1. hadoop.auth.config.signature.secret - The secret used to sign the delegation token in the hadoop.auth cookie. The same secret must be used across all instances of the Knox gateway in a given cluster. Otherwise, the delegation token will fail validation and authentication will be repeated on each request.
  2. hadoop.auth.config.cookie.domain - The domain to use for the HTTP cookie that stores the authentication token (e.g. mycompany.com).
  3. hadoop.auth.config.kerberos.principal - The web-application Kerberos principal name. The Kerberos principal name must start with HTTP/….
  4. hadoop.auth.config.kerberos.keytab - The path to the keytab file containing the credentials for the Kerberos principal specified above.
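As a quick sanity check before deploying the topology, the four parameters above can be verified with a small Python sketch. The helper names `check_minimum` and `new_secret` are hypothetical, and using Python's `secrets` module to generate the signature secret is a suggestion, not something Knox requires.

```python
import secrets

# Unprefixed names of the parameters listed above.
REQUIRED = ("signature.secret", "cookie.domain", "kerberos.principal", "kerberos.keytab")


def check_minimum(params: dict, prefix: str = "hadoop.auth.config") -> list:
    """Return a list of problems found in the provider params (empty if none)."""
    problems = []
    for key in REQUIRED:
        if not params.get(f"{prefix}.{key}"):
            problems.append(f"missing {prefix}.{key}")
    principal = params.get(f"{prefix}.kerberos.principal", "")
    if principal and not principal.startswith("HTTP/"):
        problems.append(f"{prefix}.kerberos.principal must start with HTTP/")
    return problems


def new_secret(nbytes: int = 32) -> str:
    """Generate a random, URL-safe secret to share across all Knox instances."""
    return secrets.token_urlsafe(nbytes)


params = {
    "hadoop.auth.config.signature.secret": new_secret(),
    "hadoop.auth.config.cookie.domain": "ambari.apache.org",
    "hadoop.auth.config.kerberos.principal": "HTTP/c6401.ambari.apache.org@EXAMPLE.COM",
    "hadoop.auth.config.kerberos.keytab": "/etc/security/keytabs/spnego.service.keytab",
}
print(check_minimum(params))
```

Generate the secret once and copy the same value to every Knox instance, so that all instances sign with the same key.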

For details on the other properties, please refer to the Apache Knox documentation [3].

If you are using Ambari, you will have to restart Knox; this is an Ambari requirement. No restart is required if the topology is updated outside of Ambari, because Apache Knox reloads the topology every time the topology file's timestamp is updated.
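When the topology is managed outside of Ambari, updating the file's timestamp is enough to trigger a reload. Here is a minimal Python sketch of that step, equivalent to running `touch` on the file. The function name `touch_topology` is hypothetical, and the topology path depends on your installation.

```python
import os
import time
from pathlib import Path


def touch_topology(path) -> float:
    """Set the topology file's access and modification time to now; return the new mtime."""
    topology = Path(path)
    if not topology.is_file():
        raise FileNotFoundError(topology)
    now = time.time()
    os.utime(topology, (now, now))
    return topology.stat().st_mtime


# Example call (the path below is a placeholder, not a real default):
# touch_topology("/path/to/knox/conf/topologies/default.xml")
```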

Testing:

We will test Hadoop Auth with the user 'guest', assuming that no such user exists on the system yet.

1. Create a user 'guest' with group 'users'. Note that the group 'users' was chosen because of the property 'hadoop.proxyuser.knox.groups=users'.

useradd guest -u 1590 -g users

2. Add the principal using 'kadmin.local'

kadmin.local -q "addprinc guest/c6401.ambari.apache.org"

3. Log in using kinit

kinit guest/c6401.ambari.apache.org@EXAMPLE.COM

4. Test by sending a curl request through Knox

curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"

You should see output similar to the following:

# curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
HTTP/1.1 401 Authentication required
Date: Fri, 24 Feb 2017 14:19:25 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)

HTTP/1.1 200 OK
Date: Fri, 24 Feb 2017 14:19:25 GMT
Set-Cookie: hadoop.auth="u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI="; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Cache-Control: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Content-Type: application/json; charset=UTF-8
X-FRAME-OPTIONS: SAMEORIGIN
Server: Jetty(6.1.26.hwx)
Content-Length: 276

{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16398,"group":"hdfs","length":0,"modificationTime":1487855904191,"owner":"hdfs","pathSuffix":"entity-file-history","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}
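The hadoop.auth cookie in the 200 response above can be decoded to confirm what was issued. Here is a minimal Python sketch; the function name `parse_hadoop_auth` is hypothetical, and it only splits the fields without verifying the signature.

```python
from datetime import datetime, timezone

# Cookie value copied from the Set-Cookie header in the output above.
COOKIE_VALUE = ("u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM"
                "&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI=")


def parse_hadoop_auth(value: str) -> dict:
    """Split the cookie into its fields and convert the expiry to a UTC datetime."""
    # Split each pair on the first '=' only, because the signature may end in '='.
    fields = dict(part.split("=", 1) for part in value.strip('"').split("&"))
    # 'e' is in milliseconds since the epoch; drop the milliseconds.
    fields["expires"] = datetime.fromtimestamp(int(fields["e"]) // 1000, tz=timezone.utc)
    return fields


token = parse_hadoop_auth(COOKIE_VALUE)
print(token["u"], token["t"], token["expires"].isoformat())
```

For the cookie above, the expiry decodes to 2017-02-24 14:49:25 UTC. That is 30 minutes after the Date header of the response and matches the token.validity of 1800 seconds in the topology.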

[1] https://hadoop.apache.org/docs/stable/hadoop-auth/index.html

[2] https://issues.apache.org/jira/browse/KNOX-25

[3] Apache Knox documentation on Hadoop Auth https://knox.apache.org/books/knox-0-11-0/user-guide.html#HadoopAuth+Authentication+Provider

Comments

This is a great article, Sandeep!

Thanks !!

This is a really nice feature to have, given the recent rise in security concerns. Nicely illustrated.

Thanks Krishna, I hope it helps folks understand the configuration a bit better!

@Sandeep More

I'm getting the error below on HDP 2.6.1; please see if you can help me. I'm able to browse WebHDFS without Knox.

curl -k -i --negotiate -u : https://myhost.mydomain:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
HTTP/1.1 401 Authentication required
Date: Tue, 10 Oct 2017 20:41:13 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=mydomain.com; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)

HTTP/1.1 403 org.apache.hadoop.security.authentication.client.AuthenticationException
Date: Tue, 10 Oct 2017 20:41:14 GMT
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=mydomain.com; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 314
Server: Jetty(9.2.15.v20160210)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 403 Forbidden</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /gateway/default/webhdfs/v1/tmp. Reason:
<pre>    Forbidden</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>

		<provider>
		  <role>authentication</role>
		  <name>HadoopAuth</name>
		  <enabled>true</enabled>
		  <param>
			<name>config.prefix</name>
			<value>hadoop.auth.config</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.signature.secret</name>
			<value>/etc/security/http_secret</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.type</name>
			<value>kerberos</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.simple.anonymous.allowed</name>
			<value>false</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.token.validity</name>
			<value>1800</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.cookie.domain</name>
			<value>mydomain.com</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.cookie.path</name>
			<value>gateway/default</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.kerberos.principal</name>
			<value>HTTP/_HOST@TEST.COM</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.kerberos.keytab</name>
			<value>/etc/security/keytabs/spnego.service.keytab</value>
		  </param>
		  <param>
			<name>hadoop.auth.config.kerberos.name.rules</name>
			<value>DEFAULT</value>
		  </param>
		</provider>

Did you log in using kinit?

@Sandeep More

I logged in and ran kinit, so I have a valid ticket and am able to run other HDFS commands.