Alfredo Sauce - Hadoop HTTP, Kerberos and SPNEGO
Kerberos SPNEGO authentication for HTTP has been part of Hadoop for some time now. On a secure cluster, many services use it to authenticate HTTP APIs and web UIs.
Setup and configuration can become a challenge because they involve many aspects, including Kerberos principals, keytabs, network and load balancers, and remote users accessing the services via different browsers and operating systems.
In this article I will share how Kerberos SPNEGO authentication for HTTP works in Hadoop.
Introduction
Kerberos SPNEGO authentication for HTTP was introduced to Hadoop via HADOOP-7119. The implementation is based on a servlet filter that is configured to front all incoming HTTP requests to the application. If no valid hadoop.auth cookie is found, the servlet filter calls the KerberosAuthenticationHandler to perform Kerberos authentication for the UserAgent request. Upon successful Kerberos authentication, the servlet filter adds a signed cookie to the response, so that subsequent requests are authenticated via the cookie alone, and not via the Kerberos API, for as long as the cookie remains valid.
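The request handling described above can be summarized as a small decision function. This is only a minimal sketch and not the actual Hadoop filter code: the class AuthFlowSketch, the enum Outcome and the method decide are hypothetical names chosen for illustration, and cookie validation is reduced to a boolean.

```java
/** Hypothetical, simplified model of the filter's decision; these are not Hadoop classes. */
final class AuthFlowSketch {

    enum Outcome {
        /** A valid hadoop.auth cookie is present, so Kerberos is skipped. */
        ACCEPT_COOKIE,
        /** No valid cookie and no Negotiate token: answer 401 with "WWW-Authenticate: Negotiate". */
        SEND_CHALLENGE,
        /** No valid cookie but a Negotiate token: validate it with Kerberos, then issue a signed cookie. */
        RUN_KERBEROS
    }

    static Outcome decide(boolean hasValidCookie, String authorizationHeader) {
        if (hasValidCookie) {
            return Outcome.ACCEPT_COOKIE;
        }
        if (authorizationHeader == null || !authorizationHeader.startsWith("Negotiate ")) {
            return Outcome.SEND_CHALLENGE;
        }
        return Outcome.RUN_KERBEROS;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, null));               // first request, no credentials
        System.out.println(decide(false, "Negotiate YII...")); // second request, with a token
        System.out.println(decide(true, null));                // later requests, with the cookie
    }
}
```

The three calls in main mirror the usual order of events: a challenge, a Kerberos validation, and then cookie-only requests until the cookie expires.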
Configuration
As far as configuration goes, most Hadoop services support the following properties under similar names:
| Property | Description |
| --- | --- |
| authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab | Points to the location of the SPNEGO keytab file |
| authentication.kerberos.principal=HTTP/_HOST@REALM.COM | Contains the principal name |
| authentication.kerberos.name.rules=<auth_to_local rules> | Contains the auth_to_local rules |
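The _HOST placeholder in the principal is normally replaced with the fully qualified host name of the server when the service starts. The sketch below illustrates that kind of substitution. It is an assumption about the general behavior, not a copy of Hadoop's code: PrincipalPatternSketch and resolve are hypothetical names, and lowercasing the host name is an assumption as well.

```java
import java.util.Locale;

/** Hypothetical illustration of _HOST substitution; not Hadoop's implementation. */
final class PrincipalPatternSketch {

    /** Replaces the _HOST placeholder with the given FQDN (lowercased by assumption). */
    static String resolve(String principalPattern, String fqdn) {
        return principalPattern.replace("_HOST", fqdn.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // node1.example.com is a made-up host name used only for this example.
        System.out.println(resolve("HTTP/_HOST@REALM.COM", "node1.example.com"));
    }
}
```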
Implementation details
Kerberos SPNEGO authentication often requires more than one interaction until authentication is successful and a valid cookie is issued. Here is the sequence diagram for a successful authentication.
Note: In the sequence diagram, HadoopAuthenticationFilter is actually an interface implemented by many Hadoop services. For simplicity, I have kept the interface name instead of using a class name specific to any one Hadoop service.
Diagram steps:
final String serverName = InetAddress.getByName(request.getServerName()).getCanonicalHostName();
Let’s break this down:
It is important that reverse DNS resolution is configured correctly so that steps 1 to 3 result in a valid FQDN.
4. serverName is used to search the hash map loaded at initialization time for the right service principal name. If a service principal is found, this step completes successfully. You can see the following TRACE messages in the logs:
TRACE KerberosAuthenticationHandler:422 - SPNEGO with server principals:[HTTP/serverName@REALM.COM] for serverName
If no principal is found, you will see the following (notice the empty brackets):
TRACE KerberosAuthenticationHandler:422 - SPNEGO with server principals:[] for serverName
TRACE KerberosAuthenticationHandler:467 - SPNEGO initiated with server principal [HTTP/fqdn_of_server@REALM.COM]
TRACE KerberosAuthenticationHandler:494 - SPNEGO completed for client principal [user@REALM.COM]
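The principal lookup in step 4 can be pictured as a map from host name to the service principals known for that host. The following is a minimal sketch of that idea, not the real KerberosAuthenticationHandler: the class PrincipalLookupSketch and its methods are hypothetical, and the host names are examples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the host-to-principal map; not Hadoop's implementation. */
final class PrincipalLookupSketch {

    private final Map<String, List<String>> principalsByHost = new HashMap<>();

    /** Indexes a principal of the form service/host@REALM under its host part. */
    void register(String principal) {
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@', slash + 1);
        if (slash < 0 || at < 0) {
            throw new IllegalArgumentException("Expected service/host@REALM: " + principal);
        }
        String host = principal.substring(slash + 1, at);
        principalsByHost.computeIfAbsent(host, k -> new ArrayList<>()).add(principal);
    }

    /** Returns the principals registered for serverName, or an empty list if there are none. */
    List<String> lookup(String serverName) {
        return principalsByHost.getOrDefault(serverName, Collections.emptyList());
    }

    public static void main(String[] args) {
        PrincipalLookupSketch sketch = new PrincipalLookupSketch();
        sketch.register("HTTP/node1.example.com@REALM.COM");
        System.out.println(sketch.lookup("node1.example.com")); // principal found
        System.out.println(sketch.lookup("elb.example.com"));   // nothing registered for this name
    }
}
```

An empty result for the second lookup corresponds to the empty brackets in the TRACE message: the computed serverName has no matching principal, so authentication cannot proceed.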
Advanced Setup with Load Balancer
Here is the list of things to check when configuring a load balancer (LB) with Kerberos SPNEGO authentication for HTTP:
authentication.kerberos.principal=*
4. The load balancer's FQDN may resolve to several different IP addresses. From the service application host, a reverse DNS lookup of each of these IP addresses must resolve back to the load balancer's FQDN.
Here is an example:
Load balancer FQDN: elb.example.com
elb.example.com is mapped to two different internal IP addresses: 192.168.1.10 and 192.168.1.15
Note: Issuing the ping command several times helps to find out which IP addresses the FQDN resolves to.
ping elb.example.com
PING elb.example.com (192.168.1.10) 56(84) bytes of data.
ping elb.example.com
PING elb.example.com (192.168.1.15) 56(84) bytes of data.
Reverse resolution of IP 192.168.1.10 must be elb.example.com
Reverse resolution of IP 192.168.1.15 must be elb.example.com
You can use the following Java code to find out exactly how the serverName is computed, starting from the Host:
import java.net.InetAddress;

public class GetServerName {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("ERROR: Missing argument <Host>");
            System.out.println("Use GetServerName <Host>.");
        } else {
            // Same computation the server performs on the request's Host value
            final String serverName = InetAddress.getByName(args[0]).getCanonicalHostName();
            System.out.format("Server name for %s is %s\n", args[0], serverName);
        }
    }
}
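Because a load balancer FQDN can map to several addresses, it can help to check the reverse resolution of every address in one run instead of repeating ping. The sketch below extends the single-address check using InetAddress.getAllByName. The class name CheckReverseResolution and the helper allResolveTo are hypothetical, and the result depends on the resolver configuration of the host where it runs, so it should be run from the service application host.

```java
import java.net.InetAddress;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks that every IP behind an FQDN reverse-resolves to that FQDN. */
final class CheckReverseResolution {

    /** Returns true only if the map is non-empty and every name equals the expected FQDN. */
    static boolean allResolveTo(Map<String, String> canonicalNameByIp, String expectedFqdn) {
        if (canonicalNameByIp.isEmpty()) {
            return false;
        }
        for (String name : canonicalNameByIp.values()) {
            if (!name.equalsIgnoreCase(expectedFqdn)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: CheckReverseResolution <load-balancer-fqdn>");
            return;
        }
        Map<String, String> canonicalNameByIp = new LinkedHashMap<>();
        // Forward lookup: all addresses the resolver returns for the FQDN.
        for (InetAddress address : InetAddress.getAllByName(args[0])) {
            // Reverse lookup: the canonical name for each individual address.
            canonicalNameByIp.put(address.getHostAddress(), address.getCanonicalHostName());
        }
        canonicalNameByIp.forEach((ip, name) -> System.out.format("%s -> %s%n", ip, name));
        System.out.println(allResolveTo(canonicalNameByIp, args[0]) ? "OK" : "MISMATCH");
    }
}
```

For the example above, running it with elb.example.com should list both 192.168.1.10 and 192.168.1.15 mapped back to elb.example.com. Any other name on the right-hand side points to a reverse DNS entry that needs fixing.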
Remote Users - Browser Configuration
You should try to answer the following questions when configuring remote UserAgents:
Troubleshooting and DEBUG
Server Side
Your service log files are the place to check. To debug, I recommend adding the following to your log4j configuration:
log4j.logger.org.apache.hadoop.security.authentication.server=TRACE
For Kerberos debug output, you can also add the Java argument -Dsun.security.krb5.debug=true
Client Side
I find it very helpful to use a curl command like this:
curl -iv --negotiate -u : -X GET 'http://URL'
With these options, curl displays each interaction and the headers involved. Here is an example:
Note: Greater than sign ( > ) indicates request from UserAgent to application. Less than sign ( < ) indicates response from application to UserAgent.
curl -iv --negotiate -u : -X GET 'http://oozielb.example.com:11000/oozie/'
> GET /oozie/ HTTP/1.1
> Host: oozielb.example.com:11000
> User-Agent: curl/7.54.0
> Accept: */*
< HTTP/1.1 401 Unauthorized
< Date: Wed, 21 Feb 2018 17:29:15 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 997
< Connection: keep-alive
< Server: Apache-Coyote/1.1
< WWW-Authenticate: Negotiate
< Set-Cookie: hadoop.auth=; Path=/; HttpOnly
> GET /oozie/ HTTP/1.1
> Host: oozielb.example.com:11000
> Authorization: Negotiate YII....................This is the client Kerberos token
> User-Agent: curl/7.54.0
> Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 21 Feb 2018 17:29:15 GMT
< Content-Type: text/html
< Content-Length: 3754
< Connection: keep-alive
< Server: Apache-Coyote/1.1
< Set-Cookie: hadoop.auth="u=falbani&p=falbani@EXAMPLE.COM&t=kerberos&e=1519270155204&s=6RmPzEYJR0nsF2i7TFk4S+lNydc="; Path=/; HttpOnly
< Set-Cookie: JSESSIONID=254F8AA4060810E7545DEE95F2E6AB83; Path=/oozie
<
(The HTML web page content follows.)
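When reading such a trace, it can be useful to split the hadoop.auth cookie into its fields. Judging from the example above, the value carries a user name (u), a principal (p), an authentication type (t), an expiry time in epoch milliseconds (e) and a signature (s). The sketch below parses that layout. AuthCookieSketch and its methods are hypothetical names, the field interpretation is an assumption based on the example, and the signature is not verified.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the cookie layout shown in the trace; it does not verify the signature. */
final class AuthCookieSketch {

    /** Splits a value such as u=...&p=...&t=...&e=...&s=... into its fields. */
    static Map<String, String> parse(String cookieValue) {
        String value = cookieValue;
        // Strip the surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : value.split("&")) {
            // Limit of 2 keeps any '=' inside the value, such as base64 padding in the signature.
            String[] keyAndValue = pair.split("=", 2);
            if (keyAndValue.length == 2) {
                fields.put(keyAndValue[0], keyAndValue[1]);
            }
        }
        return fields;
    }

    /** Interprets the e field as milliseconds since the epoch. */
    static Instant expiry(Map<String, String> fields) {
        return Instant.ofEpochMilli(Long.parseLong(fields.get("e")));
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(
            "\"u=falbani&p=falbani@EXAMPLE.COM&t=kerberos&e=1519270155204&s=6RmPzEYJR0nsF2i7TFk4S+lNydc=\"");
        fields.forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("expires at " + expiry(fields));
    }
}
```

For the cookie in the trace, the e value corresponds to 2018-02-22T03:29:15.204Z, which is ten hours after the Date header of the response. Until then, requests that send this cookie do not need a new Kerberos negotiation.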
Article Title
If you are wondering about the title of this article, you should review the JIRA HADOOP-7119 😉
Thanks
Special thanks to @emattos and @Vipin Rathor, who helped review this article.