Member since: 08-16-2019
Posts: 38
Kudos Received: 8
Solutions: 4
01-14-2021
08:42 AM
Starting with CDP 7.2.7, Knox can be configured to load-balance backend service instances and to support sticky sessions in HA mode. Previously, when HA was configured for Knox (using the HaProvider in the Knox topology), Knox routed all requests to a single backend service instance, picking the next instance from the configured list only on failover. This approach had limitations: Knox served only one backend even when multiple backends were available, which was a problem especially for stateless services that need even load distribution, and Knox had no support for sticky sessions. With this release, Knox can be configured to load-balance requests across the multiple backend service instances configured with the HaProvider. Currently, load-balancing is done in a simple round-robin fashion. The following is an example of how to configure load-balancing only (no sticky sessions) for an example HA service, WHOAMI:
<provider>
<role>ha</role>
<name>HaProvider</name>
<enabled>true</enabled>
<param>
<name>WHOAMI</name>
<value>enabled=true;maxFailoverAttempts=3;failoverSleep=1000;enableLoadBalancing=true</value>
</param>
</provider>
...
<service>
<role>WHOAMI</role>
<url>http://localhost:50070</url>
<url>http://localhost:50071</url>
</service>
Sticky sessions can be used for services such as Hive where session state matters. Sticky sessions are turned on with the property enableStickySession. When sticky sessions are on, Knox uses a cookie to match client requests with the backend host, so cookies are required for this feature. The cookie name used for sticky sessions is KNOX_BACKEND-{serviceName}; it can be changed with the property stickySessionCookieName. With sticky sessions turned on, load-balancing is turned on automatically. If sticky sessions are on and there is a failover, Knox will choose a new backend and route the request to it, which can be undesirable in cases where session state is critical. Knox can be configured not to fail over when sticky sessions are used by setting the flag noFallback. When this flag is set (noFallback=true), Knox will return a 502 (Bad Gateway) if a request comes in with a cookie and the corresponding backend is unavailable. Example of a sticky session configuration:
<provider>
<role>ha</role>
<name>HaProvider</name>
<enabled>true</enabled>
<param>
<name>WHOAMI</name>
<value>enabled=true;maxFailoverAttempts=3;failoverSleep=1000;enableStickySession=true;noFallback=true</value>
</param>
</provider>
Summary of the HaProvider properties discussed above:
enableLoadBalancing - Enables load-balancing; requests are round-robined across the backend HA URLs
enableStickySession - Enables sticky sessions
stickySessionCookieName - Customizes the sticky session cookie name; the default is KNOX_BACKEND-{serviceName}
noFallback - When this flag is set, Knox will return a 502 (Bad Gateway) if a request comes in with a cookie and the corresponding backend is unavailable
A brief curl sketch of these behaviors follows.
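To make these settings concrete from the client side, here is a minimal curl sketch. The hostname, the guest:guest-password credentials, and the /gateway/default/whoami path are illustrative assumptions that depend on how the topology above is deployed; they are not values from this article.
# Round-robin (enableLoadBalancing=true): repeated requests should
# alternate between the two configured WHOAMI backends.
for i in 1 2 3 4; do
  curl -sk -u guest:guest-password "https://knox.example.com:8443/gateway/default/whoami"
done
# Sticky sessions (enableStickySession=true): capture the
# KNOX_BACKEND-WHOAMI cookie on the first request, then replay it so
# subsequent requests are routed to the same backend. With
# noFallback=true, a replayed cookie whose backend is down yields a 502.
curl -sk -u guest:guest-password -c /tmp/knox-cookies.txt "https://knox.example.com:8443/gateway/default/whoami"
curl -sk -u guest:guest-password -b /tmp/knox-cookies.txt "https://knox.example.com:8443/gateway/default/whoami"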
10-11-2017
06:40 PM
Did you log in using kinit?
02-27-2017
02:01 PM
Thanks Krishna, hoping it helps folks understand the configuration a bit better!
02-24-2017
03:41 PM
3 Kudos
Hadoop Auth [1] is a Java library which enables Kerberos SPNEGO authentication for HTTP requests. It enforces authentication on protected resources; after successful authentication, Hadoop Auth creates a signed HTTP cookie containing an authentication token, the username, the user principal, the authentication type, and the expiration time. This cookie is used for all subsequent HTTP client requests to access a protected resource until the cookie expires. Given Apache Knox's pluggable authentication providers, it is easy to set up Hadoop Auth with Apache Knox with only a few configuration changes. The purpose of this article is to describe this process in detail and with examples.
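As a quick illustration of that cookie flow (the hostname below is a placeholder; the full working test commands appear at the end of this article): the first request performs the SPNEGO negotiation and stores the signed hadoop.auth cookie, and later requests can simply present it until it expires.
# First request: SPNEGO negotiation; the signed hadoop.auth cookie is
# saved to the cookie jar.
curl -k -i --negotiate -u : -c /tmp/hadoop-auth.txt "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
# Subsequent requests: present the cookie instead of re-negotiating,
# until the token validity window (token.validity in the provider
# configuration below) runs out.
curl -k -i -b /tmp/hadoop-auth.txt "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"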
Assumptions: Here we assume a working Hadoop cluster with Apache Knox (version 0.7.0 and up [2]), and that the cluster is Kerberized. Kerberizing the cluster is beyond the scope of this article.
Setup: To use Hadoop Auth in Apache Knox we need to update the Knox topology. Hadoop Auth is configured as a provider, so we configure it through the provider params. Apache Knox uses the same configuration parameters used by Apache Hadoop, and they can be expected to behave in a similar fashion. To update the Knox topology using Ambari, go to Knox -> Configs -> Advanced topology. The following is an example of the HadoopAuth provider snippet in an Apache Knox topology file:
<provider>
<role>authentication</role>
<name>HadoopAuth</name>
<enabled>true</enabled>
<param>
<name>config.prefix</name>
<value>hadoop.auth.config</value>
</param>
<param>
<name>hadoop.auth.config.signature.secret</name>
<value>my-secret-key</value>
</param>
<param>
<name>hadoop.auth.config.type</name>
<value>kerberos</value>
</param>
<param>
<name>hadoop.auth.config.simple.anonymous.allowed</name>
<value>false</value>
</param>
<param>
<name>hadoop.auth.config.token.validity</name>
<value>1800</value>
</param>
<param>
<name>hadoop.auth.config.cookie.domain</name>
<value>ambari.apache.org</value>
</param>
<param>
<name>hadoop.auth.config.cookie.path</name>
<value>gateway/default</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.principal</name>
<value>HTTP/c6401.ambari.apache.org@EXAMPLE.COM</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.name.rules</name>
<value>DEFAULT</value>
</param>
</provider>
The following parameters need to be updated at a minimum:
hadoop.auth.config.signature.secret - The secret used to sign the delegation token in the hadoop.auth cookie. The same secret must be used across all Knox gateway instances in a given cluster; otherwise, the delegation token will fail validation and authentication will be repeated on each request.
hadoop.auth.config.cookie.domain - The domain to use for the HTTP cookie that stores the authentication token (e.g. mycompany.com).
hadoop.auth.config.kerberos.principal - The web-application Kerberos principal name. The Kerberos principal name must start with HTTP/.
hadoop.auth.config.kerberos.keytab - The path to the keytab file containing the credentials for the Kerberos principal specified above.
For details on the other properties, please refer to the Apache Knox documentation [3]. If you are using Ambari, you will have to restart Knox; this is an Ambari requirement. No restart is required if the topology is updated outside of Ambari (Apache Knox reloads the topology every time the topology file's timestamp is updated).
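For example, when editing topology files directly rather than through Ambari, updating the file's modification time is what triggers the reload. A small sketch, assuming a typical installation path for the default topology (adjust to your layout):
# Edit the topology in place...
vi /etc/knox/conf/topologies/default.xml
# ...or just bump the timestamp to force Knox to reload it.
touch /etc/knox/conf/topologies/default.xml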
Testing: To test Hadoop Auth we will use the user 'guest', assuming that no such user exists on the system.
1. Create a user 'guest' with group 'users'. Note that the group 'users' was chosen because of the property 'hadoop.proxyuser.knox.groups=users'.
useradd guest -u 1590 -g users
2. Add the principal using 'kadmin.local'.
kadmin.local -q "addprinc guest/c6401.ambari.apache.org"
3. Log in using kinit.
kinit guest/c6401.ambari.apache.org@EXAMPLE.COM
4. Test by sending a curl request through Knox.
curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
You should see output similar to the following:
# curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
HTTP/1.1 401 Authentication required
Date: Fri, 24 Feb 2017 14:19:25 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)
HTTP/1.1 200 OK
Date: Fri, 24 Feb 2017 14:19:25 GMT
Set-Cookie: hadoop.auth="u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI="; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Cache-Control: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Content-Type: application/json; charset=UTF-8
X-FRAME-OPTIONS: SAMEORIGIN
Server: Jetty(6.1.26.hwx)
Content-Length: 276
{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16398,"group":"hdfs","length":0,"modificationTime":1487855904191,"owner":"hdfs","pathSuffix":"entity-file-history","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}} [1] https://hadoop.apache.org/docs/stable/hadoop-auth/index.html [2] https://issues.apache.org/jira/browse/KNOX-25 [3] Apache Knox documentation on Hadoop Auth https://knox.apache.org/books/knox-0-11-0/user-guide.html#HadoopAuth+Authentication+Provider