
HDP 3.1: Kerberized pyspark connection to Hive (livy)

Contributor

Hi all,

After setting up a fresh kerberized HDP 3.1 cluster with Hive LLAP, Spark2 and Livy, we're having trouble connecting to Hive's database through Livy. PySpark from the shell works without a problem, but something breaks when going through Livy.

1. Livy settings are the Ambari defaults, with the jars and pyFiles for the HWC connector additionally specified, plus spark.sql.hive.hiveserver2.jdbc.url and spark.security.credentials.hiveserver2.enabled=true (a rough sketch of the session request follows this list). These settings are enough for the pyspark shell to work without problems.

2. The connection is made through the latest HWC connector described here, since apparently that is the only one that works with Hive 3 and Spark2.
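
For reference, the session is created roughly like this through Livy's REST API. The Livy host, the jar/zip paths and the JDBC URL are placeholders for our environment; the conf keys are the ones listed above (in a kerberized cluster the POST itself also needs SPNEGO auth, omitted here):

# Rough sketch of the Livy session request (placeholder host, paths and URL).
import json
import requests

payload = {
    "kind": "pyspark",
    # HWC assembly jar and pyspark_llap zip shipped with HDP 3.1 (placeholder paths)
    "jars": ["/usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly.jar"],
    "pyFiles": ["/usr/hdp/current/hive_warehouse_connector/pyspark_hwc.zip"],
    "conf": {
        "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://hive-host.domain:10500/",
        "spark.security.credentials.hiveserver2.enabled": "true",
    },
}

resp = requests.post(
    "http://livy-host.domain:8999/sessions",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json", "X-Requested-By": "user"},
)
print(resp.status_code, resp.json())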

Problems:

1. When spark.master is set to yarn client mode (see for example the comment here), the connector appends the principal "hive/_HOST@DOMAIN" and the connection fails with a GSS error: "failed to find any Kerberos tgt" (although the ticket is there and Livy has access to HiveServer2).

2. When spark.master is set to yarn cluster mode, ";auth=delegationToken" is appended to the connection string, and the error then says that a "PLAIN" connection was made where a Kerberized one was expected.

Note 1: I tried various settings (ZooKeeper JDBC URLs vs. connecting directly through port 10500, hive.doAs=true vs. false, various principals), but nothing works; the two URL shapes are sketched after the snippet below.

Note 2: everything works fine when connecting both through beeline (to Hive on port 10500) and through the pyspark shell.

Note 3: HWC connection snippet (from the examples):

from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession and list the databases.
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)
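
For clarity, these are the two shapes of spark.sql.hive.hiveserver2.jdbc.url referred to in Note 1 (hosts and ZooKeeper quorum are placeholders); both behave the same way through Livy:

# The two JDBC URL variants tried (placeholder hosts); the ZooKeeper form is the one
# Ambari shows for HiveServer2 Interactive, the direct form targets port 10500.
ZK_URL = ("jdbc:hive2://zk1.domain:2181,zk2.domain:2181,zk3.domain:2181/;"
          "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
DIRECT_URL = "jdbc:hive2://hive-host.domain:10500/"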

Any ideas?

It feels like some setting on Livy is missing. The "failed to find any Kerberos tgt" part is especially weird: where is it looking for the TGT, and why doesn't it see the ticket from kinit?
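
One way to see which identity the Livy-built driver actually holds is to ask Hadoop's UserGroupInformation through the py4j gateway. This is only a diagnostic sketch (spark.sparkContext._jvm is a PySpark internal), not a fix:

# Diagnostic only: which Hadoop identity does the driver run as, and does it
# actually hold Kerberos credentials?
jvm = spark.sparkContext._jvm
ugi = jvm.org.apache.hadoop.security.UserGroupInformation.getCurrentUser()
print(ugi.toString())                # e.g. "someuser (auth:PROXY) via livy/_HOST@REALM (auth:KERBEROS)"
print(ugi.hasKerberosCredentials())  # False would be consistent with "failed to find any Kerberos tgt"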

@Geoffrey Shelton Okot @Hyukjin Kwon @Eric Wohlstadter

2 REPLIES

Explorer

Hi All,

Was there any solution for the above issue? I am facing a similar problem.

I have made changes in Livy to disable connecting to Hive (set enableHiveSupport() to false); however, when I try to submit via pyspark through Livy, it still connects to Hive and fails with a Kerberos error.

Tried to execute the same using Spark directly and it works as expected.
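
If it helps, a quick way to confirm from inside the Livy session whether Hive support is really disabled is to read the catalog implementation (diagnostic sketch only):

# "hive" means the session was built with enableHiveSupport(); "in-memory" means it was not.
print(spark.conf.get("spark.sql.catalogImplementation", "in-memory"))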

 

Any thoughts?

 

Thanks

Vijith Vijayan

New Contributor

I'm seeing this same problem as well and haven't found a solution. It appears that the warehouse connector code doesn't pass a TGT, or doesn't trigger the code path that obtains a delegation token, when running from a Jupyter notebook through livy2 to the YARN cluster. I can access the Spark catalog in Hive when I use the typical spark.sql approach through livy2 to the YARN cluster, but I can't use the warehouse connector. The warehouse connector does work when I use spark-submit or pyspark.
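
For comparison, this is the "typical spark.sql approach" that does work for me through livy2 in yarn-cluster mode (it only goes through Spark's own catalog, not the HWC JDBC path to HiveServer2 Interactive):

# Works through Livy in yarn-cluster mode: uses Spark's catalog directly,
# no JDBC connection to HiveServer2 Interactive is involved.
spark.sql("SHOW DATABASES").show(100)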

 

Below you can see the difference in the requests received by HS2: the first is when using the Hive warehouse connector through Livy, the second is when using it through spark-submit.

 

Here is the simple code I'm trying to run

from pyspark_llap.sql.session import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)

 

The cookie hive.server2.auth is NOT present in the request to hs2-interactive from jupyter to livy2 to yarn cluster

POST //xx.xx.com:10501/cliservice HTTP/1.1
Content-Type: application/x-thrift
Accept: application/x-thrift
User-Agent: Java/THttpClient/HC
Authorization: Basic YW5vbnltb3VzOnBhc3N3b3Jk
Cookie: 
Content-Length: 144
Host: xx.xx.com:10501
Connection: keep-alive
Accept-Encoding: gzip,deflate
X-XSRF-HEADER: true

This error is displayed in the jupyter notebook

py4j.protocol.Py4JJavaError: An error occurred while calling o105.showDatabases.
: java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeInternal(HiveWarehouseSessionImpl.java:200)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeSmart(HiveWarehouseSessionImpl.java:189)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.execute(HiveWarehouseSessionImpl.java:182)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.showDatabases(HiveWarehouseSessionImpl.java:257)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2385)
	at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:2110)
	at org.apache.commons.dbcp2.BasicDataSource.getLogWriter(BasicDataSource.java:1622)
	at org.apache.commons.dbcp2.BasicDataSourceFactory.createDataSource(BasicDataSourceFactory.java:554)
	at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.getConnector(HS2JDBCWrapper.scala:433)
	at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.getConnector(HS2JDBCWrapper.scala:440)
	at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.getConnector(HS2JDBCWrapper.scala)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.lambda$new$0(HiveWarehouseSessionImpl.java:86)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeInternal(HiveWarehouseSessionImpl.java:196)
	... 14 more
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401
	at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:344)
	at shadehive.org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:53)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:291)
	at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:2395)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2381)
	... 22 more
Caused by: java.sql.SQLException: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401
	at shadehive.org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:872)
	at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:316)
	... 27 more
Caused by: org.apache.thrift.transport.TTransportException: HTTP Response code: 401
	at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:262)
	at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:316)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
	at shadehive.org.apache.hive.service.rpc.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:170)
	at shadehive.org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:162)
	at shadehive.org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:853)
	... 28 more

This is the error that I see in the hiveserver2Interactive.log

2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: servlet.ServletHandler (ServletHandler.java:doScope(499)) - servlet |/cliservice|null -> org.apache.hive.service.cli.thrift.ThriftHttpServlet-4b8a0d03@3497805d==org.apache.hive.service.cli.thrift.ThriftHttpServlet,jsp=null,order=-1,inst=true
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: servlet.ServletHandler (ServletHandler.java:doHandle(562)) - chain=null
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:validateCookie(308)) - No valid cookies associated with the request Request(POST //xx.xx.xx.com:10501/cliservice)@612b2fe8
2020-02-28T09:29:56,914 INFO  [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(147)) - Could not validate cookie sent, will try to generate a new cookie
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:logPrivilegedAction(1757)) - PrivilegedAction as:HTTP/xx.xx.com@DEV.COM (auth:KERBEROS) from:org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:402)
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:doAs(1734)) - PrivilegedActionException as:HTTP/xx.xx.com@DEV.COM (auth:KERBEROS) cause:org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed: 
2020-02-28T09:29:56,915 INFO  [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(404)) - Failed to authenticate with http/_HOST kerberos principal, trying with hive/_HOST kerberos principal
2020-02-28T09:29:56,915 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:logPrivilegedAction(1757)) - PrivilegedAction as:hive/xx.xx.com@DEV.COM (auth:KERBEROS) from:org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410)
2020-02-28T09:29:56,915 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:doAs(1734)) - PrivilegedActionException as:hive/xx.xx.com@DEV.COM (auth:KERBEROS) cause:org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed: 
2020-02-28T09:29:56,915 ERROR [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(415)) - Failed to authenticate with hive/_HOST kerberos principal
2020-02-28T09:29:56,915 ERROR [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(214)) - Error: 
org.apache.hive.service.auth.HttpAuthenticationException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:416) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:161) [hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) [javax.servlet-api-3.1.0.jar:3.1.0]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.Server.handle(Server.java:539) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        ... 25 more
Caused by: org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed: 
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:472) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:421) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        ... 25 more
Caused by: org.ietf.jgss.GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
        at sun.security.jgss.GSSHeader.<init>(GSSHeader.java:97) ~[?:1.8.0_112]
        at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:306) ~[?:1.8.0_112]
        at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_112]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:460) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:421) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
        ... 25 more

The cookie hive.server2.auth is present in the request to hs2-interactive from pyspark to yarn cluster

POST //xx.xx.com:10501/cliservice HTTP/1.1
Content-Type: application/x-thrift
Accept: application/x-thrift
User-Agent: Java/THttpClient/HC
Authorization: Basic YW5vbnltb3VzOnBhc3N3b3Jk
Cookie: 
Content-Length: 144
Host: xx.xx.com:10501
Connection: keep-alive
Cookie: hive.server2.auth=cu=testuser&rn=6456929925324967245&s=keyCVNSqTtfEzHwqP9kLvgpGzWlFIp+G1t2LfNHBy+s=
Accept-Encoding: gzip,deflate
X-XSRF-HEADER: true