Created 02-27-2019 09:53 AM
Hi all,
After setting up a fresh kerberized HDP 3.1 cluster with Hive LLAP, Spark2 and Livy, we're having trouble connecting to Hive databases through Livy. Pyspark from the shell works without problems, but something breaks when going through Livy.
1. Livy settings are the Ambari defaults, plus the additionally specified jars and pyfiles for the HWC connector, spark.sql.hive.hiveserver2.jdbc.url, and spark.security.credentials.hiveserver2.enabled=true. These are enough for the pyspark shell to work without problems (a sketch of an equivalent Livy session request follows this list).
2. The connection is made through the latest HWC connector described here, since apparently this is the only one that works with Hive 3 and Spark2.
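For reference, this is roughly what the equivalent per-session setup looks like when the session is created through Livy's REST API. The host names, jar/zip paths and the JDBC URL are placeholders for our environment rather than values from this post, and on a kerberized cluster the POST itself also needs SPNEGO auth (e.g. requests_kerberos.HTTPKerberosAuth), so treat this as a sketch:

import requests

LIVY_URL = "http://livy-host.example.com:8999/sessions"  # placeholder Livy endpoint

payload = {
    "kind": "pyspark",
    # HWC connector artifacts; adjust paths/versions to the installed HDP build
    "jars": ["/usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar"],
    "pyFiles": ["/usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<version>.zip"],
    "conf": {
        # the two settings that are enough for the plain pyspark shell to work
        "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://hs2-host.example.com:10500/",
        "spark.security.credentials.hiveserver2.enabled": "true",
    },
}

# X-Requested-By is required when Livy's CSRF protection is enabled
resp = requests.post(LIVY_URL, json=payload, headers={"X-Requested-By": "admin"})
print(resp.status_code, resp.json())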
Problem:
1. When spark.master is set to yarn client mode (see, for example, the comment here), the connector appends the principal "hive/_HOST@DOMAIN" to the JDBC URL and the connection fails with a GSS error ("Failed to find any Kerberos tgt"), even though the ticket is there and Livy has access to HiveServer2.
2. When spark.master is set to yarn cluster mode, ";auth=delegationToken" is appended to the connection URL, and the error then says that a "PLAIN" connection was made where a Kerberized one is expected.
Note 1: tried various settings (ZooKeeper JDBC URLs vs. connecting directly on port 10500, hive.doAs=true vs. false, various principals), but nothing works.
Note 2: everything works fine when connecting both through beeline (to Hive on port 10500) and through the pyspark shell.
Note 3: HWC connection snippet (from the examples):
from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)
Any ideas?
It feels like some setting on Livy is missing. The "Failed to find any Kerberos tgt" part is especially weird: where is it looking for the TGT, and why doesn't it see the ticket from "kinit"?
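For reference, to see where the JVM actually looks for the TGT I have been adding the standard Kerberos/JGSS debug flags to the session conf. These are generic JVM options rather than anything HWC-specific, so this is only a debugging aid, not a fix:

    # extra entries for the "conf" dict in the Livy session sketch above
    "spark.driver.extraJavaOptions": "-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true",
    "spark.yarn.am.extraJavaOptions": "-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true",

The debug output then shows up in the driver/AM container logs in YARN.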
Created 01-12-2020 12:02 PM
Hi All,
Was there any solution for the above issue? I am facing a similar problem.
I have made changes in Livy to disable connecting to Hive (enableHiveSupport() set to false); however, when I try to submit via pyspark, it still connects to Hive and fails with a Kerberos error.
Tried to execute the same using Spark directly and it works as expected.
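For reference, the Livy-side knob I am referring to is the Hive context flag in livy.conf; the property name below is taken from the livy.conf template, so it is worth double-checking against your Livy version:

livy.repl.enable-hive-context = false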
Any thoughts?
Thanks
Vijith Vijayan
Created 02-28-2020 07:40 AM
I'm seeing this same problem as well and haven't found a solution. It appears that the warehouse connector code doesn't pass a TGT, or doesn't trigger the code path that obtains a delegation token, when running in a Jupyter notebook through livy2 to the YARN cluster. I can access the Spark catalog when I use the typical spark.sql approach through livy2 to the YARN cluster, but I can't use the warehouse connector. The warehouse connector does work when I use spark-submit or pyspark.
Below you can see the difference in the requests received by HS2: the first is from the Hive Warehouse Connector going through Livy, the second is from the Hive Warehouse Connector going through spark-submit.
Here is the simple code I'm trying to run
from pyspark_llap.sql.session import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)
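For contrast, the plain Spark catalog path does work for me through the same Livy session. It stays inside Spark's own catalog instead of going over the HWC JDBC path to HiveServer2-interactive, so it isn't a substitute for the connector, but it shows the session itself is healthy:

# works through jupyter -> livy2 -> yarn cluster: no JDBC hop to HS2-interactive involved
spark.sql("show databases").show(100)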
The cookie hive.server2.auth is NOT present in the request to hs2-interactive when going from Jupyter through livy2 to the YARN cluster:
POST //xx.xx.com:10501/cliservice HTTP/1.1
Content-Type: application/x-thrift
Accept: application/x-thrift
User-Agent: Java/THttpClient/HC
Authorization: Basic YW5vbnltb3VzOnBhc3N3b3Jk
Cookie:
Content-Length: 144
Host: xx.xx.com:10501
Connection: keep-alive
Accept-Encoding: gzip,deflate
X-XSRF-HEADER: true
This error is displayed in the Jupyter notebook:
py4j.protocol.Py4JJavaError: An error occurred while calling o105.showDatabases.
: java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeInternal(HiveWarehouseSessionImpl.java:200)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeSmart(HiveWarehouseSessionImpl.java:189)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.execute(HiveWarehouseSessionImpl.java:182)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.showDatabases(HiveWarehouseSessionImpl.java:257)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2385)
	at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:2110)
	at org.apache.commons.dbcp2.BasicDataSource.getLogWriter(BasicDataSource.java:1622)
	at org.apache.commons.dbcp2.BasicDataSourceFactory.createDataSource(BasicDataSourceFactory.java:554)
	at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.getConnector(HS2JDBCWrapper.scala:433)
	at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.getConnector(HS2JDBCWrapper.scala:440)
	at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.getConnector(HS2JDBCWrapper.scala)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.lambda$new$0(HiveWarehouseSessionImpl.java:86)
	at com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl.executeInternal(HiveWarehouseSessionImpl.java:196)
	... 14 more
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401
	at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:344)
	at shadehive.org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:53)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:291)
	at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:2395)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2381)
	... 22 more
Caused by: java.sql.SQLException: Could not establish connection to jdbc:hive2://xx.xx.com:10501/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;auth=delegationToken: HTTP Response code: 401
	at shadehive.org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:872)
	at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:316)
	... 27 more
Caused by: org.apache.thrift.transport.TTransportException: HTTP Response code: 401
	at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:262)
	at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:316)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
	at shadehive.org.apache.hive.service.rpc.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:170)
	at shadehive.org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:162)
	at shadehive.org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:853)
	... 28 more
This is the error that I see in the hiveserver2Interactive.log
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: servlet.ServletHandler (ServletHandler.java:doScope(499)) - servlet |/cliservice|null -> org.apache.hive.service.cli.thrift.ThriftHttpServlet-4b8a0d03@3497805d==org.apache.hive.service.cli.thrift.ThriftHttpServlet,jsp=null,order=-1,inst=true
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: servlet.ServletHandler (ServletHandler.java:doHandle(562)) - chain=null
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:validateCookie(308)) - No valid cookies associated with the request Request(POST //xx.xx.xx.com:10501/cliservice)@612b2fe8
2020-02-28T09:29:56,914 INFO [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(147)) - Could not validate cookie sent, will try to generate a new cookie
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:logPrivilegedAction(1757)) - PrivilegedAction as:HTTP/xx.xx.com@DEV.COM (auth:KERBEROS) from:org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:402)
2020-02-28T09:29:56,914 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:doAs(1734)) - PrivilegedActionException as:HTTP/xx.xx.com@DEV.COM (auth:KERBEROS) cause:org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed:
2020-02-28T09:29:56,915 INFO [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(404)) - Failed to authenticate with http/_HOST kerberos principal, trying with hive/_HOST kerberos principal
2020-02-28T09:29:56,915 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:logPrivilegedAction(1757)) - PrivilegedAction as:hive/xx.xx.com@DEV.COM (auth:KERBEROS) from:org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410)
2020-02-28T09:29:56,915 DEBUG [HiveServer2-HttpHandler-Pool: Thread-5166]: security.UserGroupInformation (UserGroupInformation.java:doAs(1734)) - PrivilegedActionException as:hive/xx.xx.com@DEV.COM (auth:KERBEROS) cause:org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed:
2020-02-28T09:29:56,915 ERROR [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(415)) - Failed to authenticate with hive/_HOST kerberos principal
2020-02-28T09:29:56,915 ERROR [HiveServer2-HttpHandler-Pool: Thread-5166]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(214)) - Error: org.apache.hive.service.auth.HttpAuthenticationException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:416) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:161) [hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) [javax.servlet-api-3.1.0.jar:3.1.0]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.Server.handle(Server.java:539) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-runner-9.3.25.v20180904.jar:9.3.25.v20180904]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	... 25 more
Caused by: org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed:
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:472) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:421) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	... 25 more
Caused by: org.ietf.jgss.GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
	at sun.security.jgss.GSSHeader.<init>(GSSHeader.java:97) ~[?:1.8.0_112]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:306) ~[?:1.8.0_112]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_112]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:460) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:421) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:410) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
	... 25 more
The cookie hive.server2.auth is present in the request to hs2-interactive when going from pyspark to the YARN cluster:
POST //xx.xx.com:10501/cliservice HTTP/1.1
Content-Type: application/x-thrift
Accept: application/x-thrift
User-Agent: Java/THttpClient/HC
Authorization: Basic YW5vbnltb3VzOnBhc3N3b3Jk
Cookie:
Content-Length: 144
Host: xx.xx.com:10501
Connection: keep-alive
Cookie: hive.server2.auth=cu=testuser&rn=6456929925324967245&s=keyCVNSqTtfEzHwqP9kLvgpGzWlFIp+G1t2LfNHBy+s=
Accept-Encoding: gzip,deflate
X-XSRF-HEADER: true