Created 07-23-2019 01:12 PM
Getting "GSS initiate failed" error while submitting spark structured streaming pipeline on yarn cluster . I am reading from hdfs, applying filter and writing to Hive in Orc format in transaction table. I am using spark 2.3, hive 3 with hive warehouse connector.
19-07-23 15:29:58 task-result-getter-1 [WARN ] TaskSetManager - Lost task 0.1 in stage 0.0 (TID 1, mahendra-h0140, executor 1): java.lang.RuntimeException: Unable to instantiate shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at shadehive.org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at shadehive.org.apache.hadoop.hive.metastore.HiveClientCache.getNonCachedHiveMetastoreClient(HiveClientCache.java:89)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getHiveMetastoreClient(HiveMetaStoreUtils.java:230)
at org.apache.hive.streaming.HiveStreamingConnection.getMetaStoreClient(HiveStreamingConnection.java:582)
at org.apache.hive.streaming.HiveStreamingConnection.<init>(HiveStreamingConnection.java:199)
at org.apache.hive.streaming.HiveStreamingConnection.<init>(HiveStreamingConnection.java:118)
at org.apache.hive.streaming.HiveStreamingConnection$Builder.connect(HiveStreamingConnection.java:333)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriter.createStreamingConnection(HiveStreamingDataWriter.java:92)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriter.<init>(HiveStreamingDataWriter.java:60)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriterFactory.createDataWriter(HiveStreamingDataWriterFactory.java:41)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at shadehive.org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
... 21 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at shadehive.org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51)
at shadehive.org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at shadehive.org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport.open(TUGIAssumingTransport.java:48)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:544)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:225)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at shadehive.org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
at shadehive.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at shadehive.org.apache.hadoop.hive.metastore.HiveClientCache.getNonCachedHiveMetastoreClient(HiveClientCache.java:89)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getHiveMetastoreClient(HiveMetaStoreUtils.java:230)
at org.apache.hive.streaming.HiveStreamingConnection.getMetaStoreClient(HiveStreamingConnection.java:582)
at org.apache.hive.streaming.HiveStreamingConnection.<init>(HiveStreamingConnection.java:199)
at org.apache.hive.streaming.HiveStreamingConnection.<init>(HiveStreamingConnection.java:118)
at org.apache.hive.streaming.HiveStreamingConnection$Builder.connect(HiveStreamingConnection.java:333)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriter.createStreamingConnection(HiveStreamingDataWriter.java:92)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriter.<init>(HiveStreamingDataWriter.java:60)
at com.hortonworks.spark.sql.hive.llap.HiveStreamingDataWriterFactory.createDataWriter(HiveStreamingDataWriterFactory.java:41)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:597)
at shadehive.org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:225)
Created 07-24-2019 05:55 AM
I can see the error:

Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed

The error above suggests that your HMS is configured for security (Kerberos) but that your login lacks a valid TGT (such as one obtained via kinit). Could you post the output of klist, and confirm whether a simple 'hadoop fs' test (for example, hadoop fs -ls /) succeeds?

Also check that you are logged in as the correct user, one that has all the required access.
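If it helps, here is a quick way to inspect the login from inside spark-shell (a diagnostic sketch using the standard Hadoop UserGroupInformation API; nothing here is specific to your job):

```scala
import org.apache.hadoop.security.UserGroupInformation

// Inspect the current Hadoop login: which user it is, how it
// authenticated, and whether it carries Kerberos credentials.
val ugi = UserGroupInformation.getCurrentUser
println(s"user                     = ${ugi.getUserName}")
println(s"authentication method    = ${ugi.getAuthenticationMethod}")
println(s"has Kerberos credentials = ${ugi.hasKerberosCredentials}")
```

On a Kerberized cluster you would expect the authentication method on the driver to be KERBEROS. Note that executors normally authenticate via delegation tokens rather than a TGT, so a streaming writer that opens its own metastore connection from an executor (as the stack trace above shows) can fail even when the driver's login is fine.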
Created 08-06-2019 05:17 AM
I am able to run the klist and kinit commands successfully on this cluster. After adding the principal to the HiveServer2 Interactive URL, batch jobs work fine: I can read and write data with Hive in batch mode. But when I run a streaming job and try to emit data to Hive, I still get the same error.
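For reference, a sketch of how I am trying to point the streaming sink at the Kerberized metastore. The metastoreKrbPrincipal option comes from HWC streaming examples, so treat the option name as an assumption to verify against your HWC version; all other names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hdfs-to-hive-streaming")
  .getOrCreate()

val filtered = spark.readStream
  .format("csv")
  .schema("id INT, event STRING")
  .load("hdfs:///data/incoming")
  .filter("event IS NOT NULL")

// Pass the metastore Kerberos principal to the HWC streaming sink in
// addition to the metastore URI. metastoreKrbPrincipal is taken from
// HWC streaming examples; verify it against the HWC version in use.
val query = filtered.writeStream
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .option("database", "default")
  .option("table", "events")
  .option("metastoreUri", "thrift://metastore-host:9083")
  .option("metastoreKrbPrincipal", "hive/_HOST@EXAMPLE.COM")
  .option("checkpointLocation", "hdfs:///tmp/checkpoints/events")
  .start()

query.awaitTermination()
```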