Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Oozie shell action - Spark2-submit pyspark job having hive queries.

Solved Go to solution

Oozie shell action - Spark2-submit pyspark job having hive queries.

New Contributor

Environment: Cloudera Enterprise 5.10.2. 

 

We have a python spark job that is submitted using spark2-submit from the edge node and is working as expected. The spark job queries the hive tables, and it works in both client and cluster mode.

 

However, when it's submitted via a shell action from the oozie workflow, it fails to connect to the metastore. Actually, it opens a connection and then looses it. Any lead on how to fix would be helpful.

 

=======================================================================================

 

18/10/22 10:52:23 INFO hive.metastore: Trying to connect to metastore with URI thrift://correcturl.country.cloud.company.com:9083
18/10/22 10:52:23 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/10/22 10:52:23 WARN hive.metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:3741)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:3727)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:437)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:235)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1490)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:67)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:82)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2935)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2954)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3179)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:210)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:197)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:307)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:268)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:243)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply$mcV$sp(HiveCredentialProvider.scala:91)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anon$1.run(HiveCredentialProvider.scala:123)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.doAsRealUser(HiveCredentialProvider.scala:122)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.obtainCredentials(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:80)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:78)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager.obtainCredentials(ConfigurableCredentialManager.scala:78)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:398)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
18/10/22 10:52:23 INFO hive.metastore: Connected to metastore.
18/10/22 10:52:23 WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3339)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3327)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2060)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:105)
        at com.sun.proxy.$Proxy18.getAllFunctions(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1998)
        at com.sun.proxy.$Proxy18.getAllFunctions(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3179)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:210)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:197)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:307)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:268)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:243)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply$mcV$sp(HiveCredentialProvider.scala:91)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anon$1.run(HiveCredentialProvider.scala:123)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.doAsRealUser(HiveCredentialProvider.scala:122)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.obtainCredentials(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:80)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:78)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager.obtainCredentials(ConfigurableCredentialManager.scala:78)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:398)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
18/10/22 10:52:24 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/10/22 10:52:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://correcturl.country.cloud.company.com:9083
18/10/22 10:52:24 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/10/22 10:52:24 WARN hive.metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Oozie shell action - Spark2-submit pyspark job having hive queries.

New Contributor

 

 

The issue is that Spark couldn't read hive-site.xml and hence providing the properties in SparkSession builder fixed the issue.

 


spark_session = (SparkSession .builder .config("hive.metastore.uris", "thrift://server.country.cloud.company.com:9083") .config("hive.metastore.client.socket.timeout", "300") .config("hive.metastore.warehouse.dir", "/user/hive/warehouse") .config("hive.warehouse.subdir.inherit.perms", "true") .config("hive.execution.engine", "mr") .config("hive.metastore.execute.setugi", "true") .config("hive.support.concurrency", "true") .config("hive.zookeeper.quorum", "<<REDACTED>>") .config("hive.zookeeper.client.port", "2181") .config("hive.zookeeper.namespace", "hive_zookeeper_namespace_hive") .config("hive.cluster.delegation.token.store.class", "org.apache.hadoop.hive.thrift.MemoryTokenStore") .config("hive.server2.enable.doAs", "false") .config("hive.metastore.sasl.enabled", "true") .config("hive.server2.authentication", "kerberos") .config("hive.metastore.kerberos.principal", "<<REDACTED>>") .config("hive.server2.authentication.kerberos.principal", "<<REDACTED>>") .config("hive.server2.use.SSL", "true") .config("hive.exec.dynamic.partition", "true") .config("hive.exec.dynamic.partition.mode", "nonstrict") .enableHiveSupport() .appName('app_name') .getOrCreate())

 

1 REPLY 1
Highlighted

Re: Oozie shell action - Spark2-submit pyspark job having hive queries.

New Contributor

 

 

The issue is that Spark couldn't read hive-site.xml and hence providing the properties in SparkSession builder fixed the issue.

 


spark_session = (SparkSession .builder .config("hive.metastore.uris", "thrift://server.country.cloud.company.com:9083") .config("hive.metastore.client.socket.timeout", "300") .config("hive.metastore.warehouse.dir", "/user/hive/warehouse") .config("hive.warehouse.subdir.inherit.perms", "true") .config("hive.execution.engine", "mr") .config("hive.metastore.execute.setugi", "true") .config("hive.support.concurrency", "true") .config("hive.zookeeper.quorum", "<<REDACTED>>") .config("hive.zookeeper.client.port", "2181") .config("hive.zookeeper.namespace", "hive_zookeeper_namespace_hive") .config("hive.cluster.delegation.token.store.class", "org.apache.hadoop.hive.thrift.MemoryTokenStore") .config("hive.server2.enable.doAs", "false") .config("hive.metastore.sasl.enabled", "true") .config("hive.server2.authentication", "kerberos") .config("hive.metastore.kerberos.principal", "<<REDACTED>>") .config("hive.server2.authentication.kerberos.principal", "<<REDACTED>>") .config("hive.server2.use.SSL", "true") .config("hive.exec.dynamic.partition", "true") .config("hive.exec.dynamic.partition.mode", "nonstrict") .enableHiveSupport() .appName('app_name') .getOrCreate())

 

Don't have an account?
Coming from Hortonworks? Activate your account here