Reply
New Contributor
Posts: 4
Registered: ‎10-01-2017
Accepted Solution

Oozie shell action - Spark2-submit pyspark job having hive queries.

[ Edited ]

Environment: Cloudera Enterprise 5.10.2. 

 

We have a python spark job that is submitted using spark2-submit from the edge node and is working as expected. The spark job queries the hive tables, and it works in both client and cluster mode.

 

However, when it's submitted via a shell action from the oozie workflow, it fails to connect to the metastore. Actually, it opens a connection and then looses it. Any lead on how to fix would be helpful.

 

=======================================================================================

 

18/10/22 10:52:23 INFO hive.metastore: Trying to connect to metastore with URI thrift://correcturl.country.cloud.company.com:9083
18/10/22 10:52:23 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/10/22 10:52:23 WARN hive.metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:3741)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:3727)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:437)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:235)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1490)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:67)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:82)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2935)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2954)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3179)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:210)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:197)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:307)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:268)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:243)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply$mcV$sp(HiveCredentialProvider.scala:91)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anon$1.run(HiveCredentialProvider.scala:123)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.doAsRealUser(HiveCredentialProvider.scala:122)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.obtainCredentials(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:80)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:78)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager.obtainCredentials(ConfigurableCredentialManager.scala:78)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:398)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
18/10/22 10:52:23 INFO hive.metastore: Connected to metastore.
18/10/22 10:52:23 WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3339)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3327)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2060)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:105)
        at com.sun.proxy.$Proxy18.getAllFunctions(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1998)
        at com.sun.proxy.$Proxy18.getAllFunctions(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3179)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:210)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:197)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:307)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:268)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:243)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply$mcV$sp(HiveCredentialProvider.scala:91)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anonfun$obtainCredentials$1.apply(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider$$anon$1.run(HiveCredentialProvider.scala:123)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.doAsRealUser(HiveCredentialProvider.scala:122)
        at org.apache.spark.deploy.yarn.security.HiveCredentialProvider.obtainCredentials(HiveCredentialProvider.scala:90)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:80)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager$$anonfun$obtainCredentials$2.apply(ConfigurableCredentialManager.scala:78)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.yarn.security.ConfigurableCredentialManager.obtainCredentials(ConfigurableCredentialManager.scala:78)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:398)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
18/10/22 10:52:24 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/10/22 10:52:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://correcturl.country.cloud.company.com:9083
18/10/22 10:52:24 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/10/22 10:52:24 WARN hive.metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.

 

New Contributor
Posts: 4
Registered: ‎10-01-2017

Re: Oozie shell action - Spark2-submit pyspark job having hive queries.

[ Edited ]

 

 

The issue is that Spark couldn't read hive-site.xml and hence providing the properties in SparkSession builder fixed the issue.

 


spark_session = (SparkSession .builder .config("hive.metastore.uris", "thrift://server.country.cloud.company.com:9083") .config("hive.metastore.client.socket.timeout", "300") .config("hive.metastore.warehouse.dir", "/user/hive/warehouse") .config("hive.warehouse.subdir.inherit.perms", "true") .config("hive.execution.engine", "mr") .config("hive.metastore.execute.setugi", "true") .config("hive.support.concurrency", "true") .config("hive.zookeeper.quorum", "<<REDACTED>>") .config("hive.zookeeper.client.port", "2181") .config("hive.zookeeper.namespace", "hive_zookeeper_namespace_hive") .config("hive.cluster.delegation.token.store.class", "org.apache.hadoop.hive.thrift.MemoryTokenStore") .config("hive.server2.enable.doAs", "false") .config("hive.metastore.sasl.enabled", "true") .config("hive.server2.authentication", "kerberos") .config("hive.metastore.kerberos.principal", "<<REDACTED>>") .config("hive.server2.authentication.kerberos.principal", "<<REDACTED>>") .config("hive.server2.use.SSL", "true") .config("hive.exec.dynamic.partition", "true") .config("hive.exec.dynamic.partition.mode", "nonstrict") .enableHiveSupport() .appName('app_name') .getOrCreate())

 

Announcements