Created 11-04-2022 05:09 AM
Running the following from Hive:
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
INSERT INTO hello_acid partition (load_date='2016-03-01') VALUES (1, 1);
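For context, hello_acid is a transactional table. A minimal sketch of the DDL I'd expect (the exact definition isn't shown here, so the column names and bucketing are assumptions):

-- Hypothetical DDL; actual columns/bucketing may differ.
-- In Hive 2.x, ACID tables must be bucketed, stored as ORC, and flagged transactional.
CREATE TABLE hello_acid (key INT, value INT)
PARTITIONED BY (load_date DATE)
CLUSTERED BY (key) INTO 3 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');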
I'm getting this error:
SQL Error [30041] [42000]: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session e73c125d-e40a-49fb-b2e0-28c28ab1ff60_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
Hive has Spark configured as a dependency, and I have these settings in hive-site.xml:
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>
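For reference, the Hive-on-Spark wiring itself is along these lines (a sketch; my actual values may differ):

<!-- Sketch of the execution-engine settings assumed above; values illustrative. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>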
Some other settings, the Java configuration options for HiveServer2:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:MaxPermSize=512M -XX:+UseParNewGC -XX:-UseGCOverheadLimit
I do not see any applications in the Hadoop web UI. The logs are below.
From what I can tell, Hive does not recognize where the Spark libraries are. I believe this should have been addressed by setting the Hive dependency on Spark.
What am I missing?
Thanks
2022-11-04 02:46:21,069 ERROR org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-57]: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6
...
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 1 more
2022-11-04 02:46:21,088 ERROR org.apache.hadoop.hive.ql.exec.spark.SparkTask: [HiveServer2-Background-Pool: Thread-57]: Failed to execute Spark task "Stage-1"
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session e73c125d-e40a-49fb-b2e0-28c28ab1ff60_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:286) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:135) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
..
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.RuntimeException: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:124) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 1 more
Created 11-04-2022 10:18 AM
Quick update: I ran spark-shell and got better errors indicating wrong config settings:
$ spark-shell --master yarn
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/11/04 16:59:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/04 16:59:49 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (1024 MB), offHeap memory (0) MB, overhead (384 MB), and PySpark memory (0 MB) is above the max threshold (1024 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:368)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:199)
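So the requested 1024 MB of executor memory plus 384 MB of overhead (1408 MB total) exceeds the cluster's 1024 MB per-container ceiling. The fix should be either raising the YARN limits or shrinking the executor request. A sketch of the yarn-site.xml change (values illustrative; they must also fit within the NodeManager's physical RAM):

<!-- Per-container ceiling must cover 1024 MB executor + 384 MB overhead = 1408 MB. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

Alternatively, keep YARN as-is and shrink the request, e.g. spark-shell --master yarn --conf spark.executor.memory=512m (512 + 384 = 896 MB, which fits under 1024 MB).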