Created 11-04-2022 05:09 AM
Running the following from Hive:
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
INSERT INTO hello_acid partition (load_date='2016-03-01') VALUES (1, 1);
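For context, hello_acid is a transactional table. A minimal sketch of the DDL I'd expect (the exact definition isn't shown here, so the column names and bucketing are assumptions):

-- Hypothetical DDL; actual columns/bucketing may differ.
-- In Hive 2.x, ACID tables must be bucketed, stored as ORC, and flagged transactional.
CREATE TABLE hello_acid (key INT, value INT)
PARTITIONED BY (load_date DATE)
CLUSTERED BY (key) INTO 3 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');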
I'm getting this error:
SQL Error [30041] [42000]: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session e73c125d-e40a-49fb-b2e0-28c28ab1ff60_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
Hive has Spark configured as a dependency, and I have these settings in hive-site.xml:
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>
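For reference, the Hive-on-Spark wiring itself is along these lines (a sketch; my actual values may differ):

<!-- Sketch of the execution-engine settings assumed above; values illustrative. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>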
Some other settings, the Java configuration options for HiveServer2:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:MaxPermSize=512M -XX:+UseParNewGC -XX:-UseGCOverheadLimit
I do not see any applications in the Hadoop web UI. The logs are below.
From what I can tell, Hive does not recognize where the Spark libraries are. I believe this should have been addressed by setting the Hive dependency on Spark.
What am I missing?
Thanks
2022-11-04 02:46:21,069 ERROR org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-57]: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6
...
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 1 more
2022-11-04 02:46:21,088 ERROR org.apache.hadoop.hive.ql.exec.spark.SparkTask: [HiveServer2-Background-Pool: Thread-57]: Failed to execute Spark task "Stage-1"
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session e73c125d-e40a-49fb-b2e0-28c28ab1ff60_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:286) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:135) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
..
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.RuntimeException: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:124) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 22 more
Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 1 more
Created 11-04-2022 10:18 AM
Quick update: I ran spark-shell and got better errors indicating wrong config settings:
$ spark-shell --master yarn
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/11/04 16:59:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/04 16:59:49 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (1024 MB), offHeap memory (0) MB, overhead (384 MB), and PySpark memory (0 MB) is above the max threshold (1024 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:368)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:199)
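So the requested 1024 MB of executor memory plus 384 MB of overhead (1408 MB total) exceeds the cluster's 1024 MB per-container ceiling. The fix should be either raising the YARN limits or shrinking the executor request. A sketch of the yarn-site.xml change (values illustrative; they must also fit within the NodeManager's physical RAM):

<!-- Per-container ceiling must cover 1024 MB executor + 384 MB overhead = 1408 MB. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

Alternatively, keep YARN as-is and shrink the request, e.g. spark-shell --master yarn --conf spark.executor.memory=512m (512 + 384 = 896 MB, which fits under 1024 MB).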