Created on 04-16-2016 09:55 AM - edited 09-16-2022 03:14 AM
I have enabled Spark as the default execution engine for Hive on CDH 5.7, but I get the following error when I execute a query against Hive from my edge node. Is there anything I need to enable on the client edge node? I can run the spark-shell and have exported SPARK_HOME, and I have also copied the client configuration to the edge node. Is there anything else I need to enable or configure?
ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:125)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1774)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1531)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1311)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:178)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:72)
at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:232)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:245)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '478049ac-228c-4abb-8ef3-93157822a0a1'. Error: Child process exited before connecting back
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:111)
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '478049ac-228c-4abb-8ef3-93157822a0a1'. Error: Child process exited before connecting back
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
... 27 more
Caused by: java.lang.RuntimeException: Cancel client '478049ac-228c-4abb-8ef3-93157822a0a1'. Error: Child process exited before connecting back
at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
... 1 more
Created 04-17-2016 12:15 PM
The YARN container memory was smaller than the Spark executor requirement. I set the YARN container memory and maximum allocation to be greater than the Spark executor memory plus overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
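For reference, a minimal sketch of those two properties as they would appear in yarn-site.xml (or the equivalent fields in Cloudera Manager); the 4096 MB figure is only an illustration and should be sized above your executor memory plus overhead:

<!-- yarn-site.xml: both values set to the same figure, larger than
     spark.executor.memory + spark.yarn.executor.memoryOverhead -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>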
Created 05-12-2016 01:51 AM
The YARN logs contained errors complaining about memory shortfalls when I selected the Spark engine for Hive, and I noticed that the default Spark executor memory plus overhead was larger than the YARN container memory settings. Increasing the YARN container memory configuration cured the problem; alternatively, you could lower the Spark executor requirements.
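If you lower the Spark side instead, here is a minimal sketch, assuming the spark.* properties are placed in hive-site.xml (or the Hive service safety valve in Cloudera Manager); the values are illustrative and simply need to fit inside the YARN container maximum:

<!-- hive-site.xml: shrink the executor so that
     spark.executor.memory + spark.yarn.executor.memoryOverhead (MB)
     fits under yarn.scheduler.maximum-allocation-mb -->
<property>
  <name>spark.executor.memory</name>
  <value>1536m</value>
</property>
<property>
  <name>spark.yarn.executor.memoryOverhead</name>
  <value>256</value>
</property>

The same spark.* properties can also be set per session from the Hive shell before running the query.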
Created 05-14-2016 05:38 PM
Still not working for me... I have played with multiple parameters, but no success. Also, the YARN logs do not show anything bad about memory. Any ideas?
Created 05-17-2016 02:42 AM
When you say it is not working, what issue does it exhibit? For Hive on Spark you only need to set the execution engine within Hive from MapReduce to Spark. You do need to consider the Spark executor memory settings in the Spark service, and these must correlate with the YARN container memory settings. Generally I set the following YARN container settings:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-mb
To be the same value, and greater than the Spark executor memory plus overhead (a configuration sketch follows the log excerpt below). Check also for an error similar to the following in the YARN logs:
15/09/17 11:15:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2211 MB per container)
Exception in thread "main" java.lang.IllegalArgumentException: Required executor memory (2048+384 MB) is above the max threshold (2211 MB) of this cluster!
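For completeness, a minimal sketch of the engine switch itself, assuming it is made in hive-site.xml (it can equally be set per session or through Cloudera Manager); the memory arithmetic from the excerpt above is noted in the comment:

<!-- hive-site.xml: switch Hive's execution engine from MapReduce (mr) to Spark -->
<!-- With the defaults in the log above, the required container size is
     2048 MB (executor) + 384 MB (overhead) = 2432 MB, which exceeds the
     2211 MB yarn.scheduler.maximum-allocation-mb cap, hence the failure. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>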
Regards
Shailesh