Created on 07-28-2017 02:51 PM - edited 09-16-2022 05:00 AM
I have installed Spark and configured Hive to use it as the execution engine.
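For reference, the engine was switched with the standard Hive property (set per session here; it can also live in hive-site.xml):

hive> set hive.execution.engine=spark;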
SELECT * FROM the table works fine.
But SELECT COUNT(*) FROM the same table fails with the SparkTask error shown in the CLI transcript below. For context, here is my /user directory listing:
drwxr-xr-x - admin admin 0 2017-07-28 16:36 /user/admin
drwx------ - ec2-user supergroup 0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x - hdfs hdfs 0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx - mapred hadoop 0 2017-07-16 06:03 /user/history
drwxrwxr-t - hive hive 0 2017-07-16 06:04 /user/hive
drwxrwxr-x - hue hue 0 2017-07-28 10:16 /user/hue
drwxrwxr-x - impala impala 0 2017-07-16 07:13 /user/impala
drwxrwxr-x - oozie oozie 0 2017-07-16 06:05 /user/oozie
drwxr-x--x - spark spark 0 2017-07-28 17:17 /user/spark
drwxrwxr-x - sqoop2 sqoop 0 2017-07-16 06:37 /user/sqoop2
The /user directory itself is owned by ec2-user with group supergroup.
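On a default CDH layout, /user is normally owned by the HDFS superuser with mode 755, so the ec2-user ownership may be worth reverting. A minimal sketch, assuming hdfs is the superuser account on this cluster:

sudo -u hdfs hdfs dfs -chown hdfs:supergroup /user   # restore the usual owner/group
sudo -u hdfs hdfs dfs -chmod 755 /user               # world-readable, owner-writable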
I tried running the query from the Hive CLI:
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
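The CLI output above stops at the return code; the actual reason for "Failed to create spark client" usually lands in the HiveServer2 log or in the YARN logs for the failed Spark session. Something along these lines, where the application id is a placeholder to take from the list output:

yarn application -list -appStates ALL | grep ec2-user   # locate the failed Hive-on-Spark application
yarn logs -applicationId <application_id>               # dump its container logs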
Created 07-29-2017 12:57 AM
Thank you for the reply.
I did not have the spark folder in that location; I had SPARK2. After running the command, I get the error below.
[ec2-user@ip-172-31-37-124 jars]$ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_2.11-2.2.0.cloudera1.jar
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark) overrides detected (/usr/lib/spark).
WARNING: Running spark-class from user-defined location.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 11 more
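Note the warning at the top of that output: SPARK_HOME still points at the CDH 5.12 tree, which ships Spark 1.6, so the Spark 2 examples jar is being launched against Spark 1.x libraries, where org.apache.spark.sql.SparkSession does not exist. Two likely workarounds, assuming the SPARK2 parcel's launcher scripts are installed:

# Option 1: use the Spark 2 launcher that ships with the SPARK2 parcel
spark2-submit --class org.apache.spark.examples.SparkPi --master yarn \
  --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 \
  /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_2.11-2.2.0.cloudera1.jar

# Option 2: clear the stale SPARK_HOME before calling plain spark-submit
unset SPARK_HOME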
Created 07-29-2017 04:13 AM
The Spark job is now getting submitted, but now I am getting the following error:
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170729070303_887365d6-ce92-4ec3-bc8a-2adf3cfec117
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 614015ef-31f9-4e14-9b71-c161f64916db
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
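Status: SENT with "Job hasn't been submitted after 61s" means the Hive client gave up waiting for YARN to start the remote Spark driver, typically because the cluster has no free containers or the driver died on startup. While debugging, the monitor window can be widened, and it is worth checking for applications stuck waiting on resources. The property names below are the standard Hive-on-Spark ones; the values are only examples:

hive> set hive.spark.job.monitor.timeout=120s;               -- default is 60s, matching the abort above
hive> set hive.spark.client.server.connect.timeout=300000ms; -- handshake window for the remote driver

yarn application -list -appStates ACCEPTED   # apps queued but not yet running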
Created 09-12-2018 09:01 AM
Hi,
I am also getting the same error below when running Hive on Spark through IBM DataStage:
main_program: Fatal Error: The connector received an error from the driver. The reported error is: [SQLSTATE HY000] java.sql.SQLException: [IBM][Hive JDBC Driver][Hive]Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
Were you able to resolve the issue?
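One way I am trying to narrow it down is to run the same statement straight through Beeline, bypassing DataStage (host, port, user, and table below are placeholders):

beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -n <user> \
  -e "select count(*) from <db>.<table>;"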
Thanks,
Jalaj