Created on 07-28-2017 02:51 PM - edited 09-16-2022 05:00 AM
I have installed Spark and configured Hive to use it as the execution engine.
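For reference, the engine was switched with the standard Hive property (set per session here; it can also live in hive-site.xml):

hive> set hive.execution.engine=spark;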
SELECT * FROM the table works fine.
But SELECT COUNT(*) FROM the same table fails with the SparkTask error shown in the CLI transcript below. For context, here is my /user directory listing:
drwxr-xr-x - admin admin 0 2017-07-28 16:36 /user/admin
drwx------ - ec2-user supergroup 0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x - hdfs hdfs 0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx - mapred hadoop 0 2017-07-16 06:03 /user/history
drwxrwxr-t - hive hive 0 2017-07-16 06:04 /user/hive
drwxrwxr-x - hue hue 0 2017-07-28 10:16 /user/hue
drwxrwxr-x - impala impala 0 2017-07-16 07:13 /user/impala
drwxrwxr-x - oozie oozie 0 2017-07-16 06:05 /user/oozie
drwxr-x--x - spark spark 0 2017-07-28 17:17 /user/spark
drwxrwxr-x - sqoop2 sqoop 0 2017-07-16 06:37 /user/sqoop2
The /user directory itself is owned by ec2-user with group supergroup.
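On a default CDH layout, /user is normally owned by the HDFS superuser with mode 755, so the ec2-user ownership may be worth reverting. A minimal sketch, assuming hdfs is the superuser account on this cluster:

sudo -u hdfs hdfs dfs -chown hdfs:supergroup /user   # restore the usual owner/group
sudo -u hdfs hdfs dfs -chmod 755 /user               # world-readable, owner-writable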
I tried running the query from the Hive CLI:
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
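The CLI output above stops at the return code; the actual reason for "Failed to create spark client" usually lands in the HiveServer2 log or in the YARN logs for the failed Spark session. Something along these lines, where the application id is a placeholder to take from the list output:

yarn application -list -appStates ALL | grep ec2-user   # locate the failed Hive-on-Spark application
yarn logs -applicationId <application_id>               # dump its container logs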
Created 07-29-2017 12:57 AM
Thank you for the reply.
I did not have the spark folder in that location; I had SPARK2. After running the command, I get the error below.
[ec2-user@ip-172-31-37-124 jars]$ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_2.11-2.2.0.cloudera1.jar
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark) overrides detected (/usr/lib/spark).
WARNING: Running spark-class from user-defined location.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 11 more
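Note the warning at the top of that output: SPARK_HOME still points at the CDH 5.12 tree, which ships Spark 1.6, so the Spark 2 examples jar is being launched against Spark 1.x libraries, where org.apache.spark.sql.SparkSession does not exist. Two likely workarounds, assuming the SPARK2 parcel's launcher scripts are installed:

# Option 1: use the Spark 2 launcher that ships with the SPARK2 parcel
spark2-submit --class org.apache.spark.examples.SparkPi --master yarn \
  --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 \
  /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_2.11-2.2.0.cloudera1.jar

# Option 2: clear the stale SPARK_HOME before calling plain spark-submit
unset SPARK_HOME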
Created 07-29-2017 04:13 AM
The Spark job is now getting submitted, but now I am getting the following error:
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170729070303_887365d6-ce92-4ec3-bc8a-2adf3cfec117
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 614015ef-31f9-4e14-9b71-c161f64916db
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
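Status: SENT with "Job hasn't been submitted after 61s" means the Hive client gave up waiting for YARN to start the remote Spark driver, typically because the cluster has no free containers or the driver died on startup. While debugging, the monitor window can be widened, and it is worth checking for applications stuck waiting on resources. The property names below are the standard Hive-on-Spark ones; the values are only examples:

hive> set hive.spark.job.monitor.timeout=120s;               -- default is 60s, matching the abort above
hive> set hive.spark.client.server.connect.timeout=300000ms; -- handshake window for the remote driver

yarn application -list -appStates ACCEPTED   # apps queued but not yet running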
Created 09-12-2018 09:01 AM
Hi,
I am also getting the same error below when running Hive on Spark through IBM DataStage:
main_program: Fatal Error: The connector received an error from the driver. The reported error is: [SQLSTATE HY000] java.sql.SQLException: [IBM][Hive JDBC Driver][Hive]Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
Were you able to resolve the issue?
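One way I am trying to narrow it down is to run the same statement straight through Beeline, bypassing DataStage (host, port, user, and table below are placeholders):

beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -n <user> \
  -e "select count(*) from <db>.<table>;"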
Thanks,
Jalaj