
Hive on Spark Queries are not working


I have installed Spark and configured Hive to use it as the execution engine.

 

SELECT * FROM <table name> works fine.

 

But SELECT COUNT(*) FROM <table name> fails with the following error:

 

  • Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
 
At times I have also gotten an error stating "Failed to create spark client".
 
I have also tried modifying the memory parameters, but to no avail. Can you please tell me what the ideal memory settings should be?
 
Below is the directory structure from HDFS:
 

drwxr-xr-x   - admin    admin               0 2017-07-28 16:36 /user/admin
drwx------   - ec2-user supergroup          0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x   - hdfs     hdfs                0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx   - mapred   hadoop              0 2017-07-16 06:03 /user/history
drwxrwxr-t   - hive     hive                0 2017-07-16 06:04 /user/hive
drwxrwxr-x   - hue      hue                 0 2017-07-28 10:16 /user/hue
drwxrwxr-x   - impala   impala              0 2017-07-16 07:13 /user/impala
drwxrwxr-x   - oozie    oozie               0 2017-07-16 06:05 /user/oozie
drwxr-x--x   - spark    spark               0 2017-07-28 17:17 /user/spark
drwxrwxr-x   - sqoop2   sqoop               0 2017-07-16 06:37 /user/sqoop2

 

The /user directory is owned by ec2-user with group supergroup.

 

I tried running the query from the Hive CLI:

 

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
