I have installed Spark and configured Hive to use it as the execution engine.
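For context, the switch comes down to the hive.execution.engine property. A minimal session-level sketch of the configuration (the property names are the standard Hive-on-Spark ones; the values are only examples, not my actual cluster settings):

-- Point the current session at Spark and confirm the setting.
SET hive.execution.engine=spark;
SET hive.execution.engine;          -- with no value, prints the current setting

-- Typical companion settings (example values only):
SET spark.master=yarn;
SET spark.executor.memory=2g;
SET spark.executor.instances=2;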
SELECT * FROM the table works fine, but SELECT COUNT(*) FROM the same table fails with a "Failed to create spark client" error (the full CLI output is at the end of this post). Here is the HDFS listing of /user:
drwxr-xr-x   - admin    admin       0 2017-07-28 16:36 /user/admin
drwx------   - ec2-user supergroup  0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x   - hdfs     hdfs        0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx   - mapred   hadoop      0 2017-07-16 06:03 /user/history
drwxrwxr-t   - hive     hive        0 2017-07-16 06:04 /user/hive
drwxrwxr-x   - hue      hue         0 2017-07-28 10:16 /user/hue
drwxrwxr-x   - impala   impala      0 2017-07-16 07:13 /user/impala
drwxrwxr-x   - oozie    oozie       0 2017-07-16 06:05 /user/oozie
drwxr-x--x   - spark    spark       0 2017-07-28 17:17 /user/spark
drwxrwxr-x   - sqoop2   sqoop       0 2017-07-16 06:37 /user/sqoop2
The /user directory is owned by ec2-user, with group supergroup.
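To double-check the ownership of the directory entries themselves (rather than their contents), a quick check can be run straight from the Hive CLI; dfs forwards the arguments to hdfs dfs, and -ls -d lists the directory entry instead of its children:

-- Show the /user entry itself and the ec2-user home directory.
dfs -ls -d /user;
dfs -ls -d /user/ec2-user;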
I tried running the query from the Hive CLI:
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask