I have installed Spark and configured Hive to use it as the execution engine.
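For context, the switch comes down to the hive.execution.engine property. A minimal session-level sketch of the configuration (the property names are the standard Hive-on-Spark ones; the values are only examples, not my actual cluster settings):

-- Point the current session at Spark and confirm the setting.
SET hive.execution.engine=spark;
SET hive.execution.engine;          -- with no value, prints the current setting

-- Typical companion settings (example values only):
SET spark.master=yarn;
SET spark.executor.memory=2g;
SET spark.executor.instances=2;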
SELECT * FROM the table works fine, but SELECT COUNT(*) FROM the same table fails with a "Failed to create spark client" error (the full CLI output is at the end of this post). Here is the HDFS listing of /user:
drwxr-xr-x   - admin    admin       0 2017-07-28 16:36 /user/admin
drwx------   - ec2-user supergroup  0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x   - hdfs     hdfs        0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx   - mapred   hadoop      0 2017-07-16 06:03 /user/history
drwxrwxr-t   - hive     hive        0 2017-07-16 06:04 /user/hive
drwxrwxr-x   - hue      hue         0 2017-07-28 10:16 /user/hue
drwxrwxr-x   - impala   impala      0 2017-07-16 07:13 /user/impala
drwxrwxr-x   - oozie    oozie       0 2017-07-16 06:05 /user/oozie
drwxr-x--x   - spark    spark       0 2017-07-28 17:17 /user/spark
drwxrwxr-x   - sqoop2   sqoop       0 2017-07-16 06:37 /user/sqoop2
The /user directory is owned by ec2-user, with group supergroup.
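To double-check the ownership of the directory entries themselves (rather than their contents), a quick check can be run straight from the Hive CLI; dfs forwards the arguments to hdfs dfs, and -ls -d lists the directory entry instead of its children:

-- Show the /user entry itself and the ec2-user home directory.
dfs -ls -d /user;
dfs -ls -d /user/ec2-user;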
I tried running the query from the Hive CLI:
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask