Created 04-12-2020 03:24 AM
I tried to run a select query on a Hive table through spark-shell. This is my code:
scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
scala> val df = sqlContext.sql("select count(*) from bdp.serie")
scala> df.head
but I get an error whenever I execute an action on the DataFrame (df.head, df.count, df.show). This is the error:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#13L])
+- HiveTableScan HiveTableRelation `bdp`.`serie`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [data#0, temperature#1, hum#2]
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
... 49 elided
Caused by: java.io.IOException: Not a file: hdfs://sandbox-hdp.hortonworks.com:8020/warehouse/tablespace/managed/hive/bdp.db/serie/delta_0000001_0000001_0000
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:337)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
... 73 more
PS: when I execute `show tables` I get the result without any error.
I have attached the output of `show create table serie` and of `hdfs dfs -ls ../../warehouse/tablespace/managed/hive/bdp.db/serie`.
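Worth noting: the `delta_0000001_0000001_0000` entry in the error path is a Hive ACID delta directory. On HDP 3.0, managed Hive tables are transactional by default, and plain Spark's `HiveTableScan` treats that directory as a single file and fails to split it. A metadata-only way to confirm the table is transactional from the same spark-shell (a sketch; `SHOW TBLPROPERTIES` reads the metastore, not the data files, so it should succeed just like `show tables` did):

// Metadata-only check: goes to the metastore, not the HDFS data files.
// Look for transactional=true in the key/value output.
val props = spark.sql("show tblproperties bdp.serie")
props.show(100, false)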
Created 04-12-2020 06:36 AM
Hey @hicha, what version of Spark are you using?
What output do you receive when you use the `SparkSession` instead of the `HiveContext`?
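For reference, a minimal sketch of that test in spark-shell on Spark 2.x, where `spark` is the SparkSession the shell creates with Hive support enabled:

// Same query, issued through the prebuilt SparkSession instead of a HiveContext
val df = spark.sql("select count(*) from bdp.serie")
df.show()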
Created 04-12-2020 06:50 AM
Hi, I am using the latest version of HDP 3.0.
Created 05-08-2020 10:55 PM
Okay, let me know if switching from the HiveContext to the SparkSession makes any difference. It could give a lead towards a resolution.
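If the SparkSession alone produces the same trace, a likely next step on HDP 3.0 is the Hive Warehouse Connector, since plain Spark cannot read managed (ACID) Hive tables there. A minimal sketch, assuming the HWC jar is on the spark-shell classpath and the cluster's HWC settings (e.g. `spark.sql.hive.hiveserver2.jdbc.url`) are configured:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the shell's SparkSession `spark`.
// Queries run through HiveServer2, which understands ACID delta directories.
val hive = HiveWarehouseSession.session(spark).build()

val df = hive.executeQuery("select count(*) from bdp.serie")
df.show()

`executeQuery` returns a regular DataFrame, so actions like `df.head` and `df.count` work as before.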