Support Questions

hicha · ‎04-12-2020

I tried to run a select query on a Hive table through the Spark shell. This is my code:

scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
scala> val df = sqlContext.sql("select count(*) from bdp.serie")
scala> df.head

But I get an error when I execute any read command (df.head, df.count, df.show). This is the error:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:                                                                                          
Exchange SinglePartition                                                                                                                                                
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#13L])                                                                                        
+- HiveTableScan HiveTableRelation `bdp`.`serie`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [data#0, temperature#1, hum#2]

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)                                                                                         
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)                                                               
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)                                                                                                                                                                                                                                                                                  
... 49 elided                                                                                                                                                         
Caused by: java.io.IOException: Not a file: hdfs://sandbox-hdp.hortonworks.com:8020/warehouse/tablespace/managed/hive/bdp.db/serie/delta_0000001_0000001_0000
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:337)                                                                                       
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)                                                                                                  
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)                                                                                                
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)                                                                                                                                                                                                                                                                                     
... 73 more

PS: when I execute the show tables query, I get the result without error.

show create table serie :

and hdfs dfs -ls ../../warehouse/tablespace/managed/hive/bdp.db/serie :

gsthina · ‎04-12-2020

Hey @hicha , what is the version of Spark you are using?

What output do you get when you use the `Spark Session` instead of the `Hive Context`?
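For reference, here is a minimal sketch of the same query going through a `SparkSession` rather than a `HiveContext`. It assumes Spark 2.x with Hive support available on the classpath; the application name is just a placeholder. In spark-shell a session is normally already available as `spark`, in which case `getOrCreate()` simply returns it. This only shows the alternative entry point and may or may not change the outcome for this table.

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a session with Hive support enabled.
// "hive-query-check" is a hypothetical application name.
val spark = SparkSession.builder()
  .appName("hive-query-check")
  .enableHiveSupport()
  .getOrCreate()

// Same query as in the original post, issued through the session.
val df = spark.sql("select count(*) from bdp.serie")

// Any action (show, head, count) triggers the actual table scan.
df.show()
```

If this fails with the same `Not a file` exception, please post the full output so it can be compared with the `HiveContext` run.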

hicha · ‎04-12-2020

Hi, I use the latest version of HDP 3.0.

gsthina · ‎05-08-2020

Okay, let me know if changing HiveContext to SparkSession makes any difference. It could give a lead toward a resolution.

Cloudera Community

Querying Hive table from spark-shell