Hive on Spark Issue

Cluster Config:


CDH 5.5

Hive 1.1

Spark 1.5


I'm following the guideline for setting up Hive to execute on the Spark engine.

Simple SELECT statements and GROUP BY queries work fine. When I run a multi-join Hive statement on the MR engine it completes in a minute, but the same statement on the Spark engine runs for hours and then fails with "ExecutorLostFailure (executor 2 lost)".

Any help is much appreciated.


Super Collaborator

Hive on Spark is not officially supported, and what you see is one of those cases: certain queries are slower, take more memory, or fail. That is why it is not supported yet. We are working hard to fix and tune these use cases. Until that is done, the only workaround is to fall back to the MR execution engine.
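
For anyone who wants to script that fallback, here is a minimal sketch. It assumes the engine is switched per session through the `hive.execution.engine` property; the helper name `with_engine` and the sample join query are made up for illustration and are not part of Hive.

```python
# Hypothetical helper (not part of Hive): prefixes a query with the
# session-level setting that selects the execution engine.
VALID_ENGINES = ("mr", "spark")  # the two engines discussed in this thread


def with_engine(query, engine="mr"):
    """Return a HiveQL script that sets the engine, then runs the query."""
    if engine not in VALID_ENGINES:
        raise ValueError("unsupported engine: %r" % engine)
    # Normalize the query so the script ends with exactly one semicolon.
    body = query.strip().rstrip(";")
    return "SET hive.execution.engine=%s;\n%s;" % (engine, body)


# Placeholder multi-join query; table and column names are illustrative only.
script = with_engine("SELECT a.id FROM a JOIN b ON a.id = b.id")
print(script)
```

The resulting text can be saved to a file and run with `hive -f`, or pasted into a Hive session. The same `SET` line typed by hand at the start of a session has the same effect.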



I am not able to run Hive SELECT statements from Spark.


I am able to run show databases and show tables using the Hive SQL context.


Syntax I tried: sqlContext.sql("FROM table SELECT state ")



DataTypeException: Unsupported dataType: char(1). If you have a struct and a field name of it has any special characters , please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.



Can you please help?


How about using Spark SQL, since it supports tables in the Hive metastore?
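
The "Unsupported dataType: char(1)" message suggests this Spark version cannot map the Hive char(n) column type. One possible workaround, which I have not tested on this cluster, is to change such columns to STRING on the Hive side. The sketch below only generates the `ALTER TABLE ... CHANGE` statements; the helper name, the table name `my_table`, and the column list are hypothetical, so review the output and try it on a copy of the table first.

```python
import re


def char_to_string_ddl(table, columns):
    """Build Hive DDL that retypes char(n) columns as STRING.

    columns is a list of (column_name, hive_type) pairs, e.g. copied
    from the output of DESCRIBE on the table.
    """
    statements = []
    for name, hive_type in columns:
        # Only char(n) columns are touched; every other type is left alone.
        if re.match(r"^char\(\d+\)$", hive_type.strip().lower()):
            statements.append(
                "ALTER TABLE %s CHANGE `%s` `%s` STRING;" % (table, name, name)
            )
    return statements


# Illustrative schema only (assumption): 'state' is the char(1) column.
ddl = char_to_string_ddl("my_table", [("state", "char(1)"), ("name", "string")])
for statement in ddl:
    print(statement)
```

After running the generated statements in Hive, the original query can be retried from the Hive SQL context.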