I have installed Spark with CDH (1.5.0-cdh5.5.1). As per the document below, in beeline I set hive.execution.engine=spark and ran a select statement against a table. It returned the expected rows.
How can I find out whether the select statement really used the Spark engine, or whether it still used mr (the default)? Is there any entry (log, YARN/Spark history) that proves it?
Yes, it should use Spark when you do that. There is nothing else you need to do to run Hive on Spark. Keep in mind that it is not officially supported for production.
Spark normally runs on top of YARN, so you should see a Spark application in the RM (ResourceManager) for the query that was run. You can also check the Spark JHS (history server) for the Spark data.
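If you want to check from inside beeline as well, something like the following may help. It is only a sketch: my_table is a placeholder name, and the exact EXPLAIN output depends on the Hive version. Also note that a plain select without aggregation can be answered by a simple fetch task without launching any job, in which case nothing would show up in YARN for either engine, so the example uses a COUNT(*) to force a real job.

```sql
-- Run these in the same beeline session as the query.
SET hive.execution.engine=spark;

-- SET with no value prints the setting currently in effect for this session.
SET hive.execution.engine;

-- The plan should describe Spark work rather than Map Reduce stages.
-- my_table is a placeholder; substitute your own table.
EXPLAIN SELECT COUNT(*) FROM my_table;

-- Running the aggregate should start a Spark application that is visible in the RM.
SELECT COUNT(*) FROM my_table;
```

After the last statement runs, the RM application list should contain an entry whose application type is SPARK.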
Thank you Wilfred.
Even though I checked "Enable Hive on Spark (Unsupported)" under Hive -> Configuration, the default hive.execution.engine in beeline is still mr. I have to set hive.execution.engine=spark manually.
Does that checkbox really set "hive.enable.spark.execution.engine" to true, or does it take effect somewhere else?