
converting hive sql to spark sql


I am ingesting data into HDFS and would like to convert the Hive SQL script to Spark SQL to improve the speed. I'm looking for documentation or a general approach to this kind of problem. Any feedback is greatly appreciated. The Spark code would be written in Scala.





To my knowledge, there are two ways to use Spark with Hive. Here is a high-level overview of each:


# Log in to Hive and try the steps below

Run the query in Hive itself, using Spark as the execution engine.


# To check the current execution engine
hive> set hive.execution.engine;
hive.execution.engine=mr     # by default it is mr

# To set the execution engine to spark. Note: this setting is session-specific
hive> set hive.execution.engine=spark;


# To verify the execution engine after the change
hive> set hive.execution.engine;


Run your queries now.


# Log in to the Spark shell and try the steps below
scala> sqlContext.sql("select * from tbl1").collect().foreach(println)  ## An example (sqlContext is the Spark 1.x entry point; in Spark 2.x+ use spark.sql)
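As a slightly fuller sketch of the second approach, the same query can be run from a standalone Scala application via SparkSession (this assumes Spark 2.x with Hive support; the table name `tbl1` is just a placeholder from the example above):

```scala
import org.apache.spark.sql.SparkSession

object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark talk to the Hive metastore (Spark 2.x+)
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .enableHiveSupport()
      .getOrCreate()

    // Any Hive SQL that Spark SQL supports can be submitted as-is
    val df = spark.sql("SELECT * FROM tbl1")

    // Prefer show() over collect() for large tables: collect() pulls
    // every row into the driver's memory, show() prints only a sample
    df.show(20)

    spark.stop()
  }
}
```

Note that on an untouched table the query plan is the same whether it is typed at the `scala>` prompt or compiled into an application; the shell is just more convenient for exploration.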




Thanks for the response, really good and detailed. Could you give a slightly lower-level answer as well? For example, how would I efficiently add data from a Spark DataFrame to a Hive table? The goal is to improve the speed of database insertions by using Spark instead of Hive or Impala. Thanks.
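One common way to do this (a minimal sketch, assuming Spark 2.x with Hive support enabled; the database/table name `mydb.tbl1` and the sample rows are hypothetical) is to use the DataFrameWriter API:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataFrameToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical example data; in practice this would come from your ingest
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Append into a Hive table (saveAsTable creates the table if it
    // does not exist, matching columns by name)
    df.write
      .mode(SaveMode.Append)
      .saveAsTable("mydb.tbl1")

    // Alternative: insertInto() requires the table to already exist and
    // matches columns by position rather than by name
    // df.write.mode(SaveMode.Append).insertInto("mydb.tbl1")

    spark.stop()
  }
}
```

For insertion speed, the number of output files matters: each partition of the DataFrame becomes one file, so calling `df.repartition(n)` (or `coalesce(n)`) before the write lets you avoid producing many small files in HDFS.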