
converting hive sql to spark sql


I am ingesting data into HDFS and would like to convert the Hive SQL script to Spark SQL to improve the speed. I'm looking for documentation or a general approach to this kind of problem. Any feedback is greatly appreciated. The Spark code would be written in Scala.





To my knowledge, there are two ways to use Spark with Hive. Here is a high-level overview of each:


# Log in to Hive and try the steps below

Run the query in Hive itself, using Spark as the execution engine.


# To check the current execution engine
hive> set hive.execution.engine;
hive.execution.engine=mr     # by default it is mr

# To set the execution engine to spark. Note: this setting is session-specific
hive> set hive.execution.engine=spark;


# To verify the execution engine after the change
hive> set hive.execution.engine;


Run your queries now.


# Log in to the Spark shell and try the steps below
scala> sqlContext.sql("select * from tbl1").collect().foreach(println)  ## An example (sqlContext is the Spark 1.x entry point; in Spark 2.x+ use spark.sql)
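As a slightly fuller sketch of the second approach, the same query can be run from a standalone Scala application via SparkSession (this assumes Spark 2.x with Hive support; the table name `tbl1` is just a placeholder from the example above):

```scala
import org.apache.spark.sql.SparkSession

object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark talk to the Hive metastore (Spark 2.x+)
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .enableHiveSupport()
      .getOrCreate()

    // Any Hive SQL that Spark SQL supports can be submitted as-is
    val df = spark.sql("SELECT * FROM tbl1")

    // Prefer show() over collect() for large tables: collect() pulls
    // every row into the driver's memory, show() prints only a sample
    df.show(20)

    spark.stop()
  }
}
```

Note that on an untouched table the query plan is the same whether it is typed at the `scala>` prompt or compiled into an application; the shell is just more convenient for exploration.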




Thanks for the response, really good and detailed. Could you give a slightly lower-level answer as well? For example, how would I efficiently add data from a Spark DataFrame to a Hive table? The goal is to improve the speed of database insertions by using Spark instead of Hive or Impala. Thanks.
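One common way to do this (a minimal sketch, assuming Spark 2.x with Hive support enabled; the database/table name `mydb.tbl1` and the sample rows are hypothetical) is to use the DataFrameWriter API:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataFrameToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical example data; in practice this would come from your ingest
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Append into a Hive table (saveAsTable creates the table if it
    // does not exist, matching columns by name)
    df.write
      .mode(SaveMode.Append)
      .saveAsTable("mydb.tbl1")

    // Alternative: insertInto() requires the table to already exist and
    // matches columns by position rather than by name
    // df.write.mode(SaveMode.Append).insertInto("mydb.tbl1")

    spark.stop()
  }
}
```

For insertion speed, the number of output files matters: each partition of the DataFrame becomes one file, so calling `df.repartition(n)` (or `coalesce(n)`) before the write lets you avoid producing many small files in HDFS.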