Support Questions

gimp077 · ‎02-14-2017

I am ingesting data into HDFS and would like to convert my Hive SQL script to Spark SQL to improve speed. I am looking for docs or a general approach to this kind of problem. Any feedback is greatly appreciated. The Spark code would be written in Scala.

saranvisa · ‎02-14-2017

@gimp077

To my knowledge, there are two ways to use Spark with Hive. Here is a very high-level overview of both.

# Log in to Hive and try the steps below

Run the query in Hive itself, using Spark as the execution engine:

# To check the current execution engine
hive> set hive.execution.engine;
hive.execution.engine=mr # by default it is mr

# To set the execution engine to spark. Note: this setting is session-specific
hive> set hive.execution.engine=spark;

# To check the execution engine after setup
hive> set hive.execution.engine;
hive.execution.engine=spark

Run your queries now.

# Log in to Spark and try the steps below
$ spark-shell
scala> sqlContext.sql("select * from tbl1").collect().foreach(println) // an example
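
If the existing Hive script holds several statements, one possible way to move it over is to split it into statements and pass each one to sqlContext.sql in order. The sketch below is only an illustration: HiveScriptRunner, statements and runAll are hypothetical names, not part of Spark or Hive. It assumes that ';' only ever ends a statement (never appears inside a string literal) and that comments are whole lines starting with "--".

```scala
object HiveScriptRunner {
  // Split a HiveQL script into individual statements.
  // Assumptions (not a full HiveQL parser): ';' only terminates statements,
  // and comments are whole lines that start with "--".
  def statements(script: String): Seq[String] =
    script.split("\n").toSeq
      .filterNot(_.trim.startsWith("--")) // drop whole-line comments
      .mkString("\n")
      .split(";").toSeq                   // naive split on the terminator
      .map(_.trim)
      .filter(_.nonEmpty)

  // Run every statement, in script order, through the supplied executor.
  // The executor is a parameter so the helper itself has no Spark dependency.
  def runAll(script: String)(execute: String => Any): Unit =
    statements(script).foreach(execute)
}
```

In spark-shell this could be called as HiveScriptRunner.runAll(scriptText)(stmt => sqlContext.sql(stmt)), where scriptText is the content of the Hive script read into a string. Any statement that Spark SQL does not accept as written would still need to be adjusted by hand.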

gimp077 · ‎02-15-2017

Thanks for the response, it is really good and detailed. Could you also give a slightly lower-level answer? For example, how would I efficiently add data from a Spark DataFrame to a Hive table? The goal is to improve speed by using Spark instead of Hive or Impala for DB insertions. Thanks.

Cloudera Community

converting hive sql to spark sql