Created on 02-14-2017 08:13 AM - edited 09-16-2022 04:05 AM
I am ingesting data that lands in HDFS, and I would like to convert my Hive SQL script to Spark SQL to improve performance. I am looking for documentation or a general approach to this kind of problem. Any feedback is greatly appreciated. The Spark code would be written in Scala.
Created 02-14-2017 11:26 AM
To my knowledge, there are two ways to use Spark with Hive. This is a very high-level overview of both.
# Log in to Hive and try the steps below
Run the query in Hive itself, with Spark as the execution engine
# To check the current execution engine
hive> set hive.execution.engine;
hive.execution.engine=mr # the default is mr (MapReduce)
# To set the execution engine to Spark. Note: this setting applies only to the current session
hive> set hive.execution.engine=spark;
# To check the execution engine after setup
hive> set hive.execution.engine;
hive.execution.engine=spark
Run your queries now
# Start the Spark shell and try the steps below
Run the query from Spark itself, through Spark SQL
$ spark-shell
scala> sqlContext.sql("select * from tbl1").collect().foreach(println) // an example; collect() brings every row to the driver
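If you want to move the script out of the shell and into a Scala application, here is a minimal sketch of the same idea. It assumes Spark 2.x built with Hive support and a hive-site.xml on the classpath so Spark can reach the Hive metastore. The object name and the output table tbl1_copy are hypothetical and only there for illustration; tbl1 is the table from the example above. Many HiveQL statements can be passed to spark.sql() as they are, but check each statement of your script, since not every Hive feature is supported.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, for illustration only.
object HiveScriptToSparkSql {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark SQL read table definitions from the Hive metastore.
    // Assumption: hive-site.xml is available on the classpath.
    val spark = SparkSession.builder()
      .appName("HiveScriptToSparkSql")
      .enableHiveSupport()
      .getOrCreate()

    // Run a statement from the Hive script through Spark SQL.
    val rows = spark.sql("SELECT * FROM tbl1")

    // show() prints only the first rows, so the full result is not pulled to the driver.
    rows.show(20)

    // Hypothetical output table: persist the result as a table in the metastore.
    rows.write.mode("overwrite").saveAsTable("tbl1_copy")

    spark.stop()
  }
}
```

On Spark 1.x there is no SparkSession; there you would keep using sqlContext as in the shell example.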
Created 02-15-2017 06:46 AM