
Which option is better: Spark as the Hive execution engine, or a Spark application with Spark SQL?


We have a project that currently uses shell scripts and Hive, with Tez as the execution engine. As a POC, we tried replacing the shell scripts with Spark and executed the HQLs through Spark. One of the clients came back with a question: why would we need a Spark application when we can set Spark as the execution engine and keep running our regular shell scripts and Oozie workflow? Which of the following is the better option?

  1. set hive.execution.engine=spark; and keep the existing shell scripts and Oozie workflow, OR
  2. write a Spark application and execute the HQLs with the Spark APIs.

If performance is the same for both, why do we need to write code in Spark? What is the advantage of writing a Spark application using Spark SQL?


Hi @HDave

When SparkSQL uses Hive

SparkSQL can use the Hive Metastore to get the metadata of the data stored in HDFS. This metadata enables SparkSQL to better optimize the queries that it executes. Here Spark is the query processor.
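As a rough illustration of this first mode, here is a minimal PySpark sketch of an application that runs existing HQL statements through Spark SQL against the Hive Metastore. It assumes PySpark is installed and that the cluster's hive-site.xml is visible to Spark. The helper names `split_hql` and `run_hql` are hypothetical, and the statement splitter is deliberately naive (it does not handle a `;` inside a string literal).

```python
def split_hql(script):
    """Split an HQL script into statements on ';'.

    Naive on purpose: lines starting with '--' are dropped, and a ';'
    inside a string literal would be split incorrectly.
    """
    statements = []
    for part in script.split(";"):
        lines = [ln for ln in part.splitlines()
                 if not ln.strip().startswith("--")]
        stmt = "\n".join(lines).strip()
        if stmt:
            statements.append(stmt)
    return statements


def run_hql(script, app_name="hql-runner"):
    """Run every statement of an HQL script with Spark as the query processor."""
    # Imported here so the splitter above can be used without PySpark.
    from pyspark.sql import SparkSession

    # enableHiveSupport() makes Spark SQL read table metadata
    # from the Hive Metastore.
    spark = (SparkSession.builder
             .appName(app_name)
             .enableHiveSupport()
             .getOrCreate())

    result = None
    for stmt in split_hql(script):
        result = spark.sql(stmt)  # a DataFrame, open to further Spark API calls
    return result

# Example (needs a Spark installation with Hive support):
#   df = run_hql(open("daily_load.hql").read())
#   df.show()
```

One practical difference from only switching the Hive engine is that the result of the last statement comes back as a DataFrame, so it can be processed further with the Spark APIs inside the same application.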

When Hive uses Spark (see the JIRA entry HIVE-7292)

Here the data is accessed via Spark, and Hive is the query processor, so Hive can take advantage of all the design features of Spark Core. This is a major improvement for Hive, but there are version dependencies between Spark and Hive. Link:
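For comparison, this second mode needs no Spark code at all: the existing HQL file is submitted to Hive as before, with only the engine property changed. The sketch below builds such a Beeline command line in Python. The JDBC URL and the script name are placeholders, and `beeline_command` is a hypothetical helper, not part of any Hive tooling.

```python
import shlex


def beeline_command(script_path,
                    jdbc_url="jdbc:hive2://localhost:10000",  # placeholder URL
                    engine="spark"):
    """Build a Beeline argv that runs an HQL file with the chosen engine."""
    if engine not in {"mr", "tez", "spark"}:
        raise ValueError(f"unsupported engine: {engine}")
    return [
        "beeline",
        "-u", jdbc_url,
        # Same effect as 'set hive.execution.engine=spark;' in the session.
        "--hiveconf", f"hive.execution.engine={engine}",
        "-f", script_path,
    ]


cmd = beeline_command("daily_load.hql")  # placeholder script name
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The same command line can be used directly from a shell script or an Oozie action, which is why this option leaves the existing workflow untouched.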

There is already a thread on HCC about this that you can view:


Hi @HDave

Hope you are doing well. Did you get the answer you were looking for?

If yes, could you please provide feedback and mark the thread as closed?


Vikas Srivastava
