We have a project that currently uses shell scripts and Hive, with Tez as the execution engine. As a POC we tried replacing the shell scripts with Spark and executed the HQLs through Spark. One of the clients came back with a question: why would we need a Spark application at all, when we can simply set Spark as Hive's execution engine and keep running our regular shell scripts and Oozie workflows? Which is the better option:
- set hive.execution.engine=spark; in Hive, OR
- build a Spark application and execute the HQLs through the Spark APIs?

If the performance is the same for both, why do we need to write code in Spark at all? What is the advantage of writing a Spark application using Spark SQL?
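To make the two options concrete, here is a minimal sketch of what each looks like from the shell (the query and the table name `sales` are hypothetical; both commands assume a cluster with Hive and Spark configured against the same metastore):

```shell
# Option 1: keep the existing shell/Oozie setup and just switch Hive's engine.
# The HQL runs unchanged; Hive translates it into Spark jobs behind the scenes.
hive -e "set hive.execution.engine=spark;
         SELECT region, SUM(amount) FROM sales GROUP BY region;"

# Option 2: run the same query through Spark SQL itself, either with the
# spark-sql CLI as below, or from a spark-submit'ed application that calls
# spark.sql(...) and can then keep working with the result as a DataFrame.
spark-sql -e "SELECT region, SUM(amount) FROM sales GROUP BY region;"
```

The practical difference is not this one-off query but what happens afterwards: in Option 2 the result is a DataFrame inside a running SparkSession, so it can be cached, joined, transformed programmatically, or written out through Spark's own APIs rather than passed between shell steps.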