Can someone explain whether query optimization happens at the code level or at the database level? As I understand it, the code/API accessing the database has no knowledge of the database's access patterns or of how the data is stored, so optimization by the database (Hive) makes more sense to me than optimization by the Spark/Hive API. Can someone clarify what exactly happens when the Hive driver or Spark SQL sends a query to the database for processing? Where does query optimization happen?

I also believe DAG creation/resolution depends on query optimization, because the DAG might be shorter or longer depending on the optimized query plan. Can someone also explain how the DAG is handled when query optimization happens?

PS: I am a beginner with Spark/Hadoop/Hive, so please correct me if I have misunderstood any of this.