Hi @Abhijeet Rajput,
In response to handling the huge SQL: Spark does lazy evaluation, which means you can split your code into multiple blocks and build it up with multiple DataFrames. Nothing is evaluated until the end (when an action is called), at which point Spark generates a single optimized execution plan covering all of the operations.
Example:
// assuming a SparkSession named spark, as provided in spark-shell
val subquery1 = spark.sql("select c1, c2, c3 from tbl1 join tbl2 on condition1 and condition2")
subquery1.registerTempTable("res1")   // createOrReplaceTempView in Spark 2.x+
val subquery2 = spark.sql("select c1, c2, c3 from res1 join tbl3 on condition4 and condition5")
// ... and so on
Regarding your other question: there is no difference between using the DataFrame API and SQL, since the same execution plan is generated for both. You can validate this from the DAG in the Spark UI while the job is executing.