01-18-2020
02:50 AM
Spark's Catalyst optimiser is smart. In most cases it optimises the query on its own; you only need to think about it yourself when it is not optimising well. Below is one example:

fr = spark.createDataFrame([{'a':1},{'b':2}])
fr.select('a','b').drop('a')

The parsed logical plan for the above query is:

== Parsed Logical Plan ==
Project [b#69L]
+- Project [a#68L, b#69L]
   +- LogicalRDD [a#68L, b#69L], false

And the physical plan is:

== Physical Plan ==
*(1) Project [b#69L]
+- *(1) Scan ExistingRDD[a#68L,b#69L]

Spark optimises the query from two projections down to a single projection, which is the same as the physical plan of fr.select('b'). Both plans can be printed by calling explain(True) on the DataFrame.
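To make the "two projections become one" step more concrete, here is a toy model in plain Python. It is only a sketch of the idea behind Catalyst's CollapseProject rule, not Spark's actual implementation: the `Scan` and `Project` classes and the `collapse_projects` function are hypothetical names invented for this illustration, and the model assumes every projection selects plain columns (no expressions or aliases).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: reads the listed columns from the source (hypothetical)."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Keeps only the listed columns of its child plan (hypothetical)."""
    columns: Tuple[str, ...]
    child: Union["Project", Scan]


Plan = Union[Project, Scan]


def collapse_projects(plan: Plan) -> Plan:
    """Merge directly nested projections into one, working bottom-up."""
    if isinstance(plan, Scan):
        return plan
    child = collapse_projects(plan.child)
    if isinstance(child, Project):
        # Assumption: the outer projection only picks plain columns that the
        # inner one passes through unchanged, so the inner one is redundant.
        return Project(plan.columns, child.child)
    return Project(plan.columns, child)


# select('a', 'b').drop('a') modelled as two nested projections over a scan
parsed = Project(("b",), Project(("a", "b"), Scan(("a", "b"))))
optimised = collapse_projects(parsed)
print(optimised)
```

The result has the same shape as the physical plan above: a single projection on b sitting directly on the scan. The real rule also has to handle expressions, aliases and non-deterministic functions, which this sketch deliberately leaves out.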