Member since 10-06-2017
11 Posts
0 Kudos Received
0 Solutions
01-18-2020 02:50 AM
Spark's Catalyst optimiser is smart. You only need to think about hand-optimising a query when you can see that Catalyst is not doing it well; otherwise it can optimise the query on its own. Below is one example:

    fr = spark.createDataFrame([{'a': 1}, {'b': 2}])
    fr.select('a', 'b').drop('a')

The parsed logical plan for the above query is:

    == Parsed Logical Plan ==
    Project [b#69L]
    +- Project [a#68L, b#69L]
       +- LogicalRDD [a#68L, b#69L], false

and the physical plan is:

    == Physical Plan ==
    *(1) Project [b#69L]
    +- *(1) Scan ExistingRDD[a#68L,b#69L]

Spark collapses the two projections into a single projection, which is the same as the physical plan of fr.select('b').
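In PySpark, calling `explain(True)` on the DataFrame prints the parsed, analyzed and optimized logical plans as well as the physical plan, so you can watch this rewrite happen. Inside Catalyst the merge is done by an optimizer rule (CollapseProject). The sketch below is only a toy model of that idea in plain Python, not Spark's implementation: the `Scan`/`Project` classes and `collapse_projects` are hypothetical names, and it assumes every projection is a plain list of column references (no computed expressions).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node standing in for LogicalRDD / Scan ExistingRDD."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Hypothetical projection node: keeps only `columns` from its child."""
    columns: Tuple[str, ...]
    child: "Plan"


Plan = Union[Scan, Project]


def collapse_projects(plan: Plan) -> Plan:
    """Merge directly nested Project nodes, bottom-up.

    With plain column references, an outer Project over an inner Project
    only needs the outer column list: the outer node can read those
    columns straight from the inner node's child.
    """
    if isinstance(plan, Scan):
        return plan
    child = collapse_projects(plan.child)
    if isinstance(child, Project):
        # Drop the inner projection and keep the outer column list.
        return Project(plan.columns, child.child)
    return Project(plan.columns, child)


scan = Scan(("a", "b"))
# Toy version of the parsed plan for select('a', 'b').drop('a')
parsed = Project(("b",), Project(("a", "b"), scan))
optimized = collapse_projects(parsed)
print(optimized)  # Project(columns=('b',), child=Scan(columns=('a', 'b')))
```

The recursion collapses children first, so any chain of nested projections ends up as a single `Project` directly over the scan.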
01-05-2020 04:03 AM
Hi, did the links that were provided help to solve the issue?
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
If so, could you please mark this thread as solved? Thanks, AKR
11-07-2017 11:29 AM
I haven't yet. I will go through it.
11-02-2017 02:04 PM
1 Kudo
@pranayreddy bommineni You can add LIMIT 1 to the SQL query in ExecuteSQL so that only one row is fetched for schema inference.
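As a rough illustration outside NiFi, the sketch below uses Python's built-in sqlite3 module as a stand-in for the source database. It shows that a query with LIMIT 1 still exposes every column name, and that the single returned row is enough to guess each column's type. The `orders` table and its data are made up for the example; ExecuteSQL itself is not involved.

```python
import sqlite3

# In-memory database standing in for the real source (hypothetical table/data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, f"cust{i}", i * 1.5) for i in range(1, 1001)],
)

# LIMIT 1 caps the result at one row, but the cursor metadata
# still names every selected column.
cur = conn.execute("SELECT * FROM orders LIMIT 1")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]

# Infer a simple schema (column -> Python type name) from that one row.
schema = {name: type(value).__name__ for name, value in zip(columns, rows[0])}
print(len(rows), columns)  # 1 ['id', 'customer', 'amount']
print(schema)  # {'id': 'int', 'customer': 'str', 'amount': 'float'}
conn.close()
```

The caveat is that one row only shows one value per column; a NULL in that row, for example, gives no type information for that column.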