About pranay_bomminen

RahulGoyal · 01-18-2020

The Spark Catalyst optimiser is smart. You only need to think about it when you can see that it is not optimising a query well; otherwise it will optimise the query for you. Here is an example:

    fr = spark.createDataFrame([{'a': 1}, {'b': 2}])
    fr.select('a', 'b').drop('a')

The parsed logical plan for this query is:

    == Parsed Logical Plan ==
    Project [b#69L]
    +- Project [a#68L, b#69L]
       +- LogicalRDD [a#68L, b#69L], false

And the physical plan is:

    == Physical Plan ==
    *(1) Project [b#69L]
    +- *(1) Scan ExistingRDD[a#68L,b#69L]

Spark optimises the query from two projections down to a single projection, which is the same physical plan you get for fr.select('b').
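To make the collapse of the two Project nodes concrete, here is a toy model in plain Python. It is not Spark code and not how Catalyst is actually implemented; it only mimics the idea behind Catalyst's CollapseProject rule. The classes Relation and Project and the helpers select, drop, collapse_projects and render are hypothetical names made up for this sketch, and it assumes projections contain only plain column references (no computed expressions). In a real PySpark session you would compare the plans with DataFrame.explain(True) instead.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Toy model with hypothetical names; this is NOT Spark's API.


@dataclass(frozen=True)
class Relation:
    """Leaf node standing in for LogicalRDD / Scan ExistingRDD."""
    output: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Keeps only the listed columns of its child (plain column refs only)."""
    output: Tuple[str, ...]
    child: "Plan"


Plan = Union[Relation, Project]


def select(plan: Plan, *cols: str) -> Project:
    # Like DataFrame.select with plain column names: one Project node per call.
    unknown = [c for c in cols if c not in plan.output]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    return Project(tuple(cols), plan)


def drop(plan: Plan, *cols: str) -> Project:
    # Like DataFrame.drop: a Project over the columns that remain.
    return Project(tuple(c for c in plan.output if c not in cols), plan)


def collapse_projects(plan: Plan) -> Plan:
    """Merge adjacent Project nodes, in the spirit of Catalyst's CollapseProject."""
    if isinstance(plan, Relation):
        return plan
    child = collapse_projects(plan.child)
    if isinstance(child, Project):
        # The outer Project only picks columns the inner one already exposes,
        # so it can read straight from the inner Project's child.
        return Project(plan.output, child.child)
    return Project(plan.output, child)


def render(plan: Plan, depth: int = 0) -> str:
    # Print the tree roughly in the shape of Spark's explain() output.
    prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
    if isinstance(plan, Relation):
        return f"{prefix}Relation [{', '.join(plan.output)}]"
    line = f"{prefix}Project [{', '.join(plan.output)}]"
    return line + "\n" + render(plan.child, depth + 1)


if __name__ == "__main__":
    rdd = Relation(("a", "b"))
    dropped = drop(select(rdd, "a", "b"), "a")
    print(render(dropped))                     # two stacked Project nodes
    print(render(collapse_projects(dropped)))  # a single Project over the scan
    print(collapse_projects(dropped) == collapse_projects(select(rdd, "b")))
```

In this model, select('a', 'b') followed by drop('a') collapses to exactly the same tree as select('b'), which mirrors the single Project in the physical plan shown above. The collapse is only safe because the outer projection never references a column the inner one does not produce; select enforces that here.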

AKR · 01-05-2020

Hi, did the links provided help solve the issue?
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
If so, could you please mark this thread as solved? Thanks, AKR

pranay_bomminen · 11-07-2017

I haven't yet. I will go through them.

ahadjidj · 11-02-2017

@pranayreddy bommineni You can add LIMIT 1 to the SQL query in ExecuteSQL so that only one row is returned for schema inference.
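As a rough illustration of the same idea outside NiFi, the sketch below uses Python's built-in sqlite3 module instead of ExecuteSQL: it runs the query with LIMIT 1 and infers a schema from the single row returned. The infer_schema function and the employees table are hypothetical examples, and mapping each value to its Python type name is a simplification of what a real schema inference step does.

```python
import sqlite3
from typing import Dict


def infer_schema(conn: sqlite3.Connection, table: str) -> Dict[str, str]:
    """Hypothetical helper: infer column names and types from one sampled row."""
    # LIMIT 1 means the database returns at most one row, however big the table is.
    # `table` is interpolated directly, so only pass trusted table names.
    cur = conn.execute(f"SELECT * FROM {table} LIMIT 1")
    row = cur.fetchone()
    # Column names come from the cursor metadata, even if no row matched.
    names = [col[0] for col in cur.description]
    if row is None:
        # Empty table: the names are known, the types are not.
        return {name: "unknown" for name in names}
    # A NULL in the sampled row shows up as NoneType, which is the usual
    # weakness of inferring a schema from a single row.
    return {name: type(value).__name__ for name, value in zip(names, row)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "Alice", 50000.0), (2, "Bob", 42000.0)],
    )
    print(infer_schema(conn, "employees"))
```

The trade-off is the one the reply hints at: one row is cheap to fetch, but it can only reveal the types of the values it happens to contain.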

Last Visited	02-04-2018 03:45 PM

Member Since	10-06-2017 10:37 AM
Posts	11

Cloudera Community

Re: Spark SQL Drop vs Select

Re: How to pick number of executors , cores for ea...

Re: Setting Processor Properties dynamically and ...

Re: Need to get Table schema from database using n...