<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark Sql for ETL performance tuning in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-Sql-for-ETL-performance-tuning/m-p/238617#M200428</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/112382/barath51777.html" nodeid="112382"&gt;@Barath Natarajan&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Check how many executors and how much memory the &lt;STRONG&gt;&lt;U&gt;spark-sql CLI&lt;/U&gt;&lt;/STRONG&gt; was initialized with (it appears to be running in local mode with a single executor).&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;To debug the query, run an &lt;STRONG&gt;explain plan&lt;/STRONG&gt; on it.&lt;/LI&gt;&lt;LI&gt;Check how &lt;STRONG&gt;many files&lt;/STRONG&gt; each table has in its HDFS directory; if there are too many, consolidate them into a smaller number of larger files.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;&lt;EM&gt;Another approach would be:&lt;/EM&gt;&lt;/U&gt;&lt;/STRONG&gt;&lt;BR /&gt;-&amp;gt; Run &lt;STRONG&gt;spark-shell (or) pyspark&lt;/STRONG&gt; in &lt;STRONG&gt;local mode/yarn-client&lt;/STRONG&gt; mode with more executors/more memory.&lt;BR /&gt;-&amp;gt; Load the tables into &lt;STRONG&gt;DataFrames&lt;/STRONG&gt; and then &lt;STRONG&gt;registerTempTable (Spark 1.x) / createOrReplaceTempView (Spark 2)&lt;/STRONG&gt;.&lt;BR /&gt;-&amp;gt; Run your join with spark.sql("&amp;lt;join query&amp;gt;").&lt;BR /&gt;-&amp;gt; Check the performance of the query.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Apr 2019 08:56:41 GMT</pubDate>
    <dc:creator>Shu_ashu</dc:creator>
    <dc:date>2019-04-19T08:56:41Z</dc:date>
  </channel>
</rss>

