04-19-2019
01:56 AM
@Barath Natarajan Check how many executors and how much memory the spark-sql CLI was initialized with (it appears to be running in local mode with a single executor). To debug the query, run an explain plan on it. Also check how many files each table's HDFS directory contains; if there are too many small files, consolidate them into a smaller number. Another approach would be:
-> Run spark-shell (or) pyspark in local mode or yarn-client mode with more executors and more memory
-> Load the tables into DataFrames, then call registerTempTable (Spark 1.x) or createOrReplaceTempView (Spark 2)
-> Run your join using spark.sql("<join query>")
-> Check the performance of the query.
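A minimal PySpark sketch of the steps above, assuming Spark 2.x and hypothetical table names (`db.table1`, `db.table2`) and join column (`id`) that you would replace with your own:

```python
# Launch pyspark with more resources first, e.g.:
#   pyspark --master yarn --deploy-mode client --num-executors 4 --executor-memory 4g
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-debug").getOrCreate()

# Load the Hive tables into DataFrames (hypothetical names).
t1 = spark.table("db.table1")
t2 = spark.table("db.table2")

# Register temp views (Spark 2.x); on Spark 1.x use df.registerTempTable("t1") instead.
t1.createOrReplaceTempView("t1")
t2.createOrReplaceTempView("t2")

# Run the join through Spark SQL; the join condition here is an assumption.
joined = spark.sql("SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.id = t2.id")

# Inspect the physical plan to see how Spark executes the join.
joined.explain()
```

Comparing the `explain()` output here with the plan from the spark-sql CLI should show whether the slowdown comes from the plan itself or from the single-executor local mode.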