04-19-2019
01:56 AM
@Barath Natarajan Check how many executors and how much memory the spark-sql CLI was initialized with (it appears to be running in local mode with a single executor). To debug the query, run an explain plan on it. Also check how many files each table's HDFS directory contains; if there are too many small files, consolidate them into a smaller number. Another approach would be:
-> Run spark-shell (or) pyspark in local mode or yarn-client mode with more executors and more memory
-> Load the tables into DataFrames, then call registerTempTable (Spark 1.x) or createOrReplaceTempView (Spark 2)
-> Run your join using spark.sql("<join query>")
-> Check the performance of the query.
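A minimal PySpark sketch of the steps above, assuming Spark 2.x and hypothetical table names (`db.table1`, `db.table2`) and join column (`id`) that you would replace with your own:

```python
# Launch pyspark with more resources first, e.g.:
#   pyspark --master yarn --deploy-mode client --num-executors 4 --executor-memory 4g
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-debug").getOrCreate()

# Load the Hive tables into DataFrames (hypothetical names).
t1 = spark.table("db.table1")
t2 = spark.table("db.table2")

# Register temp views (Spark 2.x); on Spark 1.x use df.registerTempTable("t1") instead.
t1.createOrReplaceTempView("t1")
t2.createOrReplaceTempView("t2")

# Run the join through Spark SQL; the join condition here is an assumption.
joined = spark.sql("SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.id = t2.id")

# Inspect the physical plan to see how Spark executes the join.
joined.explain()
```

Comparing the `explain()` output here with the plan from the spark-sql CLI should show whether the slowdown comes from the plan itself or from the single-executor local mode.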