<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Spark Sql for ETL performance tuning in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-Sql-for-ETL-performance-tuning/m-p/238616#M200427</link>
    <description>&lt;P&gt;I am using the Spark SQL CLI to perform ETL operations on Hive tables.&lt;/P&gt;&lt;P&gt;One SQL script contains a query with more than four joins across different tables, with a WHERE condition on each join to filter rows before the result is inserted into a new, large table.&lt;/P&gt;&lt;P&gt;The performance of this query is really poor, even with a very small amount of data in each table.&lt;/P&gt;&lt;P&gt;I have tried setting various properties to improve the performance of the query:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;SET spark.max.fetch.failure.per.stage = 10;&lt;/P&gt;&lt;P&gt;SET spark.rpc.io.serverThreads = 64;&lt;/P&gt;&lt;P&gt;SET spark.memory.fraction = 0.8;&lt;/P&gt;&lt;P&gt;SET spark.memory.offHeap.enabled = true;&lt;/P&gt;&lt;P&gt;SET spark.memory.offHeap.size = 3g;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.file.buffer = 1 MB;&lt;/P&gt;&lt;P&gt;SET spark.unsafe.sorter.spill.reader.buffer.size = 1 MB;&lt;/P&gt;&lt;P&gt;SET spark.file.transferTo = false;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.unsafe.file.output.buffer = 5 MB;&lt;/P&gt;&lt;P&gt;SET spark.io.compression.lz4.blockSize=512KB;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.service.index.cache.entries = 2048;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.io.serverThreads = 128;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.io.backLog = 8192;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.registration.timeout = 2m;&lt;/P&gt;&lt;P&gt;SET spark.shuffle.registration.maxAttempt = 5;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;But the query still runs for hours.&lt;/P&gt;&lt;P&gt;These are the questions that come to mind:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Are there any other ways to optimize and troubleshoot the query?&lt;/LI&gt;&lt;LI&gt;Should I conclude that Spark SQL is not meant to handle queries with many complex joins?&lt;/LI&gt;&lt;LI&gt;Should I break the script into multiple scripts, each with fewer joins?&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Thu, 18 Apr 2019 22:06:40 GMT</pubDate>
    <dc:creator>barath51777</dc:creator>
    <dc:date>2019-04-18T22:06:40Z</dc:date>
  </channel>
</rss>