PySpark Custom Config not being considered

I am new to PySpark. I am running the PySpark code below from Zeppelin in a CDP environment. The same query took 2.75 minutes in Hive, whereas PySpark took 11 minutes.


Hive (running from CDP Hue): select col1,col2,col3 from db.tab1 where col5 = <value>



from pyspark.sql import SparkSession
spark = (SparkSession.builder.appName("TestApp").getOrCreate())
spark.conf.set('master', 'yarn')
spark.conf.set('deploy-mode', 'cluster')
spark.conf.set('spark.default.parallelism', '24')
spark.conf.set('spark.executor.memory', '12g')#12g
spark.conf.set('spark.executor.cores', '130') #130
spark.conf.set('spark.executor.containers', '35')
spark.conf.set('spark.driver.memory', '20')
spark.conf.set('spark.checkpoint.compress', 'true')
spark.conf.set('spark.driver.maxResultSize' , '20g')
spark.conf.set('', 'true')
spark.conf.set('spark.sql.orc.impl', 'native')
spark.conf.set('spark.sql.hive.convertMetastoreOrc', 'true')
spark.conf.set('spark.sql.broadcastTimeout', '36000')


df1 = sqlContext.sql("select * from db.tab1").select("col1", "col2", "col3").where("col5 = <value>").show(5)


In YARN, I can see that the PySpark application is not using the custom configuration I specified.


I have followed the below link to specify the Spark run config:


Am I missing something?


Re: PySpark Custom Config not being considered


Hi @SudEl 


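Please try setting the required parameters (memory and other tuning parameters) in the Zeppelin Spark interpreter settings. Resource settings such as executor memory are applied when the interpreter launches the Spark application, so changing them with spark.conf.set() after the session exists does not resize the running application.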
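
For comparison, outside Zeppelin the same idea is usually expressed by passing the settings to the session builder before the session is created, or as `--conf` arguments at submit time. The following is only a minimal sketch under assumptions: `LAUNCH_CONF`, `as_submit_args` and `build_session` are hypothetical names chosen for illustration, and the two values are copied from the question rather than recommended.

```python
# Settings copied from the question. They are meant to be supplied
# before the Spark application starts, not changed afterwards.
LAUNCH_CONF = {
    "spark.executor.memory": "12g",
    "spark.driver.maxResultSize": "20g",
}


def as_submit_args(conf):
    """Render settings as spark-submit style '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def build_session(app_name, conf):
    """Create a session with the settings applied on the builder.

    If a session is already running (as in a Zeppelin paragraph),
    getOrCreate() returns that session instead of launching a new one,
    which is why the interpreter settings are the place to change them there.
    """
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Print the equivalent command-line arguments without starting Spark.
    print(" ".join(as_submit_args(LAUNCH_CONF)))
```

In a standalone script, `build_session("TestApp", LAUNCH_CONF)` would replace the `getOrCreate()` call followed by `spark.conf.set(...)` lines shown in the question. In Zeppelin, the same keys and values would go into the Spark interpreter properties instead, followed by an interpreter restart.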