I'm trying to start a sparkR session from within an R session; not through spark-submit. However, when i try to set a queue like this, it doesn't work:
The only way I can get it to actually set that queue is to use the old `init()` function, which throws a warning:
```r
sc <- SparkR::sparkR.init(master = "yarn-client", sparkEnvir = list(spark.yarn.queue = "a2_hungry"))
hiveContext <- sparkRHive.init(sc)
```
```
Warning message:
'SparkR::sparkR.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
```
How can I set a queue in the non-deprecated way?
Are you using Spark 2.x? The API changed in Spark 2.
Please see below: https://spark.apache.org/docs/2.0.0/api/R/sparkR.session.html
```r
## Not run:
sparkR.session()
df <- read.json(path)

sparkR.session("local", "SparkR", "/home/spark")
sparkR.session("yarn-client", "SparkR", "/home/spark",
               list(spark.executor.memory = "4g"),
               c("one.jar", "two.jar", "three.jar"),
               c("com.databricks:spark-avro_2.10:2.0.1"))
sparkR.session(spark.master = "yarn-client", spark.executor.memory = "4g")
## End(Not run)
```
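Applying that signature to your queue question, a minimal sketch (the `sparkConfig` parameter is from the linked docs; the queue name is taken from your example):

```r
library(SparkR)

# Start a session on YARN, passing the queue through sparkConfig
# instead of the deprecated sparkEnvir argument of sparkR.init()
sparkR.session(
  master = "yarn-client",
  sparkConfig = list(spark.yarn.queue = "a2_hungry")
)
```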
When SparkR starts, a SparkSession is already created. You need to stop the current session and spin up a new one with the desired settings.
I use the following:

```r
sparkR.stop()
sparkR.session(
  # master = "local",             # local master
  master = "yarn",                # cluster master
  appName = "my_sparkR",
  sparkConfig = list(
    spark.driver.memory = "4g",
    spark.executor.memory = "2g",
    spark.yarn.queue = "your_desired_queue"
  )
)
```
Verify from the Spark monitoring page (the Spark UI) that the settings were applied correctly.
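You can also read the setting back from inside the session itself; a minimal check, assuming Spark 2.1 or later where `sparkR.conf()` is available:

```r
library(SparkR)

# Query the running SparkSession for the effective queue setting;
# this should print the queue you passed in sparkConfig
sparkR.conf("spark.yarn.queue")
```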