Created on 05-23-2017 11:45 AM - edited 09-16-2022 04:38 AM
We have a Hadoop cluster with ACLs for YARN resource pools.
I am trying to create a Scala/Spark project within CDSW, but it throws the following error as soon as the engine starts:
ERROR spark.SparkContext: Error initializing SparkContext. org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1495197568507_9413 to YARN : Application rejected by queue placement policy
I know I need to tell it to use a specific YARN resource pool, but I don't know how or where to set that parameter so it takes effect. I tried setting it as a parameter in the engine settings, but that didn't work.
Does anyone have any idea about this?
Thanks in advance!
Created 05-24-2017 01:31 PM
Okay - after much research I found a way to configure the YARN resource pool within the Spark/Scala project. Here are the steps:
1. Create the Scala project and start the engine.
2. Engine startup will fail the very first time.
3. Open "Terminal" in the Workbench window and do the following:
i. Verify that you are in the /home/cdsw directory.
ii. Create a file named "spark-defaults.conf" and add the line "spark.yarn.queue={QUEUE_NAME}" (see the example after these steps).
iii. Save and exit.
4. Stop and start the engine again and the issue will be resolved.
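For reference, here is a minimal sketch of what step 3 looks like from the Terminal. The queue name "root.my_tenant_pool" is only a placeholder - substitute the YARN resource pool your project is actually allowed to submit to:

cd /home/cdsw
echo "spark.yarn.queue=root.my_tenant_pool" > spark-defaults.conf
cat spark-defaults.conf    # should print: spark.yarn.queue=root.my_tenant_pool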
Regards,
MG
Created 05-25-2017 12:34 PM
MG,
I'm glad you figured this out. You can configure the YARN queue, or any Spark option, either globally using Cloudera Manager or on a per-project basis within Cloudera Data Science Workbench. It sounds like you figured this out already, but both options are covered in the documentation.
Configuring this option globally may make more sense, unless you're using a queue specifically for Spark jobs launched by Cloudera Data Science Workbench.
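To illustrate the per-project route: any standard Spark property can go into that same project-level "spark-defaults.conf" in /home/cdsw, one entry per line. The values below are placeholders, and the extra properties are generic Spark settings rather than anything required by this thread:

spark.yarn.queue=root.my_tenant_pool
spark.executor.memory=4g
spark.executor.cores=2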
Best,
Tristan
Created 05-25-2017 12:39 PM
Hi Tristan,
You are right, configuring it globally was much easier, but we have tenant-specific queues and we want to keep each tenant contained within its own pool, which is why we needed an engine/project-specific setting.
Anyways, thanks for your response.
Regards,
MG