
Error in Scala/Spark Project on Cloudera Data Science Workbench

Rising Star

We have a Hadoop cluster with ACLs for YARN resource pools. 

 

I am trying to create a Scala/Spark project within CDSW, but it throws the following error as soon as the engine starts:

 

ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1495197568507_9413 to YARN : Application rejected by queue placement policy

I know I need to tell it to use a specific YARN resource pool, but I don't know how or where to set that parameter so that it takes effect. I tried setting it as a parameter in the engine settings, but that didn't work.

 

Does anyone have any idea about it?

 

Thanks in advance!

1 ACCEPTED SOLUTION

Rising Star

Okay, after much research I found a way to configure the YARN resource pool within the Spark/Scala project. Here are the steps:

 

1. Create a Scala project and start the engine.

2. Engine startup will fail the very first time.

3. Open "Terminal" in the Workbench window and do the following:

    i. Verify that you are in the /home/cdsw directory.

    ii. Create a file named "spark-defaults.conf" and add the line "spark.yarn.queue={QUEUE_NAME}", replacing {QUEUE_NAME} with the name of your resource pool.

    iii. Save and exit.

4. Stop and start the engine again, and the issue will be resolved.
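
For anyone who would rather script step 3 than edit the file by hand, here is a minimal sketch of one way to do it from the Workbench terminal. The helper name set_yarn_queue and the example queue name are hypothetical (they are not part of CDSW or Spark), and the default directory is assumed to be /home/cdsw as in the steps above. It rewrites only the spark.yarn.queue line and leaves any other settings in the file alone.

```shell
# Hypothetical helper (not a CDSW or Spark command): write spark.yarn.queue
# into spark-defaults.conf in the project directory.
#   $1 = YARN queue / resource pool name
#   $2 = project directory (assumed default: /home/cdsw)
set_yarn_queue() {
  queue="$1"
  conf="${2:-/home/cdsw}/spark-defaults.conf"

  # Create the file if it does not exist yet.
  touch "$conf"

  # Drop any earlier spark.yarn.queue line so the setting is not duplicated.
  # grep exits non-zero when nothing is left to print, hence "|| true".
  grep -v '^spark\.yarn\.queue[[:space:]=]' "$conf" > "$conf.tmp" || true
  mv "$conf.tmp" "$conf"

  # Append the new queue setting.
  printf 'spark.yarn.queue=%s\n' "$queue" >> "$conf"
}

# Example call (the queue name is a placeholder for your own pool):
#   set_yarn_queue "root.my_tenant_pool"
```

After running it, stop and start the engine as in step 4 so the new setting is picked up.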

 

Regards,

MG


3 REPLIES

Expert Contributor

MG,

 

I'm glad you figured this out. You can configure the YARN queue, or any other Spark option, either globally using Cloudera Manager or on a per-project basis within Cloudera Data Science Workbench. It sounds like you have already found the per-project approach, but the documentation for both options is here:

 

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_spark_configuration...

 

Configuring this option globally may make more sense, unless you're using a queue specifically for Spark jobs launched from Cloudera Data Science Workbench.

 

Best,

Tristan

Rising Star

Hi Tristan,

 

You are right, configuring it globally was much easier, but we have tenant-specific queues and we want to keep tenants contained within their own pools, which is why we needed an engine/project-specific setting.

 

Anyway, thanks for your response.

 

Regards,

MG