Error in Scala/Spark Project on Cloudera Data Science Workbench

Contributor

We have a Hadoop cluster with ACLs for YARN resource pools. 

 

I am trying to create a Scala/Spark project within CDSW, but it throws the following error as soon as the engine starts:

 

ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1495197568507_9413 to YARN : Application rejected by queue placement policy

I know I need to tell it to use a specific YARN resource pool, but I don't know how or where to set that parameter so that it takes effect. I tried setting it as a parameter in the engine settings, but that didn't work.

 

Does anyone have any idea how to do this?

 

Thanks in advance!

Accepted Solution

Re: Error in Scala/Spark Project on Cloudera Data Science Workbench

Contributor

Okay, after much research I found a way to configure the YARN resource pool within the Spark/Scala project. Here are the steps:

 

1. Create a Scala project and start the engine.

2. The engine startup will fail the very first time.

3. Open "Terminal" in the Workbench window and do the following:

    i. Verify that you are in the /home/cdsw directory.

    ii. Create a file named "spark-defaults.conf" and add the line "spark.yarn.queue={QUEUE_NAME}", replacing {QUEUE_NAME} with the name of your YARN resource pool.

    iii. Save and exit.

4. Stop the engine and start it again, and the issue will be resolved.
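
For reference, the terminal part of step 3 could look roughly like the commands below. This is a minimal sketch, not the exact commands from this thread: the queue name root.tenant_a is a hypothetical placeholder, and the commands assume the terminal is already in the project directory (/home/cdsw in CDSW).

```shell
# Step i: confirm the working directory (the steps above expect /home/cdsw).
pwd

# Hypothetical queue name, used only for illustration.
# Replace it with a YARN resource pool you are allowed to submit to.
QUEUE_NAME="root.tenant_a"

# Step ii: add the queue setting to spark-defaults.conf in the current directory.
# ">>" appends, so an existing file with other Spark options is not overwritten.
printf 'spark.yarn.queue=%s\n' "$QUEUE_NAME" >> spark-defaults.conf

# Step iii: show the file to check what was written.
cat spark-defaults.conf
```

Because the command appends, running it twice leaves two spark.yarn.queue lines in the file, so check the output of the last command before restarting the engine.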

 

Regards,

MG

Re: Error in Scala/Spark Project on Cloudera Data Science Workbench

Rising Star

MG,

 

I'm glad you figured this out. You can configure the YARN queue, or any other Spark option, either globally using Cloudera Manager or on a per-project basis within Cloudera Data Science Workbench. It sounds like you have already found the per-project option, but the documentation for both options is here:

 

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_spark_configuration...

 

Configuring this option globally may make more sense, unless you're using a queue specifically for Spark jobs launched from Cloudera Data Science Workbench.

 

Best,

Tristan

Re: Error in Scala/Spark Project on Cloudera Data Science Workbench

Contributor

Hi Tristan,

 

You are right, configuring it globally would be much easier, but we have tenant-specific queues and we want to keep each tenant contained within its own pool, which is why we needed an engine/project-specific setting.

 

Anyway, thanks for your response.

 

Regards,

MG
