
Spark2-Submit through Oozie shell action hangs and restarts spark application

Explorer

I have an Oozie job that starts an Oozie shell action; the shell action starts a Spark application (spark2-submit). I am mostly doing Spark SQL. The job runs for a while, suddenly hangs, and then starts the Spark application all over again.
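
Roughly, the shell action just runs a wrapper script, and the script calls spark2-submit. A simplified sketch of the wrapper (the script name, application file, and deploy mode shown here are placeholders, not my actual ones):

#!/bin/bash
# run_spark.sh - wrapper script executed by the Oozie shell action (hypothetical name)
# Submits the Spark SQL application to YARN.
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  my_spark_sql_job.py    # placeholder for the actual application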

 

I ran the same Spark application in CDSW and it ran fine without issues.

 

The same thing is happening with another Oozie job. The only thing these two jobs have in common is that they run longer, around 2 hours.

 

Any help would be appreciated.

1 ACCEPTED SOLUTION

Explorer

The Oozie launcher mapper was running out of its 4 GB of memory. I changed that to 8 GB, and now the job runs fine without restarts.

 

<configuration>
    <!-- Memory for the Oozie launcher's map container, which runs the shell action -->
    <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>8000</value>
    </property>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx1500m</value>
    </property>
    <!-- Memory for the launcher job's MapReduce Application Master -->
    <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
    </property>
    <!-- Note: this repeats oozie.launcher.mapreduce.map.java.opts; the later value overrides the earlier one -->
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx870m</value>
    </property>
</configuration>

 


14 REPLIES

Cloudera Employee

Do you see any error or info related to a timeout, or anything else indicating why the application failed and had to be restarted? You can gather the YARN logs for the application and check the Application Master section to see the reason for the failure. Maybe the Spark configuration used in CDSW is different from the one used by spark-submit.

Explorer

Thanks for the reply. I tried to see the logs from the Spark UI but they were blank. It would be great if you could guide me on checking the YARN logs.

I tried the command below and it also came back blank:

yarn logs -applicationId application_1543621459332_8094 > spark_app.log
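
(One possible cause of blank output, if the job was submitted as a service account: yarn logs looks up logs owned by the current user by default, so the owner may need to be given explicitly. A sketch, with svc as a placeholder user:)

# -appOwner defaults to the current user; log aggregation must be enabled for
# aggregated container logs to be available after the application finishes.
yarn logs -applicationId application_1543621459332_8094 -appOwner svc > spark_app.log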

 

 

Explorer

Most of the logs in the Spark UI show "No logs available for container".

Explorer

Hi,

We see the error below; the container shows:

"Container [pid=68208, containerID=container_e59_1543621459332_7731_01_000002] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 26.5 GB of 8.4 GB virtual memory used. Killing container..."

 

Whether I run in CDSW or through Oozie, my Spark application has the same memory settings and configuration (executor memory, cores, driver memory, memory overhead, etc.). From CDSW it never failed, but when I run it from the Oozie shell action (calling spark2-submit) it fails randomly.

I'm trying to understand what is different in Oozie, and how I can set this memory limit.

Cloudera Employee

Hi Sunil,

 

This error indicates that the container size in YARN is set to 4 GB and your Spark application needs more memory to run.

 

Container Memory : yarn.nodemanager.resource.memory-mb

 

As a test, you can increase the container size in the YARN configuration to, say, 6 GB or 8 GB and see if the application succeeds. (If using Cloudera Manager, you will see this under CM > YARN > Configuration > Container Memory, i.e. yarn.nodemanager.resource.memory-mb.)
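
Outside of Cloudera Manager, the equivalent setting lives in yarn-site.xml on the NodeManager hosts. A sketch (8 GB chosen only as an example value):

<!-- yarn-site.xml: total physical memory YARN may allocate to containers on this node -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
</property>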

 

Regards
Bimal

Explorer

I applied this property and increased the limit to 6 GB. It still fails with the exact same error message.

Cloudera Employee

Hi Sunil,

 

That means spark-submit is asking for a container size of 4 GB. The --executor-memory must be getting set to 4g.

 

Can you check the spark-submit command being used and set --executor-memory and --driver-memory to 6g?
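
For example, something along these lines (a sketch; the application file and the other options are placeholders):

spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 6g \
  --executor-memory 6g \
  sample_app.py    # placeholder for the actual application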

 

 

Regards
Bimal

Explorer

Sorry, I forgot to mention: I have been using an executor memory of 14 GB and a driver memory of 10 GB. None of my tasks spill memory to disk. This is so strange, and it's shaking my fundamental understanding of Spark.

I have a memory overhead of 3 GB.

 

Again, the same settings are used in CDSW, but it never failed from there. It's when I run the job through Oozie that it fails. It then restarts on its own, and that attempt completes without any failures.

 

When would Spark use physical memory versus virtual memory?

Explorer

The Oozie launcher mapper was running out of its 4 GB of memory. I changed that to 8 GB, and now the job runs fine without restarts.

 

<configuration>
    <!-- Memory for the Oozie launcher's map container, which runs the shell action -->
    <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>8000</value>
    </property>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx1500m</value>
    </property>
    <!-- Memory for the launcher job's MapReduce Application Master -->
    <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
    </property>
    <!-- Note: this repeats oozie.launcher.mapreduce.map.java.opts; the later value overrides the earlier one -->
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx870m</value>
    </property>
</configuration>
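
For reference: the shell script (and the spark2-submit client it launches) runs inside the Oozie launcher's map container, which is why it was that container, rather than the Spark executors or driver, that hit the 4 GB limit. The oozie.launcher.* properties above can be set in the shell action's <configuration> block in workflow.xml; a sketch with placeholder names:

<action name="spark-shell-action">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Give the launcher map container enough memory for the shell script and spark2-submit -->
            <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>8000</value>
            </property>
        </configuration>
        <exec>run_spark.sh</exec>                 <!-- hypothetical wrapper script that calls spark2-submit -->
        <file>run_spark.sh#run_spark.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>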

 

Explorer

Thanks for the help!

Community Manager

Congratulations on resolving your issue, @Sunil. Please don't forget to mark the reply that helped resolve the issue as the answer. That way, when others have a similar issue, they will be more likely to find it.


Cy Jervis, Manager, Community Program

Cloudera Employee

Great, Sunil!

 

Regards

Bimal

Explorer

I get the following error:

 

 line 2: spark-submit: command not found
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Explorer

 

We submit the jobs something like this:

env -i spark2-submit --keytab svc.keytab --principal svc@CORP.COM sample.py
