
Error running Spark job using Oozie workflow in local mode (System memory 202375168 must be at least 4.718592E8. Please use a larger heap size.)

Rising Star

I am trying to run a simple Spark job using the Oozie workflow scheduler, and I am getting this error:

System memory 202375168 must be at least 4.718592E8. Please use a larger heap size.

I have assigned 5 GB to my HDP sandbox on VirtualBox. I have created a Spark jar on my local machine and uploaded it to the HDP sandbox. My workflow.xml looks like this:

<workflow-app name="samplespark-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="sparkjob"/>
    <action name="sparkjob">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[1]</master>
            <name>Spark Test</name>
            <class>main.scala.RDDscala.RDD1</class>
            <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
            <spark-opts>--driver-memory 5g --num-executors 1</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

This program runs fine from the local command prompt with the following command:

spark-submit --class main.scala.RDDscala.RDD2 --master local target\scala-2.11\sparkrdd_2.11-0.0.1.jar
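
The job itself is nothing fancy; roughly along these lines (a simplified sketch, not the exact code):

package main.scala.RDDscala

import org.apache.spark.{SparkConf, SparkContext}

// Simplified sketch of the job; the real RDD1/RDD2 classes may differ.
object RDD1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Test")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 100)   // build a small RDD
    println("Count: " + rdd.count())     // run a simple action
    sc.stop()
  }
}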

Any help would be appreciated

1 ACCEPTED SOLUTION

Rising Star

It seems your Spark driver is running with a very small heap size; please try increasing the driver memory and see if it helps. Use this parameter (for example) when submitting the job:

--driver-memory 1g
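
For example, with plain spark-submit (reusing the command and jar path from the question) that would be:

spark-submit --driver-memory 1g --class main.scala.RDDscala.RDD2 --master local target\scala-2.11\sparkrdd_2.11-0.0.1.jar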


8 REPLIES


Rising Star

Hi Peter,

I have already specified my driver-memory as 5g in spark-opts in workflow.xml, and I am still getting the same error.

Does this have something to do with the memory assigned to HDP 2.5 on VirtualBox? In my case it is 5 GB.


Rising Star

Ah, sorry :) Yes, here you can't specify driver-related parameters using <spark-opts>--driver-memory 10g</spark-opts>, because your driver (the Oozie launcher job) has already been launched before that point. The Oozie launcher (which is a MapReduce job) is what launches your actual Spark job, so spark-opts is not relevant here. The Oozie Spark action doc says the configuration element, if present, contains configuration properties that are passed to the Spark job. However, this shouldn't be Spark configuration; it should be MapReduce configuration for the launcher job.

So, please try adding the following:

<configuration>
    <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx3072m</value>
    </property>
</configuration>
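
If I remember the Spark action schema correctly, this <configuration> element goes inside the <spark> element, before <master>. So in your workflow it would sit roughly like this (a sketch, not tested):

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- launcher memory settings added here -->
    <configuration>
        <property>
            <name>oozie.launcher.mapreduce.map.memory.mb</name>
            <value>4096</value>
        </property>
        <property>
            <name>oozie.launcher.mapreduce.map.java.opts</name>
            <value>-Xmx3072m</value>
        </property>
    </configuration>
    <master>local[1]</master>
    <name>Spark Test</name>
    <class>main.scala.RDDscala.RDD1</class>
    <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
    <!-- driver-memory in spark-opts has no effect here, per the note above -->
    <spark-opts>--num-executors 1</spark-opts>
</spark>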

Rising Star

Hi Peter,

Thank you so much for such a clear answer. I tried the steps mentioned above and set the two values to 4096 and 3072, but my job failed with "MAP capability required is more than the supported max container capability in the cluster". I checked the properties mapreduce.map.memory.mb and mapreduce.map.java.opts in mapred-site.xml and their values are 250 and -Xmx200m, so this is probably why my job is getting killed: it is requesting a larger container than the defaults allow.

Any workaround for this? If I update the values in mapred-site.xml to the ones mentioned above, which services do I need to restart for the changes to take effect? Or can it be resolved in some other way? By the way, I am running HDP 2.5.

Thanks

Rahul

Rising Star

The two values were just examples. Try changing them to something smaller that fits your system environment. Either go for less than 512 (that might already do the job) or increase the RAM available to the containers (see the sketch after this list):

  1. Increase the VirtualBox memory from (I guess) 4096 to (e.g.) 8192
  2. Log into Ambari at http://my.local.host:8080
  3. Change the values of yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb from the defaults to 4096
  4. Save and restart (at least YARN, Oozie, Spark)
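
For reference, the underlying yarn-site.xml entries that Ambari manages would then look roughly like this (4096 is just the example value from step 3):

<!-- yarn-site.xml (managed through Ambari); example values from step 3 -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>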

Rising Star

Hi Peter,

Just a small question. My Spark Oozie workflow keeps running for a very long time. When I checked the Oozie logs I found it is trying to connect to port 8032 on sandbox.hortonworks.com. I do not know why it is going to 8032 instead of 8050, although I have mentioned 8050 in my job.properties (snippet below).
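
For reference, the relevant lines look something like this (the nameNode line is just the usual sandbox default, included for context):

# job.properties (relevant lines)
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050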

Any idea?

Thanks

Rahul

Super Collaborator

@rahul gulati Earlier I observed that a similar exception occurred at the time of launching the Oozie workflow. Can you try setting the following memory-related parameter in the Oozie workflow.xml to some higher value, like 1024 MB, so that the workflow launches successfully?

For example:

<property>
    <name>oozie.launcher.mapred.map.child.java.opts</name>
    <value>-Xmx1024m</value>
</property>

See if this helps you.