Created 02-06-2017 08:01 AM
I am trying to run a simple Spark job using the Oozie workflow scheduler, and I am getting the error:
System memory 202375168 must be at least 4.718592E8. Please use a larger heap size. I have assigned 5 GB to my HDP sandbox on VirtualBox.
I created a Spark jar on my local machine and uploaded it to the HDP sandbox. My workflow.xml looks like this:
<workflow-app name="samplespark-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="sparkjob"/>
    <action name="sparkjob">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[1]</master>
            <name>Spark Test</name>
            <class>main.scala.RDDscala.RDD1</class>
            <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
            <spark-opts>--driver-memory 5g --num-executors 1</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
This program runs fine from the command prompt on my local system with the command below:
spark-submit --class main.scala.RDDscala.RDD2 --master local target\scala-2.11\sparkrdd_2.11-0.0.1.jar
Any help would be appreciated.
Created 02-06-2017 08:12 AM
It seems your Spark driver is running with a very small heap. Please try increasing the driver memory and see if it helps, e.g. by passing this parameter when submitting the job:
--driver-memory 1g
Created 02-06-2017 08:14 AM
Hi Peter,
I have already set driver-memory to 5g in spark-opts in workflow.xml, and I am still getting the same error.
Could this have something to do with the memory assigned to HDP 2.5 on VirtualBox? In my case it is 5 GB.
Created 02-06-2017 08:41 AM
Ah, sorry :) Yes, here you can't specify driver-related parameters using <spark-opts>--driver-memory 10g</spark-opts>, because your driver (the Oozie launcher job) has already been launched by that point. The Oozie launcher (which is a MapReduce job) is what launches your actual Spark job, so spark-opts is not relevant for the driver heap. The Oozie spark action doc says:
"The configuration element, if present, contains configuration properties that are passed to the Spark job." But this shouldn't be Spark configuration; it should be MapReduce configuration for the launcher job.
So, please try adding the following:
<configuration>
<property>
<name>oozie.launcher.mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>oozie.launcher.mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
</configuration>
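To show the placement (a sketch based on your workflow above; in the spark action schema the configuration element sits between name-node and master, and the memory values are just examples to tune), the action would look roughly like this:
<action name="sparkjob">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- memory for the Oozie launcher map task that runs the Spark driver -->
        <configuration>
            <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>4096</value>
            </property>
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-Xmx3072m</value>
            </property>
        </configuration>
        <master>local[1]</master>
        <name>Spark Test</name>
        <class>main.scala.RDDscala.RDD1</class>
        <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
        <spark-opts>--num-executors 1</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
Note that --driver-memory is dropped from spark-opts here, since in local mode it has no effect anyway.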
Created 02-06-2017 09:22 AM
Hi Peter,
Thank you so much for such a clear answer. I tried the steps you mentioned and set the two values to 4096 and 3072, but my job failed with "MAP capability required is more than the supported max container capability in the cluster". I checked the properties mapreduce.map.memory.mb and mapreduce.map.java.opts in mapred-site.xml, and their values are 250 and -Xmx200m. So this is probably why my job is getting killed: it is requesting a container larger than the cluster's maximum.
Is there a workaround for this? If I update the values in mapred-site.xml to the ones above, which services do I need to restart for the changes to take effect? Or can it be resolved some other way? By the way, I am running HDP 2.5.
Thanks
Rahul
Created 02-06-2017 09:54 AM
The two values were just examples; try changing them to something smaller that fits your system environment. Either go below 512 MB (that might do the job) or increase the RAM that YARN is allowed to assign to a container:
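For example (a sketch with illustrative values; these are the standard YARN properties that bound container size, editable under the YARN configs in Ambari):
<!-- largest single container YARN will grant to a task -->
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>
<!-- total memory YARN may hand out to containers on each node -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4608</value>
</property>
After changing these, Ambari will prompt you to restart the affected YARN services (ResourceManager and NodeManagers), which also answers your restart question for mapred-site.xml changes made the same way.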
Created 02-06-2017 11:37 AM
Hi Peter,
Just a small question. My Spark Oozie workflow has been running for a long time. When I checked the Oozie logs, I found it is trying to connect to port 8032 on sandbox.hortonworks.com. I don't know why it is going to 8032 instead of 8050, although I have specified 8050 in my job.properties.
Any idea?
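For reference, the relevant entries in my job.properties look roughly like this (hostname and NameNode port are the sandbox defaults):
# ResourceManager address -- 8050 is the HDP default; 8032 is the vanilla Hadoop default
jobTracker=sandbox.hortonworks.com:8050
nameNode=hdfs://sandbox.hortonworks.com:8020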
Thanks
Rahul
Created 02-07-2017 08:51 AM
@rahul gulati I have previously seen a similar exception at the time the Oozie workflow launches. Can you try setting the following memory-related property in the Oozie workflow.xml to a higher value, such as 1024 MB, so that the workflow launches successfully?
For e.g:
<property>
    <name>oozie.launcher.mapred.map.child.java.opts</name>
    <value>-Xmx1024m</value>
</property>
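(As in Peter's example above, this property goes inside the <configuration> element of the spark action.)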
See if this helps you.