Created 02-06-2017 08:01 AM
I am trying to run a simple Spark job using the Oozie workflow scheduler, and I am getting the error:
System memory 202375168 must be at least 4.718592E8. Please use a larger heap size. I have assigned 5 GB to my HDP sandbox on VirtualBox.
I created a Spark jar on my local machine and uploaded it to the HDP sandbox. My workflow.xml looks like this:
<workflow-app name="samplespark-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="sparkjob"/>
    <action name="sparkjob">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[1]</master>
            <name>Spark Test</name>
            <class>main.scala.RDDscala.RDD1</class>
            <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
            <spark-opts>--driver-memory 5g --num-executors 1</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
This program runs fine from the command prompt on my local system with the command below:
spark-submit --class main.scala.RDDscala.RDD2 --master local target\scala-2.11\sparkrdd_2.11-0.0.1.jar
Any help would be appreciated.
Created 02-06-2017 08:12 AM
It seems your Spark driver is running with a very small heap. Please try increasing the driver memory and see if it helps, e.g. by passing this parameter when submitting the job:
--driver-memory 1g
Created 02-06-2017 08:14 AM
Hi Peter,
I have already set driver-memory to 5g in spark-opts in workflow.xml, and I am still getting the same error.
Could this have something to do with the memory assigned to HDP 2.5 on VirtualBox? In my case it is 5 GB.
Created 02-06-2017 08:41 AM
Ah, sorry :) Yes, here you can't specify driver-related parameters using <spark-opts>--driver-memory 10g</spark-opts>, because your driver (the Oozie launcher job) has already been launched by that point. The Oozie launcher (which is a MapReduce job) is what launches your actual Spark job, so spark-opts is not relevant for the driver heap. The Oozie spark action doc says:
"The configuration element, if present, contains configuration properties that are passed to the Spark job." But this shouldn't be Spark configuration; it should be MapReduce configuration for the launcher job.
So, please try adding the following:
<configuration>
<property>
<name>oozie.launcher.mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>oozie.launcher.mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
</configuration>
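To show the placement (a sketch based on your workflow above; in the spark action schema the configuration element sits between name-node and master, and the memory values are just examples to tune), the action would look roughly like this:
<action name="sparkjob">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- memory for the Oozie launcher map task that runs the Spark driver -->
        <configuration>
            <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>4096</value>
            </property>
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-Xmx3072m</value>
            </property>
        </configuration>
        <master>local[1]</master>
        <name>Spark Test</name>
        <class>main.scala.RDDscala.RDD1</class>
        <jar>${nameNode}/spark_oozie_action/sparkrdd_2.11-0.0.1.jar</jar>
        <spark-opts>--num-executors 1</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
Note that --driver-memory is dropped from spark-opts here, since in local mode it has no effect anyway.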
Created 02-06-2017 09:22 AM
Hi Peter,
Thank you so much for such a clear answer. I tried the steps you mentioned and set the two values to 4096 and 3072, but my job failed with "MAP capability required is more than the supported max container capability in the cluster". I checked the properties mapreduce.map.memory.mb and mapreduce.map.java.opts in mapred-site.xml, and their values are 250 and -Xmx200m. So this is probably why my job is getting killed: it is requesting a container larger than the cluster's maximum.
Is there a workaround for this? If I update the values in mapred-site.xml to the ones above, which services do I need to restart for the changes to take effect? Or can it be resolved some other way? By the way, I am running HDP 2.5.
Thanks
Rahul
Created 02-06-2017 09:54 AM
The two values were just examples; try changing them to something smaller that fits your system environment. Either go below 512 MB (that might do the job) or increase the RAM that YARN is allowed to assign to a container:
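For example (a sketch with illustrative values; these are the standard YARN properties that bound container size, editable under the YARN configs in Ambari):
<!-- largest single container YARN will grant to a task -->
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>
<!-- total memory YARN may hand out to containers on each node -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4608</value>
</property>
After changing these, Ambari will prompt you to restart the affected YARN services (ResourceManager and NodeManagers), which also answers your restart question for mapred-site.xml changes made the same way.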
Created 02-06-2017 11:37 AM
Hi Peter,
Just a small question. My Spark Oozie workflow has been running for a long time. When I checked the Oozie logs, I found it is trying to connect to port 8032 on sandbox.hortonworks.com. I don't know why it is going to 8032 instead of 8050, although I have specified 8050 in my job.properties.
Any idea?
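For reference, the relevant entries in my job.properties look roughly like this (hostname and NameNode port are the sandbox defaults):
# ResourceManager address -- 8050 is the HDP default; 8032 is the vanilla Hadoop default
jobTracker=sandbox.hortonworks.com:8050
nameNode=hdfs://sandbox.hortonworks.com:8020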
Thanks
Rahul
Created 02-07-2017 08:51 AM
@rahul gulati I have previously seen a similar exception at the time the Oozie workflow launches. Can you try setting the following memory-related property in the Oozie workflow.xml to a higher value, such as 1024 MB, so that the workflow launches successfully?
For e.g:
<property>
    <name>oozie.launcher.mapred.map.child.java.opts</name>
    <value>-Xmx1024m</value>
</property>
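(As in Peter's example above, this property goes inside the <configuration> element of the spark action.)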
See if this helps you.