
Spark Streaming - out of memory when submit using Oozie

Contributor

Dear Colleagues,

I submitted a Spark Streaming job via Oozie and got the following error messages:

Warning: Skip remote jar hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160405235854/oozie/oozie-sharelib-oozie.jar.
Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error...

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LeaseRenewer:hdfs@quickstart.cloudera:8020"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Executor task launch worker-2"

Do you have an idea or a solution to prevent these errors?

Thanks in advance and best regards,

 butkiz

1 ACCEPTED SOLUTION

Expert Contributor

Dear butkiz,

Please add the following properties to the configuration block of the Oozie Spark action to give the launcher more memory.

<property>
    <name>oozie.launcher.mapreduce.map.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>oozie.launcher.mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
</property>
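
For reference, these properties belong inside the <configuration> element of the Spark action in workflow.xml. A minimal sketch of the placement (the action name, class name, and jar path below are placeholders, not taken from your job):

<action name="spark-streaming">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- raise the memory of the Oozie launcher and its map task -->
            <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>4096</value>
            </property>
            <property>
                <name>mapreduce.map.memory.mb</name>
                <value>4096</value>
            </property>
            <property>
                <name>oozie.launcher.mapred.child.java.opts</name>
                <value>-Xmx4096m</value>
            </property>
        </configuration>
        <master>yarn-cluster</master>
        <name>SparkStreamingJob</name>
        <class>com.example.StreamingMain</class>
        <jar>${nameNode}/user/example/lib/streaming-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>

The oozie.launcher.* properties apply to the launcher job that Oozie starts to submit the Spark application, which is where this kind of out-of-memory error typically occurs; the plain mapreduce.map.memory.mb sets the matching map-task container size.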


3 REPLIES

Expert Contributor
Hi,

Have you fixed the issue?


Contributor

Hi,

It works after applying the above configuration.

But now I get a NullPointerException in my Spark code (rdd.foreach):
...


kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
                // ... processing elided in the original post ...
            }
        });
    }
});


It works in local mode but not in yarn-cluster mode.
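
A common cause of this works-locally-but-fails-on-yarn-cluster pattern is that the foreach closure captures a driver-side object that is null or not serializable once it reaches the executors. A minimal sketch of the usual workaround, constructing such helpers inside the closure on the executor; AvroDecoder here is a hypothetical stand-in for whatever the inner call method uses, not taken from the code above:

import java.util.Iterator;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

// ...

kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, byte[]>>>() {
            public void call(Iterator<Tuple2<String, byte[]>> records) throws Exception {
                // Construct non-serializable helpers here, on the executor,
                // instead of capturing them from the driver, where they may
                // arrive null or fail to serialize.
                AvroDecoder decoder = new AvroDecoder(); // hypothetical helper
                while (records.hasNext()) {
                    Tuple2<String, byte[]> record = records.next();
                    decoder.decode(record._2()); // _2() is the byte[] payload
                }
            }
        });
    }
});

It may also help to check the executor logs with yarn logs -applicationId <appId>, since the driver-side stack trace for an NPE inside foreach is often just the re-thrown failure from an executor.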


Do you have any ideas on how to get it running?


Best Regards,

 Butkiz