New Contributor
Posts: 1
Registered: ‎10-21-2017

Re: Map and Reduce Error: Java heap space

Please provide the steps to export the value of HADOOP_OPTS. I'm getting the error below:

 

Error: Could not find or load main class mapreduce.map.memory.mb=5120
Process Failed !!! Check the log file for more information
Exiting with return code: 3 !!!
None

 

I exported the value using the Python code below:

 

os.environ['HADOOP_OPTS'] = "mapreduce.map.memory.mb=5120 mapreduce.reduce.memory.mb=5120 mapreduce.map.java.opts=-Xmx4g mapreduce.reduce.java.opts=-Xmx4g"
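The "Could not find or load main class" error happens because everything in HADOOP_OPTS is passed straight to the JVM, so a bare `mapreduce.map.memory.mb=5120` token is parsed as a main-class name. Each property needs a `-D` prefix to become a system property. A minimal sketch of the corrected export (property names and values taken from the snippet above):

```python
import os

# Prefix every property with -D so the JVM treats it as a system property
# instead of a main class name.
props = {
    "mapreduce.map.memory.mb": "5120",
    "mapreduce.reduce.memory.mb": "5120",
    "mapreduce.map.java.opts": "-Xmx4g",
    "mapreduce.reduce.java.opts": "-Xmx4g",
}
os.environ["HADOOP_OPTS"] = " ".join(f"-D{k}={v}" for k, v in props.items())
print(os.environ["HADOOP_OPTS"])
```

Note that HADOOP_OPTS only affects the client JVM; for per-job properties it is usually more reliable to pass the `-D` options on the `hadoop jar` command line, where a `Tool`-based driver will pick them up via GenericOptionsParser.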

Contributor
Posts: 53
Registered: ‎05-09-2017

Re: Map and Reduce Error: Java heap space

[ Edited ]

@saranvisa After increasing the reducer heap and opts, the job worked for a few days, but now we are seeing this issue again: not a single reducer completes, and the job fails after 4 hours with ALL reducers failed.

 

failed reducer log:

 

dfs.DFSClient: Slow waitForAckedSeqno took 38249ms (threshold=30000ms). File being written: /user/hadoop/normalization/6befd9a02400013179aba889/16cb62ff-463a-448b-b1d3-1cf5bb254466/_temporary/1/_temporary/attempt_1517244318452_37939_r_000028_0/custom_attribute_dir/part-00028.gz, block: BP-71764089-10.239.121.82-1481226593627:blk_1103397861_29724995, Write pipeline datanodes: [DatanodeInfoWithStorage[10.239.121.39:50010,DS-15b1c936-e838-41a2-ab40-7889aab95982,DISK], DatanodeInfoWithStorage[10.239.121.21:50010,DS-d5d914b6-6886-443b-9e39-8347c24cc9b7,DISK], DatanodeInfoWithStorage[10.239.121.56:50010,DS-63498815-70ea-48e2-b701-f0c439e38711,DISK]]
2018-03-19 23:54:17,315 WARN [main] org.apache.hadoop.hdfs.DFSClient: Slow waitForAckedSeqno took 35411ms (threshold=30000ms). File being written: /user/hadoop/normalization/6befd9a02400013179aba889/16cb62ff-463a-448b-b1d3-1cf5bb254466/_temporary/1/_temporary/attempt_1517244318452_37939_r_000028_0/documents_dir/part-00028.gz, block: BP-71764089-10.239.121.82-1481226593627:blk_1103400051_29727493, Write pipeline datanodes: [DatanodeInfoWithStorage[10.239.121.39:50010,DS-15b1c936-e838-41a2-ab40-7889aab95982,DISK], DatanodeInfoWithStorage[10.239.121.176:50010,DS-ae2d35e1-7a7e-44dc-9016-1d11881d49cc,DISK], DatanodeInfoWithStorage[10.239.121.115:50010,DS-86b207ef-b8ce-4a9f-9f6f-ddc182695296,DISK]]
2018-03-19 23:54:51,983 WARN [main] org.apache.hadoop.hdfs.DFSClient: Slow waitForAckedSeqno took 34579ms (threshold=30000ms). File being written: /user/hadoop/normalization/6befd9a02400013179aba889/16cb62ff-463a-448b-b1d3-1cf5bb254466/_temporary/1/_temporary/attempt_1517244318452_37939_r_000028_0/form_path_dir/part-00028.gz, block: BP-71764089-10.239.121.82-1481226593627:blk_1103400111_29727564, Write pipeline datanodes: [DatanodeInfoWithStorage[10.239.121.39:50010,DS-15b1c936-e838-41a2-ab40-7889aab95982,DISK], DatanodeInfoWithStorage[10.239.121.176:50010,DS-ae2d35e1-7a7e-44dc-9016-1d11881d49cc,DISK], DatanodeInfoWithStorage[10.239.121.21:50010,DS-d5d914b6-6886-443b-9e39-8347c24cc9b7,DISK]]
2018-03-19 23:55:47,506 WARN [main] org.apache.hadoop.hdfs.DFSClient: Slow waitForAckedSeqno took 55388ms (threshold=30000ms). File being written: /user/hadoop/normalization/6befd9a02400013179aba889/16cb62ff-463a-448b-b1d3-1cf5bb254466/_temporary/1/_temporary/attempt_1517244318452_37939_r_000028_0/media_hr_dir/part-00028.gz, block: BP-71764089-10.239.121.82-1481226593627:blk_1103400160_29727615, Write pipeline datanodes: [DatanodeInfoWithStorage[10.239.121.39:50010,DS-15b1c936-e838-41a2-ab40-7889aab95982,DISK], DatanodeInfoWithStorage[10.239.121.176:50010,DS-ae2d35e1-7a7e-44dc-9016-1d11881d49cc,DISK], DatanodeInfoWithStorage[10.239.121.56:50010,DS-63498815-70ea-48e2-b701-f0c439e38711,DISK]]
2018-03-19 23:55:47,661 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at java.lang.String.replaceAll(String.java:2223)
at com.xxx.ci.acs.extract.CXAService$myReduce.parseEvent(CXAService.java:1589)
at com.xxx.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:915)
at com.xxx.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:233)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2018-03-19 23:55:47,763 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2018-03-19 23:55:47,763 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2018-03-19 23:55:47,763 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.

 

What other tuning parameters can we try?
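Beyond raw heap size, the shuffle buffers inside the reducer JVM are a common source of "GC overhead limit exceeded". A sketch of candidate knobs, passed as generic `-D` options (standard MRv2 property names; `my-job.jar` and the values are placeholders to adapt, not a recommendation for this specific job):

```python
# Candidate reducer-side tuning properties for an OOM-ing reduce phase.
tuning = {
    "mapreduce.reduce.memory.mb": "5120",        # YARN container size
    "mapreduce.reduce.java.opts": "-Xmx4096m",   # heap ~80% of container
    "mapreduce.reduce.shuffle.input.buffer.percent": "0.5",  # default 0.70; less heap held by shuffle
    "mapreduce.reduce.shuffle.parallelcopies": "5",          # fewer concurrent map-output fetches
    "mapreduce.job.reduces": "60",               # more reducers -> less data per reducer
}
cmd = ["hadoop", "jar", "my-job.jar"] + [f"-D{k}={v}" for k, v in tuning.items()]
print(" ".join(cmd))
```

Lowering the shuffle input buffer percent trades some shuffle speed for heap headroom, which is often the right trade when the OOM occurs during the reduce/merge phase.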

Explorer
Posts: 32
Registered: ‎11-03-2015

Re: Map and Reduce Error: Java heap space

Hi, getting back to this old topic in the hope of more answers on this subject.

 

I have errors with mappers and reducers running short of memory. Of course increasing the memory fixes the issue, but as already mentioned, I am wasting memory on jobs that don't need it.

 

Plus, I was thinking this stuff was made to scale, so it would handle a particularly large job just by splitting it.

In other words, I don't want to change memory values every time a new application fails due to memory limits.

 

What is the best practice in this case?


Thanks

O.

Contributor
Posts: 53
Registered: ‎05-09-2017

Re: Map and Reduce Error: Java heap space

[ Edited ]

In our case the reducers were failing with an OOM issue, so we first increased the reducer memory (mapreduce.reduce.memory.mb) and mapreduce.reduce.java.opts. After a few days the job failed again.

So we opted to keep the existing memory and instead increase the number of reducers from 40 to 60. This resolved our issue and we haven't seen a failure since. We cannot keep increasing reducer memory, as that could cause other issues.

 

A lower number of reducers will create fewer, but larger, output files. A good rule of thumb is to tune the number of reducers so that each output file is at least half a block in size.

If the reducers complete quickly and generate small files, then there are too many reducers, which was not the case for us.
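The half-a-block rule of thumb above can be turned into a quick estimate. A minimal sketch (the 128 MB block size is an assumption; substitute your cluster's dfs.blocksize):

```python
def reducers_for(total_output_bytes, block_size_bytes=128 * 1024 * 1024):
    """Pick a reducer count so each output file is at least half a block."""
    target = block_size_bytes // 2  # minimum desired bytes per output file
    return max(1, total_output_bytes // target)

# e.g. ~4 GiB of total reduce output with 128 MB blocks:
print(reducers_for(4 * 1024**3))  # 64 reducers, ~64 MB per file
```

The reducer count can then be set via `-Dmapreduce.job.reduces=N` on the command line or `job.setNumReduceTasks(N)` in the driver.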

Explorer
Posts: 32
Registered: ‎11-03-2015

Re: Map and Reduce Error: Java heap space

OK, I understand your point, but what if the mappers are failing? YARN already launches as many mappers as there are input files; should I increase that further?
Since only a minority of my jobs fail, how can I tune YARN to use more mappers for those particular jobs?
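Mapper count is actually driven by input splits, not just file count: a splittable file larger than one split yields several mappers, so lowering `mapreduce.input.fileinputformat.split.maxsize` for only the failing jobs gives those jobs more, smaller mappers. A rough sketch of how split size maps to mapper count (my own estimator, not a Hadoop API, and it assumes splittable input; gzip files like the part-*.gz outputs earlier in this thread are not splittable and always get one mapper each):

```python
def estimate_mappers(file_sizes, split_max=256 * 1024 * 1024):
    """Estimate mapper count: each file contributes ceil(size / split_max)
    mappers, with a minimum of one mapper per file."""
    return sum(max(1, -(-size // split_max)) for size in file_sizes)

# Halving split.maxsize roughly doubles mappers for large splittable files:
print(estimate_mappers([1024**3], split_max=256 * 1024**2))  # 4
print(estimate_mappers([1024**3], split_max=128 * 1024**2))  # 8
```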