
Map and Reduce Error: Java heap space

Explorer

I'm using the QuickStart VM with CDH 5.3, trying to run a modified version of the MR-parquet read sample. It worked fine on a 10M-row Parquet table, but I get a "Java heap space" error on a table with 40M rows:

 

[cloudera@quickstart sep]$ yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet /user/hive/warehouse/parquet_table out_file18 -Dmapreduce.reduce.memory.mb=5120 -Dmapreduce.reduce.java.opts=-Xmx4608m -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m
16/10/03 12:19:30 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/03 12:19:31 INFO input.FileInputFormat: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: reading another 1 footers
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: There were no row groups that could be dropped due to filter predicates
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: number of splits:1
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475517800829_0009
16/10/03 12:19:33 INFO impl.YarnClientImpl: Submitted application application_1475517800829_0009
16/10/03 12:19:33 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1475517800829_0009/
16/10/03 12:19:33 INFO mapreduce.Job: Running job: job_1475517800829_0009
16/10/03 12:19:47 INFO mapreduce.Job: Job job_1475517800829_0009 running in uber mode : false
16/10/03 12:19:47 INFO mapreduce.Job: map 0% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: map 100% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: Task Id : attempt_1475517800829_0009_m_000000_0, Status : FAILED
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


I've also tried editing /etc/hadoop/conf/mapred-site.xml, and tried via the Cloudera Manager GUI (Clusters -> HDFS -> ... Java Heap Size of DataNode in Bytes).

 

[cloudera@quickstart sep]$ free -m
                   total       used       free     shared    buffers     cached
Mem:               13598      13150        447          0         23        206
-/+ buffers/cache:            12920        677
Swap:               6015       2187       3828

 

Mapper class:

 

public static class MyMap extends
        Mapper<LongWritable, Group, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Group value, Context context)
            throws IOException, InterruptedException {
        NullWritable outKey = NullWritable.get();
        String outputRecord = "";
        // Get the schema and field values of the record
        // String inputRecord = value.toString();
        // Process the value, create an output record
        // ...
        int field1 = value.getInteger("x", 0);

        if (field1 < 3) {
            context.write(outKey, new Text(outputRecord));
        }
    }
}
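
For context: -D options like the ones in the command above are only applied to the job if the driver passes them through ToolRunner / GenericOptionsParser, and they must appear before the input and output path arguments. A minimal driver sketch along those lines (illustrative only, not the exact code from this post; it assumes the Parquet ExampleInputFormat used by the MR-parquet sample):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import parquet.hadoop.example.ExampleInputFormat;

public class TestReadParquet extends Configured implements Tool {

    // MyMap is the mapper shown above, kept as a static nested class of this driver.

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -Dkey=value generic options
        Job job = Job.getInstance(getConf(), "parquet-read");
        job.setJarByClass(TestReadParquet.class);

        job.setInputFormatClass(ExampleInputFormat.class); // reads Parquet records as Groups
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MyMap.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        // reducer setup (if any) as in the original example

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Generic options must come before the path arguments, e.g.:
        // yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet \
        //   -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m \
        //   /user/hive/warehouse/parquet_table out_file18
        System.exit(ToolRunner.run(new Configuration(), new TestReadParquet(), args));
    }
}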

 

1 ACCEPTED SOLUTION

Champion

Please add some more memory by editing mapred-site.xml:

 

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
</property>

The tag above sets the child task heap size; here it is 4096m (about 4 GB). Adjust the value as needed.

Let me know if that helped you.

 

Alternatively, you can also edit the hadoop-env.sh file and add:

export HADOOP_OPTS="-Xmx5096m"
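
If you prefer not to change the cluster-wide defaults, the same limits can be set per job. Note that on YARN/MR2 the task-level equivalents of mapred.child.java.opts are mapreduce.map.java.opts and mapreduce.reduce.java.opts, and the -Xmx heap has to fit inside the matching mapreduce.*.memory.mb container size. A sketch (not part of the accepted answer; the class name and values are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HeapSettingsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The -Xmx heap must stay below the matching YARN container size (memory.mb).
        conf.setInt("mapreduce.map.memory.mb", 5120);        // map container size in MB
        conf.set("mapreduce.map.java.opts", "-Xmx4096m");    // map task JVM heap
        conf.setInt("mapreduce.reduce.memory.mb", 5120);     // reduce container size in MB
        conf.set("mapreduce.reduce.java.opts", "-Xmx4096m"); // reduce task JVM heap
        Job job = Job.getInstance(conf, "TestReadParquet");
        // ... set mapper, input/output formats and paths as usual, then
        // job.waitForCompletion(true);
    }
}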

 


24 REPLIES

Champion

@onsbt

 

In general, a service restart is required after any configuration change.

Again, as I mentioned, it is recommended to make any configuration change via Cloudera Manager (CM).

Contributor
Thanks for replying, the problem was solved. I would like to ask another question, because after some research I didn't find a solution. After installing Cloudera Manager I got a problem with HDFS: "Problèmes d'état d'intégrité HDFS : Blocs sous-répliqués" (HDFS health issues: under-replicated blocks). Do you have an idea about the solution?

Champion

@onsbt

 

Can you translate your issue into English? Also, if it is not related to Java heap space, I would recommend creating a new thread instead, so that it is easier to track and for others to contribute as well.

Contributor

Thanks for replying.

I updated this file directly instead of via Cloudera Manager, and my problem is resolved now 🙂 Thank you so much. But I have another question: I am running Cloudera with the default configuration on a one-node cluster and would like to find where HDFS stores files locally. I created a file in HDFS with Hue, but when I look in /dfs/nn it is empty and I can't find the file I created.

Champion

@onsbt

 

The default path is /opt/hadoop/dfs/nn

 

You can confirm this in Cloudera Manager -> HDFS -> Configuration -> search for "dfs.namenode.name.dir".
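
Note that dfs.namenode.name.dir only holds the NameNode's metadata (fsimage and edit logs); the actual block data for files you write lives under the DataNode directories (dfs.datanode.data.dir), and it appears there as blk_* block files rather than under the HDFS file names. A small sketch to print both locations (illustrative; the class name is made up and the cluster configuration must be on the classpath):

import org.apache.hadoop.conf.Configuration;

public class PrintDfsDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml"); // plain Configuration only loads core-site.xml by default
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
        System.out.println("dfs.datanode.data.dir = " + conf.get("dfs.datanode.data.dir"));
    }
}

Alternatively, running "hdfs getconf -confKey dfs.datanode.data.dir" from the shell returns the same value.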

Contributor

The path /opt/hadoop/dfs/nn does not exist, and when I look for the file I already created I can't find it under that path.

Champion

@onsbt

 

As mentioned already, please create a new topic for a new issue, as mixing issues here may mislead others.

Also, please read the full answer and then reply, so that you will get the desired answer.

Expert Contributor

@saranvisa

 

The last reducer of my MapReduce job fails with the error below.

 

2017-09-20 16:23:23,732 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.regex.Matcher.<init>(Matcher.java:224)
	at java.util.regex.Pattern.matcher(Pattern.java:1088)
	at java.lang.String.replaceAll(String.java:2162)
	at com.sas.ci.acs.extract.CXAService$myReduce.parseEvent(CXAService.java:1612)
	at com.sas.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:919)
	at com.sas.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:237)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.

 

Current settings: 

 

mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true -Xmx3865051136

mapreduce.reduce.java.opts = -Djava.net.preferIPv4Stack=true -Xmx6144067296

 

1) Do you recommend increasing the following properties to the values below?

 

"mapreduce.map.java.opts","-Xmx4g" 
"mapreduce.reduce.java.opts","-Xmx8g" 

 

2) These are my current map and reduce memory settings. Do I also need to bump up my reduce memory to 10240m?

 

mapreduce.map.memory.mb 8192
mapreduce.reduce.memory.mb 8192

Champion

@desind

 

I would not recommend changing your cluster-wide settings; instead, you can pass the memory and Java opts when you execute your jar.

 

 

Example: below are some sample values; change them as needed.

 

hadoop jar ${JAR_PATH} ${CONFIG_PATH}/filename.xml ${ENV} ${ODATE} mapMem=12288 mapJavaOpts=Xmx9830 redurMem=12288 redurJavaOpts=Xmx9830

 

Note:

mapJavaOpts = mapMem * 0.8

redurJavaOpts = redurMem * 0.8
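
A small sketch of the same 80% rule applied from a driver, so the container size and the JVM heap cannot drift apart (the helper name is made up, not from this thread):

import org.apache.hadoop.conf.Configuration;

public class TaskMemory {
    // Keep the JVM heap (-Xmx) at roughly 80% of the YARN container size,
    // leaving headroom for non-heap memory, as in the rule of thumb above.
    static void setTaskMemory(Configuration conf, int mapMemMb, int reduceMemMb) {
        conf.setInt("mapreduce.map.memory.mb", mapMemMb);
        conf.set("mapreduce.map.java.opts", "-Xmx" + (int) (mapMemMb * 0.8) + "m");
        conf.setInt("mapreduce.reduce.memory.mb", reduceMemMb);
        conf.set("mapreduce.reduce.java.opts", "-Xmx" + (int) (reduceMemMb * 0.8) + "m");
    }
}

For example, setTaskMemory(conf, 12288, 12288) yields -Xmx9830m for both map and reduce, which matches the Xmx9830 in the sample command above.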

 

Expert Contributor

@saranvisa

Anything else?