Support Questions


Map and Reduce Error: Java heap space

Explorer

I'm using the QuickStart VM with CDH 5.3, trying to run a modified sample from the MR Parquet read example. It worked fine on a 10M-row Parquet table, but I get a "Java heap space" error on a table with 40M rows:

[cloudera@quickstart sep]$ yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet /user/hive/warehouse/parquet_table out_file18 -Dmapreduce.reduce.memory.mb=5120 -Dmapreduce.reduce.java.opts=-Xmx4608m -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m
16/10/03 12:19:30 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/03 12:19:31 INFO input.FileInputFormat: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: reading another 1 footers
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: There were no row groups that could be dropped due to filter predicates
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: number of splits:1
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475517800829_0009
16/10/03 12:19:33 INFO impl.YarnClientImpl: Submitted application application_1475517800829_0009
16/10/03 12:19:33 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1475517800829_0009/
16/10/03 12:19:33 INFO mapreduce.Job: Running job: job_1475517800829_0009
16/10/03 12:19:47 INFO mapreduce.Job: Job job_1475517800829_0009 running in uber mode : false
16/10/03 12:19:47 INFO mapreduce.Job: map 0% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: map 100% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: Task Id : attempt_1475517800829_0009_m_000000_0, Status : FAILED
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
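
A side note on the command above: Hadoop's generic -D options are picked up by GenericOptionsParser only when they appear before the positional arguments, and only when the driver runs through ToolRunner; placed after the input/output paths, they are treated as plain program arguments and silently ignored (which would explain why the containers in the NodeManager log below still show default sizes). A sketch of the reordered command, assuming TestReadParquet is ToolRunner-based:

[cloudera@quickstart sep]$ yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m -Dmapreduce.reduce.memory.mb=5120 -Dmapreduce.reduce.java.opts=-Xmx4608m /user/hive/warehouse/parquet_table out_file18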


I've also tried editing /etc/hadoop/conf/mapred-site.xml, and tried the Cloudera Manager GUI (Clusters -> HDFS -> ... Java Heap Size of DataNode in Bytes).

 

[cloudera@quickstart sep]$ free -m
             total       used       free     shared    buffers     cached
Mem:         13598      13150        447          0         23        206
-/+ buffers/cache:      12920        677
Swap:         6015       2187       3828

 

Mapper class:

 

public static class MyMap extends
        Mapper<LongWritable, Group, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Group value, Context context)
            throws IOException, InterruptedException {
        NullWritable outKey = NullWritable.get();
        String outputRecord = "";
        // Get the schema and field values of the record
        // String inputRecord = value.toString();
        // Process the value, create an output record
        // ...
        int field1 = value.getInteger("x", 0);

        if (field1 < 3) {
            context.write(outKey, new Text(outputRecord));
        }
    }
}
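
For context, a minimal driver sketch that such a mapper could be paired with; the input format and class layout here are assumptions based on the parquet-mr example code, not the poster's actual TestReadParquet. Going through ToolRunner is what makes -D options (placed before the positional arguments) take effect:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import parquet.hadoop.example.ExampleInputFormat;

public class TestReadParquet extends Configured implements Tool {

    // The MyMap class above would be declared as a nested class here.

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "TestReadParquet");
        job.setJarByClass(TestReadParquet.class);
        job.setMapperClass(MyMap.class);
        // Map-only in this sketch; the original job may have had reducers.
        job.setNumReduceTasks(0);
        job.setInputFormatClass(ExampleInputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        ExampleInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D ...) before handing over args.
        System.exit(ToolRunner.run(new TestReadParquet(), args));
    }
}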

 

1 ACCEPTED SOLUTION

Champion

Please add some more memory by editing mapred-site.xml:

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
</property>

The tag above sets a 4 GB heap.

Let me know if that helped you.

 

Alternatively, you can also edit the hadoop-env.sh file and add:

export HADOOP_OPTS="-Xmx5096m"

 


24 REPLIES

Explorer

hadoop-cmf-yarn-NODEMANAGER-quickstart.cloudera.log.out:

 

2016-10-03 12:22:14,533 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 18309 for container-id container_1475517800829_0009_01_000005: 130.2 MB of 3 GB physical memory used; 859.9 MB of 6.3 GB virtual memory used
2016-10-03 12:22:28,045 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 16676 for container-id container_1475517800829_0009_01_000001: 178.8 MB of 1 GB physical memory used; 931.1 MB of 2.1 GB virtual memory used
2016-10-03 12:22:31,303 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 18309 for container-id container_1475517800829_0009_01_000005: 128.8 MB of 3 GB physical memory used; 859.9 MB of 6.3 GB virtual memory used
2016-10-03 12:22:46,965 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
2016-10-03 12:22:46,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 16676 for container-id container_1475517800829_0009_01_000001: 179.0 MB of 1 GB physical memory used; 931.1 MB of 2.1 GB virtual memory used
2016-10-03 12:22:47,122 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1475517800829_0009_01_000005


 

Explorer

Thanks! mapred.child.java.opts in mapred-site.xml solved the issue.

Champion
Sounds good mate

Champion

If you need more details, please see below.

 

mapred.child.java.opts (and its map/reduce-specific variants mapred.map.child.java.opts and mapred.reduce.child.java.opts) are for Hadoop 1.x.

 

Those who are using Hadoop 2.x should use the parameters below instead:

 

mapreduce.map.java.opts=-Xmx4g         # Note: 4 GB

mapreduce.reduce.java.opts=-Xmx4g     # Note: 4 GB

 

Also, when you set java.opts, you need to note two important points:

1. It depends on memory.mb, so always set java.opts to at most about 80% of memory.mb.

2. Use the "-Xmx4g" format for the opts, but a plain numerical value (in MB) for memory.mb:

 

mapreduce.map.memory.mb = 5120        # Note: 5 GB

mapreduce.reduce.memory.mb = 5120     # Note: 5 GB

(A combined example follows below.)
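
As a concrete illustration of the 80% guideline and the two formats, here is a mapred-site.xml sketch pairing a 5120 MB map container with a roughly 4 GB heap; the reduce-side pair is analogous. The values are examples, not universal recommendations:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>5120</value>            <!-- container size, plain MB -->
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx4096m</value>       <!-- JVM heap, ~80% of 5120 MB -->
</property>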

 

Finally, some organizations will not allow you to alter mapred-site.xml directly or via CM. Also, this kind of setup is needed only to handle very big tables, so it is not recommended to alter the cluster configuration for just a few tables. Instead, you can apply the settings temporarily by following the steps below:

 

1. From the shell, before launching Hive:

export HIVE_OPTS="-hiveconf mapreduce.map.memory.mb=5120 -hiveconf mapreduce.reduce.memory.mb=5120 -hiveconf mapreduce.map.java.opts=-Xmx4g -hiveconf mapreduce.reduce.java.opts=-Xmx4g"

2. From Hive:

hive> set mapreduce.map.memory.mb=5120;

hive> set mapreduce.reduce.memory.mb=5120;

hive> set mapreduce.map.java.opts=-Xmx4g;

hive> set mapreduce.reduce.java.opts=-Xmx4g;

 

Note: HIVE_OPTS applies only to Hive; if you need a similar setup for Hadoop jobs, use HADOOP_OPTS.

 

Thanks

Kumar

New Contributor

Please provide me the steps to export the value for HADOOP_OPTS. I'm getting the error below:

 

Error: Could not find or load main class mapreduce.map.memory.mb=5120
Process Failed !!! Check the log file for more information
Exiting with return code: 3 !!!
None

 

I have exported the value using Python code as shown below:

 

os.environ['HADOOP_OPTS'] = "mapreduce.map.memory.mb=5120 mapreduce.reduce.memory.mb=5120 mapreduce.map.java.opts=-Xmx4g mapreduce.reduce.java.opts=-Xmx4g"
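
For reference, the contents of HADOOP_OPTS are passed verbatim onto the java command line, so a bare name=value token is taken as the main class name, which is exactly what the error above reports. Each property would need a -D prefix; a sketch of the corrected export (not a verified fix for this particular job):

os.environ['HADOOP_OPTS'] = "-Dmapreduce.map.memory.mb=5120 -Dmapreduce.reduce.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4g -Dmapreduce.reduce.java.opts=-Xmx4g"

Note, though, that JVM system properties set this way are not automatically copied into the job configuration; passing the same -D options on the yarn jar command line before the positional arguments (with a ToolRunner-based driver) is the more reliable route.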

Contributor
I would like to know the location of this file, because I found many mapred-site files. Thanks again.

Champion

@onsbt

 

In general, the path is /etc/hadoop/conf

 

But I would recommend not updating this file directly; instead, update it via Cloudera Manager -> YARN -> Configuration. If you are not using CM, ask your admin.

 

Another recommendation: you can set those values temporarily and directly in HDFS/Hive, and test to find the suitable value for your environment before you make the permanent change in the configuration file.

Contributor

Thanks for replying. Actually, I want to decrease the heap memory size of HDFS and Kafka; do you have any suggestions? I modified the /opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/bin/kafka-run-class.sh file, but this didn't give me any result.

Any help please?