Map and Reduce Error: Java heap space
Labels: Cloudera Manager, MapReduce, Quickstart VM
Created on 10-03-2016 12:24 PM - edited 09-16-2022 03:43 AM
I'm using the QuickStart VM with CDH 5.3, trying to run a modified sample from the MR-parquet read example. It worked fine on a 10M-row Parquet table, but I get a "Java heap space" error on a table with 40M rows:
[cloudera@quickstart sep]$ yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet /user/hive/warehouse/parquet_table out_file18 -Dmapreduce.reduce.memory.mb=5120 -Dmapreduce.reduce.java.opts=-Xmx4608m -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m
16/10/03 12:19:30 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/03 12:19:31 INFO input.FileInputFormat: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Total input paths to process : 1
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: reading another 1 footers
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: Initiating action with parallelism: 5
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/10/03 12:19:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
Oct 03, 2016 12:19:31 PM parquet.Log info
INFO: There were no row groups that could be dropped due to filter predicates
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: number of splits:1
16/10/03 12:19:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475517800829_0009
16/10/03 12:19:33 INFO impl.YarnClientImpl: Submitted application application_1475517800829_0009
16/10/03 12:19:33 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1475517800829_0009/
16/10/03 12:19:33 INFO mapreduce.Job: Running job: job_1475517800829_0009
16/10/03 12:19:47 INFO mapreduce.Job: Job job_1475517800829_0009 running in uber mode : false
16/10/03 12:19:47 INFO mapreduce.Job: map 0% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: map 100% reduce 0%
16/10/03 12:20:57 INFO mapreduce.Job: Task Id : attempt_1475517800829_0009_m_000000_0, Status : FAILED
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I've also tried editing /etc/hadoop/conf/mapred-site.xml, and tried via the Cloudera Manager GUI (Clusters -> HDFS -> ... Java Heap Size of DataNode in Bytes):
[cloudera@quickstart sep]$ free -m
             total       used       free     shared    buffers     cached
Mem:         13598      13150        447          0         23        206
-/+ buffers/cache:      12920        677
Swap:         6015       2187       3828
Mapper class:

// Imports needed by the enclosing job class. Group comes from the
// pre-Apache parquet namespace that CDH 5.3 ships.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import parquet.example.data.Group;

public static class MyMap extends
        Mapper<LongWritable, Group, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Group value, Context context)
            throws IOException, InterruptedException {
        NullWritable outKey = NullWritable.get();
        String outputRecord = "";
        // Get the schema and field values of the record
        // String inputRecord = value.toString();
        // Process the value, create an output record
        // ...
        // Read integer field "x" (repetition index 0) and keep rows with x < 3
        int field1 = value.getInteger("x", 0);
        if (field1 < 3) {
            context.write(outKey, new Text(outputRecord));
        }
    }
}
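One thing worth checking in the command at the top (my observation, not from the replies): Hadoop's GenericOptionsParser only honors -D options that appear before the positional arguments, and only when the driver is run through ToolRunner. In the command shown they come after the input/output paths, so the memory settings may never have reached the job configuration at all. A minimal driver sketch under that assumption, with the MyMap class above nested inside it and ExampleInputFormat taken from the parquet-hadoop examples package:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import parquet.hadoop.example.ExampleInputFormat;

public class TestReadParquet extends Configured implements Tool {

    // ... MyMap (shown above) goes here ...

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D options that ToolRunner parsed.
        Job job = Job.getInstance(getConf(), "TestReadParquet");
        job.setJarByClass(TestReadParquet.class);
        job.setMapperClass(MyMap.class);
        job.setNumReduceTasks(0); // map-only filter

        job.setInputFormatClass(ExampleInputFormat.class);
        ExampleInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the -D options from args before run() sees them.
        System.exit(ToolRunner.run(new Configuration(), new TestReadParquet(), args));
    }
}

With such a driver, the options go before the paths:
yarn jar testmr-1.0-SNAPSHOT.jar TestReadParquet -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4608m /user/hive/warehouse/parquet_table out_file18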
Created 10-06-2016 05:10 AM
Please add some more memory by editing mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>

The tag above sets a 4 GB heap (-Xmx4096m); adjust the value as needed.
Let me know if that helped you.
Alternatively, you can also edit the hadoop-env.sh file and add:

export HADOOP_OPTS="-Xmx5096m"
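Worth noting (my addition, not part of the reply above): mapred.child.java.opts is the old MR1-style knob covering both map and reduce tasks; on YARN/MR2 the usual per-task equivalents are mapreduce.map.java.opts and mapreduce.reduce.java.opts, paired with the container sizes mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. A sketch of setting them per job in a driver instead of cluster-wide (values are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobMemorySettings {
    public static Job configuredJob() throws Exception {
        Configuration conf = new Configuration();
        // Container size is what YARN allocates; -Xmx is what the task JVM
        // may use. Keep the heap comfortably below the container size to
        // leave room for non-heap memory.
        conf.set("mapreduce.map.memory.mb", "5120");
        conf.set("mapreduce.map.java.opts", "-Xmx4096m");
        conf.set("mapreduce.reduce.memory.mb", "5120");
        conf.set("mapreduce.reduce.java.opts", "-Xmx4096m");
        return Job.getInstance(conf, "TestReadParquet");
    }
}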
Created 04-19-2017 07:02 AM
In general, a service restart is required after any configuration change.
Again, as I mentioned, it is recommended to make any configuration change via Cloudera Manager.
Created 04-20-2017 02:41 AM
HDFS is reporting "Under-replicated blocks" ("Blocs sous-répliqués"). Do you have an idea about the solution?
Created 04-20-2017 06:47 AM
Can you translate your issue into English? Also, if it is not related to Java heap space, I would recommend creating a new thread instead, so that it is easy to track and others can contribute as well.
Created 04-21-2017 12:29 AM
Thanks for replying.
I updated the file directly instead of via Cloudera Manager, and that resolved my problem 🙂 Thank you so much. But I have another question: I am running Cloudera with the default configuration on a one-node cluster and would like to find where HDFS stores files locally. I created a file in HDFS with Hue, but when I look in /dfs/nn it is empty; I can't find the file I created.
Created 04-21-2017 07:14 AM
The default path is /opt/hadoop/dfs/nn.
You can confirm this in Cloudera Manager -> HDFS -> Configuration -> search for "dfs.namenode.name.dir".
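A detail the thread glosses over (my addition): the dfs.namenode.name.dir directory holds only NameNode metadata (fsimage and edit logs), never file contents, and the DataNode keeps the actual data under dfs.datanode.data.dir (typically /dfs/dn on CM-managed setups) as anonymous block files. You therefore won't find a file there by name. A sketch that asks HDFS where a file's blocks live, using the standard FileSystem API (the class name is mine):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereAreMyBlocks {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]); // e.g. the file you created in Hue
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);
        // The NameNode answers from metadata: block offsets, lengths,
        // and the datanodes that store each block.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    loc.getOffset(), loc.getLength(),
                    String.join(",", loc.getHosts()));
        }
    }
}

The command-line equivalent is hdfs fsck <path> -files -blocks -locations.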
Created 04-21-2017 07:24 AM
The path /opt/hadoop/dfs/nn does not exist, and when I look for the file I already created, I can't find it under that path either.
Created 04-21-2017 07:28 AM
As mentioned already, please create a new topic for a new issue, as mixing issues may mislead others.
Also, please check the full answer and reply to it, so that you get the desired answer.
Created 09-26-2017 06:42 AM
The last reducer of my MapReduce job fails with the error below.
2017-09-20 16:23:23,732 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.regex.Matcher.<init>(Matcher.java:224)
        at java.util.regex.Pattern.matcher(Pattern.java:1088)
        at java.lang.String.replaceAll(String.java:2162)
        at com.sas.ci.acs.extract.CXAService$myReduce.parseEvent(CXAService.java:1612)
        at com.sas.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:919)
        at com.sas.ci.acs.extract.CXAService$myReduce.reduce(CXAService.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2017-09-20 16:23:23,834 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.
Current settings:

mapreduce.map.java.opts    = -Djava.net.preferIPv4Stack=true -Xmx3865051136
mapreduce.reduce.java.opts = -Djava.net.preferIPv4Stack=true -Xmx6144067296
1) Do you recommend increasing the following properties to the values below?

mapreduce.map.java.opts    = -Xmx4g
mapreduce.reduce.java.opts = -Xmx8g
2) These are my current map and reduce memory settings. Do I also need to bump my reduce memory up to 10240m?

mapreduce.map.memory.mb    = 8192
mapreduce.reduce.memory.mb = 8192
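Not raised in the replies, but the stack trace itself is suggestive: the OutOfMemoryError surfaces inside String.replaceAll called from parseEvent, and String.replaceAll compiles a fresh regex Pattern and allocates a Matcher on every call. Over millions of reduce records, that allocation churn alone can trigger "GC overhead limit exceeded" regardless of heap size. A hedged sketch of the usual fix, precompiling the pattern once; the pattern string and method shape are placeholders, since CXAService's source isn't shown:

import java.util.regex.Pattern;

public class EventParser {
    // Compiled once per JVM instead of once per record. The real pattern
    // used by parseEvent is not shown in the thread; this is a placeholder.
    private static final Pattern CONTROL_CHARS = Pattern.compile("[\\x00-\\x1f]");

    static String parseEvent(String raw) {
        // Equivalent to raw.replaceAll("[\\x00-\\x1f]", " "), but reuses the
        // compiled Pattern, avoiding a regex compile per call.
        return CONTROL_CHARS.matcher(raw).replaceAll(" ");
    }
}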
Created 09-26-2017 12:07 PM
I would not recommend changing your cluster-wide settings; instead, you can pass the memory and Java opts when you execute your jar.
Example (the values below are samples; change them as needed):

hadoop jar ${JAR_PATH} ${CONFIG_PATH}/filename.xml ${ENV} ${ODATE} mapMem=12288 mapJavaOpts=Xmx9830 redurMem=12288 redurJavaOpts=Xmx9830

Note:
mapJavaOpts = mapMem * 0.8
redurJavaOpts = redurMem * 0.8
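The mapMem=/mapJavaOpts= arguments above appear specific to that poster's own driver; with a ToolRunner-based job the generic equivalents are the -Dmapreduce.* options. A small helper applying the same 80% rule of thumb (the class and method names are mine):

import org.apache.hadoop.conf.Configuration;

public class MemoryOpts {
    // Applies the heuristic from the note above: task JVM heap = 0.8 * container size.
    static void applyMemory(Configuration conf, int mapMemMb, int reduceMemMb) {
        conf.setInt("mapreduce.map.memory.mb", mapMemMb);
        conf.set("mapreduce.map.java.opts", "-Xmx" + (int) (mapMemMb * 0.8) + "m");
        conf.setInt("mapreduce.reduce.memory.mb", reduceMemMb);
        conf.set("mapreduce.reduce.java.opts", "-Xmx" + (int) (reduceMemMb * 0.8) + "m");
    }
}

For example, applyMemory(conf, 12288, 12288) yields -Xmx9830m, matching the sample values above.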
Created 10-05-2017 12:31 PM
What are the implications of increasing mapreduce.map.memory.mb / mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts to a higher value in the cluster itself?
One of them would be that jobs that do not need the additional memory will get it anyway, which is of no use, and other jobs running at the same time may be impacted.
Anything else?
