New Contributor
Posts: 2
Registered: ‎07-15-2014
Accepted Solution

MapReduceIndexerTool OutOfMemoryError in Mapper phase

Hello, colleagues.

I have a problem with the MapReduceIndexerTool: it raises an OutOfMemoryError in the mapper phase on a fairly small data set.

 

We have a cluster of 8 servers, each with 15 GB of RAM and 0.5 TB of disk. Before running the MapReduceIndexerTool, Cloudera Manager shows that each server has ~12 GB of free RAM and >200 GB of free disk. The Hadoop version is: Cloudera Express 5.0.0 (#215 built by jenkins on 20140331-1424 git: 50c701f3e920b1fcf524bf5fa061d65902cde804)

 

I have a list of 32 Avro files, each up to 320 MB in size. I imported them from an Oracle table with Sqoop, using --compression-codec snappy and in splits.

Each file has ~10 million records.

 

So, when I run the MapReduceIndexerTool, most of the mapper tasks fail with OutOfMemoryError; the stack trace follows. On some of the smaller files (the files are uneven in size, with a maximum of 320 MB) there is no error.

I googled a bit and found a suggestion to try increasing the dedicated Java heap with -D 'mapred.child.java.opts=-Xmx2G'
(https://groups.google.com/a/cloudera.org/forum/#!topic/search-user/5k9apj7FSiY), but with no success.
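(A side note for anyone hitting the same wall: on YARN/MR2, mapred.child.java.opts is a deprecated MR1 name; the per-phase Hadoop 2 properties below may be the ones that actually take effect. The sizes are examples only, not recommendations:)

```
# Hypothetical flags using the Hadoop 2 / YARN property names;
# container memory.mb must exceed the -Xmx it wraps.
-D 'mapreduce.map.java.opts=-Xmx2g' \
-D 'mapreduce.reduce.java.opts=-Xmx2g' \
-D 'mapreduce.map.memory.mb=3072' \
-D 'mapreduce.reduce.memory.mb=3072' \
```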

 

320 MB doesn't seem too large for Hadoop, does it?

 

Any advice would be appreciated.

 

The command:

--------------

sudo -u hdfs hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool -find \
hdfs://$NNHOST:8020//user/root/solrindir/tmlogavro -type f \
-name 'part*.avro' |\
sudo -u hdfs hadoop --config /etc/hadoop/conf.cloudera.yarn \
jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
--libjars /usr/lib/solr/contrib/mr/search-mr-1.0.0-cdh5.0.0.jar \
-D 'mapred.child.java.opts=-Xmx2G' \
--log4j /var/lib/hadoop-hdfs/solr_configs_for_tm_log_morphlines/log4j.properties \
--morphline-file /var/lib/hadoop-hdfs/solr_configs_for_tm_log_morphlines/morphlines.conf \
--output-dir hdfs://$NNHOST:8020/user/$USER/solroutdir \
--update-conflict-resolver org.apache.solr.hadoop.dedup.RetainMostRecentUpdateConflictResolver \
--verbose --go-live --zk-host $ZKHOST \
--collection tm_log_avro \
--shards 32 --input-list -

 

 

The error I see in each failed mapper task:

-----------

Error: java.io.IOException: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
	at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:334)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
	at org.apache.solr.hadoop.BatchWriter.close(BatchWriter.java:199)
	at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:322)
	... 8 more
Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
	... 12 more
Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:550)
	at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1256)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1233)
	at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
	at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1947)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
	... 12 more
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

 

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: MapReduceIndexerTool OutOfMemoryError in Mapper phase

The data is streamed, so the file size should be irrelevant.

Perhaps there are too many reducer slots configured, i.e. the machine doesn't have enough RAM to handle that many reducers concurrently. (There is also a --reducers CLI option, which always rounds up to a multiple of the number of Solr shards.)

To see if the JVM -Xmx settings are applied correctly, you could, for example, add the registerJVMMetrics and startReportingMetricsToSLF4J commands to your morphline:

http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/registerJVMMetrics
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#startReportingMetricsT...
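A minimal sketch of what that could look like at the top of the morphline (command names are from the Kite reference above; the logger name and frequency are just example values, and the placeholder comment stands in for whatever indexing commands the morphline already has):

```
# Hypothetical morphline fragment: register JVM metrics and periodically
# report them to SLF4J so the effective heap shows up in the task logs.
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { registerJVMMetrics {} }
      {
        startReportingMetricsToSLF4J {
          logger : "org.kitesdk.morphline.Metrics"
          frequency : "10 seconds"
        }
      }
      # ... your existing readAvroContainer / extractAvroPaths / loadSolr
      # commands go here ...
    ]
  }
]
```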

Wolfgang.

New Contributor
Posts: 2
Registered: ‎07-15-2014

Re: MapReduceIndexerTool OutOfMemoryError in Mapper phase

Thank you, Wolfgang.

The cause of the problem was that the amount of Java heap memory dedicated to an MR container was too low, even though there was enough physical memory on each node.

After some tuning, I found that the ratio of the (uncompressed) size of the data to index to the JVM heap available to a reducer task needs to be about 2:1. So if a single reducer gets a larger file to index, you will likely get an OutOfMemoryError.
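To make that rule of thumb concrete (the 2:1 ratio is only the empirical observation from this thread, not an official formula), a back-of-the-envelope sizing could look like this; the function names are made up for illustration:

```python
import math

# Empirically observed ratio from this thread: a reducer needs roughly
# 1 unit of JVM heap per 2 units of uncompressed data it must index.
DATA_TO_HEAP_RATIO = 2.0

def min_reducer_heap_mb(uncompressed_mb_per_reducer: float) -> int:
    """Estimate the minimum -Xmx (in MB) for one reducer, rounded up."""
    return math.ceil(uncompressed_mb_per_reducer / DATA_TO_HEAP_RATIO)

def min_reducers(total_uncompressed_mb: float, heap_mb: int) -> int:
    """Given a fixed per-reducer heap, estimate how many reducers keep
    each reducer's share of the data within the 2:1 ratio."""
    return math.ceil(total_uncompressed_mb / (heap_mb * DATA_TO_HEAP_RATIO))

# A reducer indexing ~3 GB of uncompressed data would need ~1.5 GB heap:
print(min_reducer_heap_mb(3072))      # 1536
# 32 GB of total uncompressed data with -Xmx2g reducers needs >= 8 reducers:
print(min_reducers(32 * 1024, 2048))  # 8
```

Under this model you can fix an OOM either by raising the reducer heap or by spreading the same data over more reducers (the --reducers option mentioned above).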

 

Thanks for the help. 
