New Contributor
Posts: 5
Registered: ‎06-17-2015
Accepted Solution

Indexing data to Solr with a MapReduce and Morphline. OutofMemory.

Hello, 

 

I'm trying to index some data into Solr with a morphline. The job runs one map task and two reduce tasks, and the reducers fail. When I run the same command with less data, it works. It looks like a memory problem, but I increased the memory of Solr (2 nodes) to 64 GB each and it still doesn't work. I'm using CDH 5.4.3. Should I increase Solr's memory further, or is the problem something else? Any clue?

 

 

hadoop jar /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/jars/search-mr-1.0.0-cdh5.4.2-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapred.child.java.opts=-Xmx4096m' \
  --morphline-file /home/user/metadata/morphline_test.conf \
  --output-dir metadata/output_test7 \
  --zk-host xxxx:2181/solr \
  --collection metadata \
  --go-live \
  /user/hive/warehouse/metadata.db/full_metadata

 

 

Error: java.io.IOException: Batch Write Failure
    at org.apache.solr.hadoop.BatchWriter.throwIf(BatchWriter.java:239)
    at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:181)
    at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:290)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.solr.common.SolrException: Exception writing document id 5c88eaf5-7352-40bf-9e9a-8358dfb11280 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:926)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1081)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:692)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.solr.hadoop.BatchWriter.runUpdate(BatchWriter.java:135)
    at org.apache.solr.hadoop.BatchWriter$Batch.run(BatchWriter.java:90)
    at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:180)
    ... 9 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
    ... 28 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:77)
    at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:50)
    at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:154)
    at org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:140)
    at org.apache.lucene.index.PrefixCodedTerms$Builder.add(PrefixCodedTerms.java:120)
    at org.apache.lucene.index.FrozenBufferedUpdates.<init>(FrozenBufferedUpdates.java:74)
    at org.apache.lucene.index.DocumentsWriterDeleteQueue.freezeGlobalBuffer(DocumentsWriterDeleteQueue.java:233)
    at org.apache.lucene.index.DocumentsWriterPerThread.prepareFlush(DocumentsWriterPerThread.java:390)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.addFlushTicket(DocumentsWriterFlushQueue.java:70)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:507)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:624)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2949)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3104)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3071)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:582)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Indexing data to Solr with a MapReduce and Morphline. OutofMemory.

On YARN, the parameters are called mapreduce.map.java.opts and mapreduce.reduce.java.opts (rather than mapred.child.java.opts).

Wolfgang.
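
For illustration, here is a sketch of the original command with the YARN property names from the reply substituted in. The stack trace shows the OutOfMemoryError being thrown under YarnChild, inside an EmbeddedSolrServer running in the reduce task, so it is the reducer JVM heap that matters here, not the memory of the Solr servers. The -Xmx4096m value is simply carried over from the original post; whether it is large enough for this data set is not something the thread establishes. The run_indexer wrapper and the MAP_OPTS / REDUCE_OPTS variables are hypothetical names used only to allow a dry run.

```shell
# Heap settings for the task JVMs. 4096m is the value from the original post,
# not a recommendation; adjust for your data.
MAP_OPTS="-Xmx4096m"
REDUCE_OPTS="-Xmx4096m"

# Hypothetical helper: any arguments are used as a command prefix.
#   run_indexer echo   -> prints the command (dry run)
#   run_indexer        -> actually submits the job
run_indexer() {
  "$@" hadoop jar /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/jars/search-mr-1.0.0-cdh5.4.2-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    -D "mapreduce.map.java.opts=${MAP_OPTS}" \
    -D "mapreduce.reduce.java.opts=${REDUCE_OPTS}" \
    --morphline-file /home/user/metadata/morphline_test.conf \
    --output-dir metadata/output_test7 \
    --zk-host xxxx:2181/solr \
    --collection metadata \
    --go-live \
    /user/hive/warehouse/metadata.db/full_metadata
}

# Dry run: print the command for review before submitting it.
run_indexer echo
```

If the reducer heap is raised, the YARN container size (mapreduce.reduce.memory.mb) generally has to be larger than the -Xmx value, otherwise the container can be killed for exceeding its limit. That is a general YARN consideration and an addition here, not something stated in the thread.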