Explorer
Posts: 18
Registered: ‎05-02-2014

MapReduceIndexerTool fails with java.lang.OutOfMemoryError in QuickstartVM CDH 5.5.0

[ Edited ]

Hi,

 

Using the Quickstart VM with CDH 5.5.1, we tried to index a couple of PDF and DOCX files. Creating a collection went smoothly. We decided to use MapReduceIndexerTool to parse the documents and index them.

 

The mapper syslog ends at "Starting flush of map output":

...
2016-03-01 04:07:28,821 WARN [main] org.apache.solr.core.SolrResourceLoader: Solr loaded a deprecated plugin/analysis class [solr.ThaiWordFilterFactory]. Please consult documentation how to replace it accordingly.
2016-03-01 04:07:28,934 INFO [main] org.apache.solr.schema.IndexSchema: unique key field: id
2016-03-01 04:07:29,157 INFO [main] org.apache.solr.schema.FileExchangeRateProvider: Reloading exchange rates from file currency.xml
2016-03-01 04:07:29,194 INFO [main] org.apache.solr.schema.FileExchangeRateProvider: Reloading exchange rates from file currency.xml
2016-03-01 04:07:30,176 INFO [main] org.kitesdk.morphline.api.MorphlineContext: Importing commands
2016-03-01 04:07:46,150 INFO [main] org.kitesdk.morphline.api.MorphlineContext: Done importing commands
2016-03-01 04:07:47,576 INFO [main] org.apache.solr.hadoop.morphline.MorphlineMapRunner: Processing file hdfs://quickstart.cloudera/user/cloudera/tamap.data/filename.docx
2016-03-01 04:07:52,287 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output

stderr contains:

 

Halting due to Out Of Memory Error...

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

 

We tried many different memory settings in both YARN and Solr, with no success. Any help would be greatly appreciated.

 

The output of the "hadoop jar" command says...

16/03/01 04:29:11 INFO mapreduce.Job: Task Id : attempt_1456833935429_0003_m_000000_2, Status : FAILED
Exception from container-launch.
Container id: container_1456833935429_0003_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
	at org.apache.hadoop.util.Shell.run(Shell.java:460)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1

 

Thanks,

Slavo

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: MapReduceIndexerTool fails with java.lang.OutOfMemoryError in QuickstartVM CDH 5.5.0

Try using something like this CLI option: 

 

-D mapreduce.map.java.opts="-Xmx2000m" 
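Note that the heap set via `mapreduce.map.java.opts` also has to fit inside the YARN container size (`mapreduce.map.memory.mb`), with some headroom for off-heap memory; otherwise YARN may kill the container before the JVM ever hits its own limit. A sketch combining both (the 3072/2000 values are illustrative, not tested defaults for the Quickstart VM):

```shell
# Request a 3 GB container and a 2 GB JVM heap inside it;
# the gap between the two leaves room for off-heap allocations.
hadoop jar /usr/lib/solr/contrib/mr/search-mr-1.0.0-cdh5.5.0-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapreduce.map.memory.mb=3072' \
  -D 'mapreduce.map.java.opts=-Xmx2000m' \
  ... # remaining MapReduceIndexerTool options unchanged
```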

Explorer
Posts: 18
Registered: ‎05-02-2014

Re: MapReduceIndexerTool fails with java.lang.OutOfMemoryError in QuickstartVM CDH 5.5.0

Thanks, but I already tried those options with different settings. Here is the command:

hadoop jar /usr/lib/solr/contrib/mr/search-mr-1.0.0-cdh5.5.0-job.jar \
 org.apache.solr.hadoop.MapReduceIndexerTool \
 -D 'mapreduce.map.java.opts=-Xmx2048m' \
 -D 'mapreduce.reduce.java.opts=-Xmx2048m' \
 --mappers=1 \
 --reducers=1 \
 --morphline-file morphline.conf \
 --output-dir hdfs://quickstart.cloudera/user/cloudera/my.output/ \
 --zk-host quickstart.cloudera:2181/solr  \
 --collection mycollection \
 --go-live \
 --verbose \
 hdfs://quickstart.cloudera/user/cloudera/my.data/

Still the same error. My laptop has 16GB RAM, using 12GB for the VM.
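One thing worth checking on the Quickstart VM (an assumption, not something confirmed in this thread): YARN will not grant a container larger than `yarn.scheduler.maximum-allocation-mb`, so a 2 GB heap request may never actually take effect if that cap is lower. The relevant `yarn-site.xml` properties look like this (values illustrative, not the VM's actual defaults):

```xml
<!-- yarn-site.xml: example values only -->
<property>
  <!-- Total memory this NodeManager may hand out to containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <!-- Largest single container YARN will grant; requests above
       this may be capped or rejected depending on the version -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>
```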