Support Questions

MapReduceIndexerTool erroring with max_array_length


New Contributor


I am trying to use the MapReduceIndexerTool to index data from a Hive table into SolrCloud / Cloudera Search.

The tool fails the job with the following error:


1799 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing 1 files using 1 real mappers into 10 reducers




36962 [main] ERROR org.apache.solr.hadoop.MapReduceIndexerTool  - Job failed! jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1473161870114_0339


The error stack trace is:

2016-09-08 10:39:20,128 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: MAX_ARRAY_LENGTH
	at org.apache.lucene.codecs.memory.DirectDocValuesFormat.<clinit>(
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
	at java.lang.reflect.Constructor.newInstance(
	at java.lang.Class.newInstance(
	at org.apache.lucene.util.NamedSPILoader.reload(
	at org.apache.lucene.util.NamedSPILoader.<init>(
	at org.apache.lucene.util.NamedSPILoader.<init>(
	at org.apache.lucene.codecs.DocValuesFormat.<clinit>(
	at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(
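From searching around, a NoSuchFieldError thrown during class initialization like this usually points to mixed Lucene jar versions on the MapReduce task classpath: DirectDocValuesFormat was apparently compiled against a lucene-core that has the MAX_ARRAY_LENGTH field, while an older lucene-core jar is being loaded first. A small helper I use to look for duplicate/mismatched Lucene jars (the directory arguments in the commented example are assumptions for a CDH parcel layout, adjust for your cluster):

```shell
# List lucene-core jars under the given directories to spot version mixes.
list_lucene_jars() {
  for dir in "$@"; do
    # Skip directories that do not exist on this node.
    [ -d "$dir" ] && find "$dir" -name 'lucene-core-*.jar'
  done | sort -u
}

# e.g. on a CDH 5.7 node (hypothetical paths):
# list_lucene_jars /opt/cloudera/parcels/CDH/lib/solr /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
```

If this prints more than one distinct lucene-core version, that mismatch would explain the error.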



My schema.xml looks like:



   <field name="dataset_id" type="string" indexed="true" stored="true" required="true" multiValued="false" docValues="true" />

   <field name="search_string" type="string" indexed="true" stored="true" docValues="true"/>

   <field name="_version_" type="long" indexed="true" stored="true"/>




<!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field. -->




Otherwise I am able to post documents using the Solr APIs / upload methods; only the MapReduceIndexerTool is failing.


The command I am using is:

hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  -D '' \
  --log4j /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/share/doc/search-1.0.0+cdh5.7.0+0/examples/solr-nrt/ \
  --morphline-file /home/$USER/morphline2.conf \
  --output-dir hdfs://NNHOST:8020/user/$USER/outdir \
  --verbose \
  --zk-host ZKHOST:2181/solr1 \
  --collection dataCatalog_search_index \
  hdfs://NNHOST:8020/user/hive/warehouse/name.db/concatenated_index4/
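One thing I still plan to verify: --log4j expects the path of a log4j.properties file, and the argument above ends in a directory. A quick check (the commented path is an assumption based on the CDH 5.7 examples layout):

```shell
# Check that a --log4j argument resolves to a regular file, not a directory.
check_log4j() {
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "not a regular file: $1"
  fi
}

# e.g. (hypothetical CDH 5.7 path):
# check_log4j /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/share/doc/search-1.0.0+cdh5.7.0+0/examples/solr-nrt/log4j.properties
```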


My morphline config looks like:

SOLR_LOCATOR : {
  # Name of solr collection
  collection : search_index

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

# Specify an array of one or more morphlines, each of which defines an ETL
# transformation chain. A morphline consists of one or more (potentially
# nested) commands. A morphline is a way to consume records (e.g. Flume events,
# HDFS files or blocks), turn them into a stream of records, and pipe the stream
# of records through a set of easily configurable transformations on the way to
# a target application such as Solr.
morphlines : [
  {
    id : search_index
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readCSV {
          separator : ","
          columns : [dataset_id, search_string]
          ignoreFirstLine : true
          charset : UTF-8
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # Command that deletes record fields that are unknown to Solr
      # schema.xml. Recall that Solr throws an exception on any attempt to
      # load a document that contains a field that isn't specified in
      # schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]
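For reference, the readCSV settings assume comma-separated input with a header row and exactly the two declared columns. A quick local sanity check I run against a sample input (sample.csv here is made up):

```shell
# Build a tiny sample matching the expected input shape.
cat > sample.csv <<'EOF'
dataset_id,search_string
101,solr cloud indexing
102,cloudera search
EOF

# Mimic readCSV's view: skip the header (ignoreFirstLine : true) and split
# on the "," separator into the two declared columns.
tail -n +2 sample.csv | while IFS=',' read -r dataset_id search_string; do
  echo "dataset_id=$dataset_id search_string=$search_string"
done
```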



Please let me know if I am doing anything wrong.



Re: MapReduceIndexerTool erroring with max_array_length

Expert Contributor

If it fails while building Hive data into Solr, try building the data from HBase into Solr instead. First create an external Hive table linked to the HBase table, then insert your Hive data into that external table.

I have loaded HBase data into Solr many times without any problem.


hadoop --config /etc/hadoop/conf \
  jar /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -D '' \
  --hbase-indexer-file /opt/hbase-indexers/saic_sms_flow/morphline-hbase-mapper.xml \
  --zk-host jq-zk03.hadoop,jq-zk02.hadoop,jq-zk01.hadoop/solr \
  --collection saic_sms_flow \
  --reducers 0
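The external-table step could look roughly like this in HiveQL (table names, column names, and the hbase.columns.mapping below are all hypothetical; adjust them to your own schema):

```sql
-- Sketch: external Hive table backed by an HBase table.
CREATE EXTERNAL TABLE search_index_hbase (
  dataset_id STRING,
  search_string STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:search_string')
TBLPROPERTIES ('hbase.table.name' = 'search_index');

-- Copy the Hive data into the HBase-backed table.
INSERT OVERWRITE TABLE search_index_hbase
SELECT dataset_id, search_string FROM concatenated_index4;
```

After that, the hbase-indexer-mr job above can index the HBase table into Solr.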