MapReduceIndexerTool is not indexing AVRO file

New Contributor

I am using a Cloudera environment to index Avro file(s).

The command I tried:

 

sudo -u hdfs hadoop jar /opt/cloudera/parcels/SOLR-1.1.0-1.cdh4.3.0.p0.21/lib/solr/contrib/mr/search-mr-1.1.0-job.jar org.apache.solr.hadoop.MapReduceIndexerTool --morphline-file /tmp/avro.indexing/morphline.conf --output-dir hdfs://<AVRO indexing path> hdfs://<AVRO input path> --verbose --go-live --zk-host <ZK host URL> --collection automatch_collection --log4j <log file>

 

The main cause I was able to find is that morphline.readAvroContainer.numRecords=0

 

I was supposed to attach some files.

Is that possible?

Re: MapReduceIndexerTool is not indexing AVRO file

Expert Contributor
Try the MapReduceIndexerTool --dry-run command line option and enable TRACE log4j level as shown here:

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#logTrace

For concrete advice, include as much detailed information as possible about the problem you're seeing, including the log files, the morphline config file and possibly some sample data.

Wolfgang.
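
For reference, the suggestion above could look roughly like the sketch below. It reuses the jar path and options from the original command and adds --dry-run plus a log4j file that turns on TRACE. The file name log4j-trace.properties, the shell variable names and the run_dry_run function are made up for illustration. The logger name assumes the CDK morphline package (com.cloudera.cdk.morphline) that this generation of Cloudera Search appears to use, so check it against the linked guide. The <...> placeholders from the original command are kept as they were.

```shell
# Hypothetical location for the TRACE log4j config; adjust as needed.
LOG4J_FILE="${LOG4J_FILE:-./log4j-trace.properties}"

# Enable TRACE output for morphline commands (assumes the CDK package name).
cat > "$LOG4J_FILE" <<'EOF'
log4j.rootLogger=INFO, A1
log4j.logger.com.cloudera.cdk.morphline=TRACE
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
EOF

# Placeholders carried over from the original command; fill in before running.
OUTPUT_DIR='hdfs://<AVRO indexing path>'
INPUT_PATH='hdfs://<AVRO input path>'
ZK_HOST='<ZK host URL>'

# Same invocation as in the question, with --dry-run added and --log4j
# pointing at the TRACE config written above. Defined only, not called here.
run_dry_run() {
  sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/SOLR-1.1.0-1.cdh4.3.0.p0.21/lib/solr/contrib/mr/search-mr-1.1.0-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file /tmp/avro.indexing/morphline.conf \
    --output-dir "$OUTPUT_DIR" \
    --verbose --go-live \
    --zk-host "$ZK_HOST" \
    --collection automatch_collection \
    --log4j "$LOG4J_FILE" \
    --dry-run \
    "$INPUT_PATH"
}
```

Once the placeholders are filled in, calling run_dry_run should run the morphline without loading anything into Solr, and the TRACE output should show what each morphline command receives and emits.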

Re: MapReduceIndexerTool is not indexing AVRO file

New Contributor

I could not find the file attachment option.

Re: MapReduceIndexerTool is not indexing AVRO file

New Contributor

I am also getting errors like:

 

Error making BlockReader. Closing stale NioInetPeer

java.io.EOFException: Premature EOF: no length prefix available

Re: MapReduceIndexerTool is not indexing AVRO file

New Contributor

Schema.xml

 

<field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="EnterpriseID" type="string" indexed="true" stored="true"/>
<field name="ReconDefinitionPK" type="string" indexed="true" stored="true"/>
<field name="ExecutionReferenceNumber" type="string" indexed="true" stored="true"/>
<field name="ReconciliationDate" type="string" indexed="true" stored="true"/>

 

Morphline.conf

 

extractAvroPaths {
  flatten : true
  paths : {
    ID : /ID
    EnterpriseID : /EnterpriseID
    ReconDefinitionPK : /ReconDefinitionPK
    ExecutionReferenceNumber : /ExecutionReferenceNumber
    ReconciliationDate : /ReconciliationDate
    #_version_ : /JOBExecutionPK
    ignored_JOBExecutionPK : /JOBExecutionPK
    ignored_GroupName : "/GroupingDetails[]/name"
    ignored_GroupValue : "/GroupingDetails[]/value"
    ignored_GroupOrder : "/GroupingDetails[]/order"
    ignored_ParticipantID : "/MatchingDetails[]/ParticipantID"
    ignored_MatchingName : "/MatchingDetails[]/MatchingDetailValues[]/name"
    ignored_MatchingValue : "/MatchingDetails[]/MatchingDetailValues[]/value"
    ignored_MatchingOrder : "/MatchingDetails[]/MatchingDetailValues[]/order"
  }
}

 

Sample AVRO data

 

{
  "ID": "380e8a33-7ac2-4cfb-a815-68537c70f441",
  "EnterpriseID": "NRIFT",
  "ReconDefinitionPK": 100,
  "JOBExecutionPK": 200,
  "ExecutionReferenceNumber": "J0001100001",
  "ReconciliationDate": "20131227",
  "GroupingDetails": [{
    "Name": "Account",
    "Value": "AC0000001",
    "Order": 1
  },
  {
    "Name": "Currency",
    "Value": "USD",
    "Order": 2
  },
  {
    "Name": "Debit/Credit",
    "Value": "C",
    "Order": 3
  }],
  "MatchingDetails": [{
    "ParticipantID": "NOSTRO",
    "MatchingDetailValues": [{
      "Name": "Date",
      "Value": "20131228",
      "Order": 1
    },
    {
      "Name": "Amount",
      "Value": "125",
      "Order": 2
    }]
  },
  {
    "ParticipantID": "CUSTODY",
    "MatchingDetailValues": [{
      "Name": "Date",
      "Value": "20131228",
      "Order": 1
    },
    {
      "Name": "Amount",
      "Value": "126",
      "Order": 2
    }]
  }]
}

 

I am generating the Avro file with a batch program and putting it into HDFS using the hadoop put command.

Then I am trying to index the Avro file using the IndexerTool.
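
Since the file comes from a custom batch program, one thing that may be worth ruling out is whether it is really an Avro object container file, which is what readAvroContainer expects. Per the Avro specification, container files start with the four bytes 'O', 'b', 'j', 0x01. The sketch below checks only that header; is_avro_container is a hypothetical helper name, and a passing check does not prove the rest of the file is valid.

```shell
# Succeeds (exit 0) if standard input starts with the Avro container
# magic bytes 'O' 'b' 'j' 0x01, and fails otherwise.
is_avro_container() {
  # Read the first 4 bytes and render them as a plain hex string.
  magic=$(head -c 4 | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "4f626a01" ]
}

# Example usage against a local copy (the file name is an assumption):
#   is_avro_container < sample.avro && echo "looks like an Avro container"
# Or against HDFS, using a single file path rather than a directory:
#   sudo -u hdfs hadoop fs -cat 'hdfs://<AVRO input path>' | is_avro_container
```

If the check fails, for example because the batch program wrote JSON text or bare binary-encoded records without the container header, that would be one possible reason for the reader not finding any records.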

Re: MapReduceIndexerTool is not indexing AVRO file

Expert Contributor
Try the MapReduceIndexerTool --dry-run command line option and enable TRACE log4j level as shown here:

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#logTrace

If problems persist, attach the log output for better diagnostics.