New Contributor
Posts: 4
Registered: ‎01-01-2014

MapReduceIndexerTool is not indexing AVRO file

I am using a Cloudera environment to index Avro file(s).

The command I tried:


sudo -u hdfs hadoop jar /opt/cloudera/parcels/SOLR-1.1.0-1.cdh4.3.0.p0.21/lib/solr/contrib/mr/search-mr-1.1.0-job.jar org.apache.solr.hadoop.MapReduceIndexerTool --morphline-file /tmp/avro.indexing/morphline.conf --output-dir hdfs://<AVRO indexing path> hdfs://<AVRO input path> --verbose --go-live --zk-host <ZK host URL> --collection automatch_collection --log4j <log file>

The main cause I was able to find is that morphline.readAvroContainer.numRecords=0, i.e. no records are being read from the Avro file.

I would like to attach some files. Is that possible?

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: MapReduceIndexerTool is not indexing AVRO file

Try the MapReduceIndexerTool --dry-run command line option and enable TRACE log4j level as shown here:

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#logTrace

For concrete advice, include as much detailed information as possible about the problem you're seeing, including the log files, the morphline config file, and possibly some sample data.

Wolfgang.

New Contributor
Posts: 4
Registered: ‎01-01-2014

Re: MapReduceIndexerTool is not indexing AVRO file

I could not find the file attachment option.

New Contributor
Posts: 4
Registered: ‎01-01-2014

Re: MapReduceIndexerTool is not indexing AVRO file

I am also getting errors like:

Error making BlockReader. Closing stale NioInetPeer

java.io.EOFException: Premature EOF: no length prefix available

New Contributor
Posts: 4
Registered: ‎01-01-2014

Re: MapReduceIndexerTool is not indexing AVRO file

Schema.xml

<field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="EnterpriseID" type="string" indexed="true" stored="true"/>
<field name="ReconDefinitionPK" type="string" indexed="true" stored="true"/>
<field name="ExecutionReferenceNumber" type="string" indexed="true" stored="true"/>
<field name="ReconciliationDate" type="string" indexed="true" stored="true"/>

Morphline.conf

extractAvroPaths {
  flatten : true
  paths : {
    ID : /ID
    EnterpriseID : /EnterpriseID
    ReconDefinitionPK : /ReconDefinitionPK
    ExecutionReferenceNumber : /ExecutionReferenceNumber
    ReconciliationDate : /ReconciliationDate
    #_version_ : /JOBExecutionPK
    ignored_JOBExecutionPK : /JOBExecutionPK
    ignored_GroupName : "/GroupingDetails[]/name"
    ignored_GroupValue : "/GroupingDetails[]/value"
    ignored_GroupOrder : "/GroupingDetails[]/order"
    ignored_ParticipantID : "/MatchingDetails[]/ParticipantID"
    ignored_MatchingName : "/MatchingDetails[]/MatchingDetailValues[]/name"
    ignored_MatchingValue : "/MatchingDetails[]/MatchingDetailValues[]/value"
    ignored_MatchingOrder : "/MatchingDetails[]/MatchingDetailValues[]/order"
  }
}

Sample AVRO data

{
  "ID": "380e8a33-7ac2-4cfb-a815-68537c70f441",
  "EnterpriseID": "NRIFT",
  "ReconDefinitionPK": 100,
  "JOBExecutionPK": 200,
  "ExecutionReferenceNumber": "J0001100001",
  "ReconciliationDate": "20131227",
  "GroupingDetails": [
    {
      "Name": "Account",
      "Value": "AC0000001",
      "Order": 1
    },
    {
      "Name": "Currency",
      "Value": "USD",
      "Order": 2
    },
    {
      "Name": "Debit/Credit",
      "Value": "C",
      "Order": 3
    }
  ],
  "MatchingDetails": [
    {
      "ParticipantID": "NOSTRO",
      "MatchingDetailValues": [
        {
          "Name": "Date",
          "Value": "20131228",
          "Order": 1
        },
        {
          "Name": "Amount",
          "Value": "125",
          "Order": 2
        }
      ]
    },
    {
      "ParticipantID": "CUSTODY",
      "MatchingDetailValues": [
        {
          "Name": "Date",
          "Value": "20131228",
          "Order": 1
        },
        {
          "Name": "Amount",
          "Value": "126",
          "Order": 2
        }
      ]
    }
  ]
}
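
One detail that may be worth checking: the morphline paths above use lowercase names ("/GroupingDetails[]/name"), while the sample record uses "Name", "Value" and "Order". Avro field names are case-sensitive, so those paths may not match anything. The following is only a rough sketch in plain Python, not the morphline implementation: the resolve helper is hypothetical, it works on a trimmed JSON copy of the sample record, and it assumes "[]" means "each element of the array". This mismatch only touches the ignored_ fields, so it would probably not explain numRecords=0 by itself.

```python
import json

def resolve(node, path):
    """Collect the values reached by a path such as /A[]/B.

    Hypothetical helper for illustration only; a step ending in []
    is assumed to mean 'descend into each element of that array'.
    Field names are compared exactly, so case matters.
    """
    nodes = [node]
    for step in path.strip("/").split("/"):
        is_array = step.endswith("[]")
        name = step[:-2] if is_array else step
        next_nodes = []
        for current in nodes:
            if isinstance(current, dict) and name in current:
                value = current[name]
                if is_array and isinstance(value, list):
                    next_nodes.extend(value)
                elif not is_array:
                    next_nodes.append(value)
        nodes = next_nodes
    return nodes

# Trimmed copy of the sample record from this post.
record = json.loads("""
{
  "ID": "380e8a33-7ac2-4cfb-a815-68537c70f441",
  "GroupingDetails": [{"Name": "Account", "Value": "AC0000001", "Order": 1}]
}
""")

print(resolve(record, "/GroupingDetails[]/name"))  # lowercase, as in morphline.conf
print(resolve(record, "/GroupingDetails[]/Name"))  # capitalized, as in the data
```

With this trimmed record the lowercase path yields an empty list and the capitalized path yields the "Account" value.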

I am generating the Avro file with a batch program and putting it into HDFS using the hadoop put command.

Then I am trying to index the Avro file using MapReduceIndexerTool.
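
Since readAvroContainer reports numRecords=0, one thing that may be worth ruling out is whether the batch program really writes an Avro object container file rather than, say, JSON text like the sample above. Avro container files begin with the four bytes "Obj" followed by 0x01. Below is a minimal sketch of that check, assuming a local copy of the file (for example one fetched with hadoop fs -get); the function name and the file path are made up for illustration.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file

def looks_like_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes.

    This checks only the header; it does not validate the schema
    or confirm that the file actually contains any records.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC

# Example usage (hypothetical local path):
# print(looks_like_avro_container("/tmp/sample.avro"))
```

If this returns False, the input is probably not in the container format that readAvroContainer expects. If it returns True, the TRACE log output is likely the better next step.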

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: MapReduceIndexerTool is not indexing AVRO file

Try the MapReduceIndexerTool --dry-run command line option and enable TRACE log4j level as shown here:

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#logTrace

If problems persist, attach the log output for better diagnostics.
