Error while batch importing from HBase to Solr with HBaseIndexer

Hello,

my HBaseIndexer MR job failed with the following error message:

2017-02-14 10:57:29,676 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1484216395768_0113_r_000000_3: Error: java.io.IOException: Batch Write Failure
	at org.apache.solr.hadoop.BatchWriter.throwIf(BatchWriter.java:239)
	at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:181)
	at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:275)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.solr.common.SolrException: ERROR: [doc=#0;#0;#0;#0;#27;�z] unknown field 'id'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:185)
	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)

Why does it want to write to a field called 'id'? Neither the Solr schema nor the morphline defines a field with that name.

Is an 'id' field a prerequisite for the HBaseIndexer MR tool to work?

Thanks in advance.

1 ACCEPTED SOLUTION

Hi again,

I just fixed it by adding a field named 'id' to the Solr schema. I didn't find that hint anywhere in the HBaseMapReduceIndexer documentation, which is why I was unsure at first 😉
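
For anyone landing here later, the fix described above amounts to something like the following in the collection's schema.xml. This is only a minimal sketch: it assumes a classic schema.xml in which a `string` field type is already defined (as in the stock example schemas), and the attribute values are assumptions rather than the poster's actual settings.

```xml
<!-- schema.xml fragment (sketch). Assumes a "string" fieldType already exists in this schema. -->
<!-- Goes inside the <fields> section (or directly under <schema> in newer versions). -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Optional: also make it the unique key, so re-indexing a row overwrites the old document. -->
<uniqueKey>id</uniqueKey>
```

After changing the schema, the configuration has to be uploaded again and the collection reloaded before rerunning the MR job.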

2 REPLIES

Please share your indexer_def.xml and your morphline configuration.

There should be an "id" field referenced somewhere, and I guess you will find it in the indexer_def.xml file.

For example:

<indexer table="<hbase_table_name>" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper" 
	unique-key-field="id">
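
The example above is cut off, so here is a rough sketch of what a complete indexer definition could look like. The table name `my_table` and the morphline path `morphlines.conf` are hypothetical placeholders, not values from this thread. As far as I recall, `unique-key-field` falls back to `id` when the attribute is omitted, which would explain the error even if neither file mentions `id` explicitly; please check that against the hbase-indexer documentation for your version.

```xml
<?xml version="1.0"?>
<!-- indexer_def.xml (sketch). "my_table" and "morphlines.conf" are hypothetical placeholders. -->
<indexer table="my_table"
         mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper"
         unique-key-field="id">
  <!-- Morphline file that turns HBase cells into Solr fields -->
  <param name="morphlineFile" value="morphlines.conf"/>
</indexer>
```

Whichever field name is set in `unique-key-field` has to exist in the Solr schema, because the indexer writes the HBase row key into it.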