Member since 01-18-2015
5 Posts, 0 Kudos Received, 0 Solutions
06-12-2015
05:14 AM
Thanks. I was checking how the Avro objects are generated and found a mistake: the Avro objects were empty and contained only the schema. I fixed it and that error seems to be gone. I set the log level to TRACE to see what is happening, reviewed the MapReduce log again, and found this error when it tries to index a document into Solr:
2015-06-12 05:06:49,843 INFO [IPC Server handler 10 on 45052]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1434101650719_0008_r_000000_3: Error: java.io.IOException: Batch Write Failure
at org.apache.solr.hadoop.BatchWriter.throwIf(BatchWriter.java:239)
at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:181)
at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:290)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
*Caused by: org.apache.solr.common.SolrException: ERROR: [doc=0Name115457] unknown field '_attachment_mimetype'*
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:185)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:238)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at .....
I have been reading http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/ where it talks about the field *_attachment_mimetype*. Why is it trying to index this field into Solr? I also ran the same configuration with this command:
hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -D 'mapred.child.java.opts=-Xmx500m' \
  --hbase-indexer-file /home/cloudera/morphline-hbase-mapper.xml \
  --zk-host 127.0.0.1/solr \
  --collection hbase-collection1 \
  --dry-run \
  --log4j /home/cloudera/log4j.properties
And it looks like it works fine:
dryRun: SolrInputDocument(fields: [id=4Name249228,
*_attachment_mimetype=[avro/java+memory]*, _attachment_body=[{"name":
"4Name249228", "favorite_number": 41, "favorite_color": "Red27"}],
name=[Red27]])
16366 [main] TRACE
com.ngdata.hbaseindexer.morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells
- beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 05:12:07 TRACE
morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify:
{lifecycle=[START_SESSION]}
16366 [main] TRACE
com.ngdata.hbaseindexer.morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells
- beforeProcess:
{_attachment_body=[keyvalues={4Name341784/data:avroUser/1434105492587/Put/vlen=276/seqid=0}],
_attachment_mimetype=[application/java-hbase-result]}
15/06/12 05:12:07 TRACE
morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess:
{_attachment_body=[keyvalues={4Name341784/data:avroUser/1434105492587/Put/vlen=276/seqid=0}],
_attachment_mimetype=[application/java-hbase-result]}
16368 [main] DEBUG com.ngdata.hbaseindexer.indexer.Indexer$RowBasedIndexer
- Indexer _default_ will send to Solr 1 adds and 0 deletes
15/06/12 05:12:07 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_
will send to Solr 1 adds and 0 deletes
dryRun: SolrInputDocument(fields: [id=4Name341784,
_attachment_mimetype=[avro/java+memory], _attachment_body=[{"name":
"4Name341784", "favorite_number": 1, "favorite_color": "Red22"}],
name=[Red22]])
15/06/12 05:12:07 INFO client.ConnectionManager$HConnectionImplementation:
Closing zookeeper sessionid=0x14de71e21730082
What I don't know is how to tell Solr to skip the *_attachment_mimetype* field and not index it. I'll post my next problem about Solr and Lily in Cloudera Search. Thanks.
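One possible approach, sketched here under assumptions and not verified on this cluster: Kite Morphlines has a sanitizeUnknownSolrFields command that removes record fields which are not declared in the Solr schema, so fields such as _attachment_mimetype and _attachment_body would be dropped before the document reaches Solr. The SOLR_LOCATOR variable below is an assumption (the morphline.conf in this thread does not define one); its values are copied from the --collection and --zk-host options of the command above.

```
# Assumption: defined at the top of morphline.conf, outside the "morphlines" list.
# Values mirror the --collection and --zk-host options used on the command line.
SOLR_LOCATOR : {
  collection : hbase-collection1
  zkHost : "127.0.0.1/solr"
}

# Assumption: placed inside "commands : [ ... ]", after extractAvroPaths
# and before logTrace, so the trace shows the sanitized record.
{
  sanitizeUnknownSolrFields {
    # Drops record fields that are not defined in the collection's schema
    solrLocator : ${SOLR_LOCATOR}
  }
}
```

If this works as expected, the dryRun output should list only fields that exist in the schema, such as id and name.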
06-12-2015
01:39 AM
I'm trying to follow a tutorial from Cloudera (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/search_hbase_batch_indexer.html). I have code that inserts objects in Avro format into HBase, and I want to index them into Solr, but nothing gets indexed. I have been looking at the logs:
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={0Name178721/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
15/06/12 00:45:00 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_ will send to Solr 0 adds and 0 deletes
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={1Name134339/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
So I'm reading the rows, but I don't know why nothing is indexed in Solr ("0 adds and 0 deletes"). I guess that my morphline.conf is wrong:
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:avroUser"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      { readAvroContainer {} }
      {
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name
          }
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
I wasn't sure whether I needed an "_attachment_body" field in Solr, but it seems that it isn't necessary, so I guess that readAvroContainer or extractAvroPaths is wrong. I have a "name" field in Solr and my avroUser has a "name" field as well:
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
Labels: Apache HBase, Apache Solr