Created on 05-26-2014 09:19 AM - edited 09-16-2022 01:59 AM
Hi,
We use hbase-indexer for near-real-time (NRT) indexing of an HBase table, and from time to time a cell value causes a CharConversionException when the document is sent to Solr.
Since we cannot guarantee 100% error-free data, I would like to catch this exception in the mapper for further investigation and drop the offending value. I think a suitable place would be com.ngdata.hbaseindexer.indexer.Indexer.indexRowData().
Is there a configuration option to either make a character conversion error non-fatal or replace the indexer with a custom class? I tried replacing the mapper, but to no avail: as soon as that configuration is active, nothing is indexed anymore, and no error messages are logged.
How would you fix such an issue? I thought about patching hbase-indexer-engine and suggesting a code improvement, but maybe there is an easier way?
We use CDH 5.0.1. This is the stack trace:
14/05/26 17:13:11 ERROR impl.SepEventExecutor: Error while processing event
java.lang.RuntimeException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)
at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:87)
at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:519)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:207)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:202)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:312)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:273)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:310)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.retryAddsIndividually(DirectSolrInputDocumentWriter.java:123)
at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.add(DirectSolrInputDocumentWriter.java:108)
at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:140)
at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:84)
... 6 more
14/05/26 17:13:11 WARN impl.SepConsumer: Error processing a batch of SEP events, the error will be forwarded to HBase for retry
Thanks in advance,
Rolf
Created 05-27-2014 02:06 AM
Created 05-28-2014 07:04 AM
I put that into a new command as suggested and it works like a charm.
Thank you.
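For anyone landing here later: the suggestion itself is not reproduced in this thread, so the following is only a rough sketch of what such a sanitizing step might look like. It assumes the fix is to strip characters that XML 1.0 does not allow (such as the 0xfffe from the error above) from a cell value before the document is sent to Solr. The class name CellValueSanitizer and its methods are made up for illustration; they are not part of hbase-indexer or Solr, and wiring the logic into a custom command is not shown.

```java
// Hypothetical helper, not part of hbase-indexer, Solr, or any other library.
final class CellValueSanitizer {

    private CellValueSanitizer() {
    }

    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isAllowed(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the value with every disallowed code point dropped.
    // Unpaired surrogates are reported by codePointAt() as values in
    // 0xD800-0xDFFF, which the check above rejects, so they are dropped too.
    static String sanitize(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(value.length());
        int i = 0;
        while (i < value.length()) {
            int codePoint = value.codePointAt(i);
            if (isAllowed(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            // Advance by one char, or two for a valid surrogate pair.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

Calling CellValueSanitizer.sanitize(...) on each string field value before it is added to the Solr document would silently drop the bad characters. If you want to investigate the source data, compare the lengths before and after and log the row key when they differ.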
Created 02-28-2016 03:24 AM
How did you do it? Could you describe it in detail? Thanks a lot. ^_^