How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?

New Contributor

Hi,

We use hbase-indexer for near-real-time (NRT) indexing of an HBase table, and from time to time cell values cause a CharConversionException when the document is sent to Solr.

As we cannot guarantee 100% error-free data, I would like to catch this exception in the mapper, keep the value for further investigation, and drop it from the document. I think a suitable place would be com.ngdata.hbaseindexer.indexer.Indexer.indexRowData().
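If the goal is to drop and record bad values rather than repair them, one option is to check each value before the document is built. The sketch below is a minimal illustration under stated assumptions: it represents a row's fields as a plain Map<String, List<String>>, and both that representation and the class name DropInvalidValues are hypothetical, not part of the hbase-indexer API. It tests every code point against the XML 1.0 "Char" production, which excludes U+FFFE (the character in the stack trace below).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: drops field values that XML 1.0 cannot carry and records which fields were affected.
final class DropInvalidValues {

    // XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point of s is allowed in an XML 1.0 document.
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(DropInvalidValues::isXmlChar);
    }

    // Returns a copy of fields without the unsafe values; the name of each field that lost a value
    // is appended to rejected so it can be logged for later investigation.
    static Map<String, List<String>> dropInvalid(Map<String, List<String>> fields, List<String> rejected) {
        Map<String, List<String>> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String value : e.getValue()) {
                if (isXmlSafe(value)) {
                    kept.add(value);
                } else {
                    rejected.add(e.getKey());
                }
            }
            if (!kept.isEmpty()) {
                clean.put(e.getKey(), kept);
            }
        }
        return clean;
    }

    public static void main(String[] args) {
        // Hypothetical row: one clean value and one containing U+FFFE.
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("id", List.of("row-1"));
        row.put("text", List.of("ok", "bad\uFFFEvalue"));
        List<String> rejected = new ArrayList<>();
        Map<String, List<String>> clean = dropInvalid(row, rejected);
        System.out.println(clean);    // {id=[row-1], text=[ok]}
        System.out.println(rejected); // [text]
    }
}
```

Checking values before they are sent avoids depending on where the exception surfaces inside the indexer; the trade-off is that the offending data is silently absent from the index unless the rejected list is logged.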

Is there a configuration option that either makes a character conversion error non-fatal or replaces the indexer with a custom class? I tried to replace the mapper, but to no avail: as soon as that configuration is active, nothing is indexed anymore, and no error messages are logged.

How would you fix such an issue? I thought about patching hbase-indexer-engine and suggesting a code improvement, but maybe there is an easier way?

We use CDH 5.0.1. This is the stack trace:

14/05/26 17:13:11 ERROR impl.SepEventExecutor: Error while processing event
java.lang.RuntimeException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)
at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:87)
at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:519)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:207)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:202)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:312)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:273)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:310)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.retryAddsIndividually(DirectSolrInputDocumentWriter.java:123)
at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.add(DirectSolrInputDocumentWriter.java:108)
at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:140)
at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:84)
... 6 more
14/05/26 17:13:11 WARN impl.SepConsumer: Error processing a batch of SEP events, the error will be forwarded to HBase for retry
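The trace shows a RemoteSolrException, so the failure appears to be raised on the Solr side while it parses the update request. U+FFFE is not a legal character in XML 1.0, and the char/byte positions in the message seem to refer to the request being parsed rather than to a particular cell, so a per-value check is one way to find the offending cell. The following is a small diagnostic sketch under that assumption; the class name FindInvalidXmlChar and the sample value are hypothetical.

```java
// Diagnostic sketch: locate the first character in a cell value that XML 1.0 does not allow.
final class FindInvalidXmlChar {

    // XML 1.0 "Char" production; U+FFFE and U+FFFF fall outside every range.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the UTF-16 index of the first forbidden code point, or -1 if there is none.
    // Unpaired surrogates are reported as well, because codePointAt returns them unchanged.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String value = "abc\uFFFEdef"; // hypothetical cell value
        int idx = firstInvalidIndex(value);
        if (idx >= 0) {
            System.out.printf("invalid U+%04X at index %d%n", value.codePointAt(idx), idx);
        }
        // prints: invalid U+FFFE at index 3
    }
}
```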

Thanks in advance,

Rolf

Accepted Solution

Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?

Expert Contributor
To make Solr and the XML parser happy, consider removing invalid characters from the input strings. Perhaps plug some sanity fix-up logic into a custom morphline command, along similar lines as these:

https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java...

Wolfgang.
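As a plain-Java sketch of such sanity fix-up logic (not the code behind the Kite link), the helper below removes every code point outside the XML 1.0 "Char" ranges. In a custom morphline command it could presumably be applied to each string field value before the record reaches the Solr loading step; that wiring depends on the Kite morphline API and is not shown, and the class name XmlCharSanitizer is hypothetical.

```java
// Sketch of a "sanity fix-up" helper: remove code points that XML 1.0 forbids.
final class XmlCharSanitizer {

    // XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of s without forbidden code points (e.g. U+FFFE, control characters,
    // unpaired surrogates); tabs, newlines and valid supplementary characters are kept.
    static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(XmlCharSanitizer::isXmlChar)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInvalidXmlChars("bad\uFFFEvalue\u0001!")); // prints "badvalue!"
    }
}
```

If a visible marker in the indexed text is preferable to silent removal, replacing the filter with `.map(cp -> isXmlChar(cp) ? cp : 0xFFFD)` substitutes U+FFFD (the Unicode replacement character) for each offending code point instead.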


Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?

New Contributor

I put that into a new command as suggested, and it works like a charm.

Thank you.

Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?

New Contributor

How did you do it? Could you describe it in detail? Thanks a lot. ^_^
