<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12798#M1844</link>
    <description>To keep Solr and the XML parser happy, consider removing invalid characters from the input strings. One option is to plug some sanitizing logic into a custom morphline command, along the lines of this code:&lt;BR /&gt;&lt;BR /&gt;&lt;A target="_blank" href="https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71"&gt;https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Wolfgang.</description>
    <pubDate>Tue, 27 May 2014 09:06:29 GMT</pubDate>
    <dc:creator>whosch</dc:creator>
    <dc:date>2014-05-27T09:06:29Z</dc:date>
    <item>
      <title>How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12786#M1843</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We use an hbase-indexer for NRT indexing of an HBase table, and from time to time cell values cause a CharConversionException when the document is sent to Solr.&lt;/P&gt;&lt;P&gt;As we cannot guarantee 100% error-free data, I would like to catch this exception in the mapper for further investigation and drop the value. I think a suitable place would be com.ngdata.hbaseindexer.indexer.Indexer.indexRowData().&lt;/P&gt;&lt;P&gt;Is there a configuration option to either make a character conversion issue non-fatal or replace the indexer with a custom class? I tried to replace the mapper, but to no avail: as soon as that configuration is active, indexing stops without any error message being logged.&lt;/P&gt;&lt;P&gt;How would you fix such an issue? I thought about patching hbase-indexer-engine and suggesting a code improvement, but maybe there is an easier way?&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;We use CDH 5.0.1. 
This is the execution stack:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;14/05/26 17:13:11 ERROR impl.SepEventExecutor: Error while processing event&lt;BR /&gt;java.lang.RuntimeException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)&lt;BR /&gt;at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:87)&lt;BR /&gt;at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)&lt;BR /&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:262)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #2290, byte #2047)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:519)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:207)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:202)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:312)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:273)&lt;BR /&gt;at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:310)&lt;BR /&gt;at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)&lt;BR /&gt;at 
org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)&lt;BR /&gt;at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)&lt;BR /&gt;at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.retryAddsIndividually(DirectSolrInputDocumentWriter.java:123)&lt;BR /&gt;at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.add(DirectSolrInputDocumentWriter.java:108)&lt;BR /&gt;at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:140)&lt;BR /&gt;at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:84)&lt;BR /&gt;... 6 more&lt;BR /&gt;14/05/26 17:13:11 WARN impl.SepConsumer: Error processing a batch of SEP events, the error will be forwarded to HBase for retry&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;Thanks in advance,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 14px;"&gt;Rolf&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:59:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12786#M1843</guid>
      <dc:creator>rjakob</dc:creator>
      <dc:date>2022-09-16T08:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12798#M1844</link>
      <description>To keep Solr and the XML parser happy, consider removing invalid characters from the input strings. One option is to plug some sanitizing logic into a custom morphline command, along the lines of this code:&lt;BR /&gt;&lt;BR /&gt;&lt;A target="_blank" href="https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71"&gt;https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Wolfgang.</description>
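      <content:encoded><![CDATA[
A minimal sketch of the kind of fixup logic suggested above, assuming the goal is to drop every code point that the XML 1.0 "Char" production disallows (which includes U+FFFE, the character named in the stack trace). The class and method names are hypothetical and are not taken from Kite or hbase-indexer, and the wiring into a custom morphline command is not shown.

```java
/**
 * Hypothetical helper for illustration only; not part of Kite or hbase-indexer.
 * Removes code points that are not allowed in XML 1.0 documents.
 */
final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input without the code points that XML 1.0 disallows. */
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Iterate by code point so that valid surrogate pairs stay intact,
            // while unpaired surrogates are seen as invalid and dropped.
            int cp = input.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A custom command could apply stripInvalidXmlChars to each string field value of a record before the record is passed on to the Solr loading step. Whether to drop the offending characters or replace them with a space is a design choice; dropping keeps the sketch short.
]]></content:encoded>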
      <pubDate>Tue, 27 May 2014 09:06:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12798#M1844</guid>
      <dc:creator>whosch</dc:creator>
      <dc:date>2014-05-27T09:06:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12886#M1845</link>
      <description>&lt;P&gt;I put that into a new command as suggested, and it works like a charm.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 28 May 2014 14:04:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/12886#M1845</guid>
      <dc:creator>rjakob</dc:creator>
      <dc:date>2014-05-28T14:04:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid CharConversionException in HttpSolrServer breaking an hbase-indexer?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/38015#M1846</link>
      <description>&lt;P&gt;How did you do it? Could you describe it in detail? Thanks a lot. ^_^&lt;/P&gt;</description>
      <pubDate>Sun, 28 Feb 2016 11:24:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-avoid-CharConversionException-in-HttpSolrServer/m-p/38015#M1846</guid>
      <dc:creator>test1990</dc:creator>
      <dc:date>2016-02-28T11:24:55Z</dc:date>
    </item>
  </channel>
</rss>

