<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Extracthbase cell command does not retain xml tags in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/11622#M1686</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am inserting an XML document into an HBase column family and indexing it into Solr. One of the Solr fields holds the complete XML, and the other fields hold values extracted from the XML. However, the XML tags are missing from the indexed value.&lt;/P&gt;&lt;P&gt;I read the value as a string. When writing to HBase I set the character encoding to UTF-8, and I do the same in my Java code. I need to return the actualMessage field in the Solr results (it is one of the fields). It is displayed, but without the XML tags or attribute values. Can you help?&lt;/P&gt;&lt;P&gt;{&lt;BR /&gt;extractHBaseCells {&lt;BR /&gt;mappings : [&lt;BR /&gt;{&lt;BR /&gt;inputColumn : "messages:*"&lt;BR /&gt;outputField : "actualMessage"&lt;BR /&gt;type : string&lt;BR /&gt;source : value&lt;BR /&gt;}&lt;BR /&gt;]&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;java {&lt;BR /&gt;imports : "import java.io.*;import javax.xml.parsers.*;import org.w3c.dom.*;"&lt;BR /&gt;code: """&lt;BR /&gt;String s = null;&lt;BR /&gt;byte[] b = null;&lt;BR /&gt;DocumentBuilderFactory docFactory = null;&lt;BR /&gt;DocumentBuilder docBuilder = null;&lt;BR /&gt;Document document = null;&lt;BR /&gt;InputStream is = null;&lt;BR /&gt;try {&lt;BR /&gt;s = (String) record.get("actualMessage").get(0);&lt;BR /&gt;b = s.getBytes("UTF-8");&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 08:58:22 GMT</pubDate>
    <dc:creator>Nishan</dc:creator>
    <dc:date>2022-09-16T08:58:22Z</dc:date>
    <item>
      <title>Extracthbase cell command does not retain xml tags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/11622#M1686</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am inserting an XML document into an HBase column family and indexing it into Solr. One of the Solr fields holds the complete XML, and the other fields hold values extracted from the XML. However, the XML tags are missing from the indexed value.&lt;/P&gt;&lt;P&gt;I read the value as a string. When writing to HBase I set the character encoding to UTF-8, and I do the same in my Java code. I need to return the actualMessage field in the Solr results (it is one of the fields). It is displayed, but without the XML tags or attribute values. Can you help?&lt;/P&gt;&lt;P&gt;{&lt;BR /&gt;extractHBaseCells {&lt;BR /&gt;mappings : [&lt;BR /&gt;{&lt;BR /&gt;inputColumn : "messages:*"&lt;BR /&gt;outputField : "actualMessage"&lt;BR /&gt;type : string&lt;BR /&gt;source : value&lt;BR /&gt;}&lt;BR /&gt;]&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;java {&lt;BR /&gt;imports : "import java.io.*;import javax.xml.parsers.*;import org.w3c.dom.*;"&lt;BR /&gt;code: """&lt;BR /&gt;String s = null;&lt;BR /&gt;byte[] b = null;&lt;BR /&gt;DocumentBuilderFactory docFactory = null;&lt;BR /&gt;DocumentBuilder docBuilder = null;&lt;BR /&gt;Document document = null;&lt;BR /&gt;InputStream is = null;&lt;BR /&gt;try {&lt;BR /&gt;s = (String) record.get("actualMessage").get(0);&lt;BR /&gt;b = s.getBytes("UTF-8");&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:58:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/11622#M1686</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2022-09-16T08:58:22Z</dc:date>
    </item>
    <item>
      <title>Re: Extracthbase cell command does not retain xml tags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/11642#M1687</link>
      <description>If indeed the data in HBase contains the XML tags, then it sounds like your tokenizer/analyzer chain in Solr schema.xml is stripping info away, i.e. schema.xml isn't configured to do what you want it to do.&lt;BR /&gt;&lt;BR /&gt;You could confirm that the morphline is doing what it's supposed to do by adding a debug log message like this to your morphline:&lt;BR /&gt;&lt;BR /&gt;logInfo { format : "my record: {}", args : ["@{}"] }&lt;BR /&gt;&lt;BR /&gt;Also see &lt;A target="_blank" href="http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters"&gt;http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters&lt;/A&gt; and &lt;A target="_blank" href="https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr"&gt;https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Wolfgang.</description>
      <pubDate>Thu, 01 May 2014 20:59:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/11642#M1687</guid>
      <dc:creator>whosch</dc:creator>
      <dc:date>2014-05-01T20:59:09Z</dc:date>
    </item>
    <item>
      <title>Re: Extracthbase cell command does not retain xml tags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/12010#M1688</link>
      <description>&lt;P&gt;Thanks, mate. It worked. Thanks a lot for all your help with this.&lt;/P&gt;</description>
      <pubDate>Mon, 05 May 2014 17:05:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Extracthbase-cell-command-does-not-retain-xml-tags/m-p/12010#M1688</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2014-05-05T17:05:50Z</dc:date>
    </item>
  </channel>
</rss>

