<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do I query and view content including TAGS of an indexed XML file in Solr while using Doc Crawler in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95561#M58920</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/32/paul.html" nodeid="32"&gt;@Paul Codding&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/397/ppruski.html" nodeid="397"&gt;@Piotr Pruski&lt;/A&gt; any ideas on this?&lt;/P&gt;</description>
    <pubDate>Fri, 16 Oct 2015 23:47:39 GMT</pubDate>
    <dc:creator>abajwa</dc:creator>
    <dc:date>2015-10-16T23:47:39Z</dc:date>
    <item>
      <title>How do I query and view content including TAGS of an indexed XML file in Solr while using Doc Crawler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95560#M58919</link>
      <description>&lt;P&gt;I indexed one of my pom.xml files in Solr 5.2 on the HDP 2.3 sandbox. After I installed Doc Crawler and used it to search the pom.xml content, it retrieved the correct XML document but stripped out all the tags from the XML file. Is there a configuration setting or custom parser I need to reference in order to search and view all of the content, including the XML tags, in Doc Crawler?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Oct 2015 19:26:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95560#M58919</guid>
      <dc:creator>adaher</dc:creator>
      <dc:date>2015-10-16T19:26:38Z</dc:date>
    </item>
    <item>
      <title>Re: How do I query and view content including TAGS of an indexed XML file in Solr while using Doc Crawler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95561#M58920</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/32/paul.html" nodeid="32"&gt;@Paul Codding&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/397/ppruski.html" nodeid="397"&gt;@Piotr Pruski&lt;/A&gt; any ideas on this?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Oct 2015 23:47:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95561#M58920</guid>
      <dc:creator>abajwa</dc:creator>
      <dc:date>2015-10-16T23:47:39Z</dc:date>
    </item>
    <item>
      <title>Re: How do I query and view content including TAGS of an indexed XML file in Solr while using Doc Crawler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95562#M58921</link>
      <description>&lt;P&gt;If you want to keep the XML tags, index the document as raw/plain text instead of sending it through the XML update handlers.&lt;/P&gt;&lt;P&gt;Modify the schema XML file and clean up the fields. Create some literal fields for metadata about the file, and index the entire XML as a multi-valued field. Honestly, this will make the search itself very poor, because tags and words will both be tokenized; to improve it, add stop words around the tags.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Oct 2015 20:39:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95562#M58921</guid>
      <dc:creator>acesir</dc:creator>
      <dc:date>2015-10-19T20:39:46Z</dc:date>
    </item>
    <item>
      <title>Re: How do I query and view content including TAGS of an indexed XML file in Solr while using Doc Crawler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95563#M58922</link>
      <description>&lt;P&gt;I agree about the poor search. However, the customer asked how to search based on tags. A lot of their XML docs are very complex, so for a demo I did, I converted the XML files to PDF and everything worked fine. I am not sure that is the best solution, but it is at least one way.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Oct 2015 05:16:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-query-and-view-content-including-TAGS-of-an-indexed/m-p/95563#M58922</guid>
      <dc:creator>adaher</dc:creator>
      <dc:date>2015-10-23T05:16:02Z</dc:date>
    </item>
  </channel>
</rss>

