Solr kite-morphlines-hadoop-sequencefile sample example

Hi,

I have millions of small XML files (a few KB each) stored in zipped form on S3. I need to build a search capability over them so that, when a request comes in for a certain tag's text, every XML file containing that text is returned in full.

I am thinking of packing them into a Hadoop SequenceFile to avoid the small-files problem, and then indexing that SequenceFile with Solr.
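A minimal sketch of the packing idea, assuming an archive fits in memory: each small XML file in a zip becomes one key/value record (entry name as key, raw XML bytes as value), and records are grouped into large batches so that each output file is big rather than small. The function names and the `max_bytes` threshold are hypothetical. This does not write a real Hadoop SequenceFile; that step would typically use Hadoop's `SequenceFile.Writer` (Java) with the same key/value pairs.

```python
import io
import zipfile
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, bytes]  # (key = entry name, value = raw XML bytes)


def iter_xml_records(zip_bytes: bytes) -> Iterator[Record]:
    """Yield one (name, contents) record per .xml entry in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not an XML file.
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            yield info.filename, archive.read(info)


def batch_records(records: Iterable[Record], max_bytes: int) -> Iterator[List[Record]]:
    """Greedily group records so each batch holds at most max_bytes of values.

    Each batch stands in for one large container file (e.g. one SequenceFile).
    A single record larger than max_bytes still gets a batch of its own.
    """
    batch: List[Record] = []
    size = 0
    for key, value in records:
        if batch and size + len(value) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += len(value)
    if batch:
        yield batch
```

A greedy size cap keeps the logic streaming-friendly: records never need to be sorted or held all at once, and the threshold would in practice be tuned toward the HDFS block size.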

Questions:

1. I read that the "kite-morphlines-hadoop-sequencefile" morphline module (which provides the readSequenceFile command) can be used to read a SequenceFile and index its contents into Solr. Can anyone share a working example?

2. Also, can anyone share some insight on how to send the whole XML file(s) matching the search criteria back to the requester, given that they will be stored inside a SequenceFile?
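One possible answer to both questions, sketched in plain Python rather than as a morphline (it uses no Kite or Solr APIs): turn each (key, XML) record into a flat document whose fields hold the text of each tag, and keep the entire XML in a separate stored field so a match can be returned whole without going back to the SequenceFile. In Solr terms that would mean a stored field for the raw XML, or storing only the record key and fetching the record elsewhere; note a plain SequenceFile has no key index, so key lookups usually call for something like Hadoop's MapFile. The field names (`id`, `fields`, `raw_xml`) and the `TinyIndex` class are hypothetical stand-ins for a Solr schema and core, and matching is exact (case-insensitive) tag text, not full-text analysis.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def to_document(key: str, raw_xml: bytes) -> dict:
    """Map one (key, XML) record to a flat, Solr-style document.

    'fields' maps tag name -> list of that tag's text values (what is searched);
    'raw_xml' keeps the whole file so it can be returned unchanged.
    """
    root = ET.fromstring(raw_xml)
    fields: Dict[str, List[str]] = defaultdict(list)
    for el in root.iter():
        if el.text and el.text.strip():
            fields[el.tag].append(el.text.strip())
    return {"id": key, "fields": dict(fields), "raw_xml": raw_xml.decode("utf-8")}


class TinyIndex:
    """In-memory stand-in for a search core: (tag, text) -> document ids."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc
        for tag, values in doc["fields"].items():
            for value in values:
                self.postings[(tag, value.lower())].add(doc["id"])

    def search(self, tag: str, text: str) -> List[str]:
        """Return the full XML of every document whose <tag> has this text."""
        ids = sorted(self.postings.get((tag, text.lower()), set()))
        return [self.docs[doc_id]["raw_xml"] for doc_id in ids]
```

Storing the raw XML alongside the searchable fields trades index size for simplicity: a single query returns complete files, at the cost of duplicating the data that already lives in the SequenceFile.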

Many thanks