Indexing a zip file into Solr with MapReduce and Morphline

I am trying to index a zip file into Solr with MapReduce and a morphline. The zip file gets indexed, and so do all the files inside it. After indexing, the zip file and each file inside it have separate unique UUIDs. However, search results return only the individual file's information and not the name of the parent zip file it belongs to.

For example, suppose "xyz.doc" contains the text "test" and sits inside "pqr.zip". After indexing, a search for "test" reports a match in "xyz.doc", but there is no way to tell whether this "xyz.doc" belongs to "pqr.zip".

Is there any way to find that information, or to do something in the morphline that enriches the record for "xyz.doc" with the parent zip file's information while "xyz.doc" is being processed?
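
To make the goal concrete, here is a minimal sketch in plain Python of the enrichment I am after. It is not morphline configuration and does not use any Solr or morphline API; it only illustrates the idea that every record produced for a file inside the archive should carry the archive's name. The function names and the field names `file_name`, `parent_archive`, and `text` are hypothetical, chosen for illustration.

```python
import io
import zipfile


def records_with_parent(zip_name, zip_bytes):
    """Yield one record per file in the archive, tagged with the archive name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for entry in archive.infolist():
            if entry.is_dir():
                continue  # skip directory entries, only files become records
            yield {
                "file_name": entry.filename,
                # Hypothetical field: the name of the zip this file came from.
                "parent_archive": zip_name,
                "text": archive.read(entry).decode("utf-8", errors="replace"),
            }


def build_sample_zip():
    """Build an in-memory zip that mirrors the example: xyz.doc inside pqr.zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xyz.doc", "test")
    return buffer.getvalue()


records = list(records_with_parent("pqr.zip", build_sample_zip()))
```

With a field like `parent_archive` stored on each child document, a search that matches "xyz.doc" could also show that it came from "pqr.zip". What I am looking for is the equivalent step inside the morphline.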