
Indexing a ZIP file into Solr with MapReduce and a Morphline


I am trying to index a ZIP file into Solr using a MapReduce job and a Morphline. The ZIP file itself is indexed, and so are all the files inside it. After indexing, the ZIP file and each file inside it have separate UUIDs. However, search results return only the individual file's information and do not include the name of the parent ZIP file it belongs to.

For example, suppose "xyz.doc" contains the text "test" and sits inside "pqr.zip". After indexing, when I search for "test", the response says that "test" was found in "xyz.doc", but there is no way to tell that "xyz.doc" belongs to "pqr.zip".

Is there any way to find that information, or to do something in the morphline to enrich the record for "xyz.doc" with the parent ZIP file's information while "xyz.doc" is being processed?
