06-09-2016 05:58 AM
I am new to Cloudera Search and I am a bit confused.
Here is my use case. I want to ingest real-time logs into HDFS using Flume. I want to a) search these logs and b) use the ingested logs for other purposes.
But when I index the logs with a solrSink (into a collection called collection4),
I notice that the following directory gets created in HDFS.
This tlog directory contains a file that appears to hold data, but in some binary form. My question is: where does the actual data go (because I want to use it for other purposes), and how do the indexes created by this solrSink point to the actual data?
08-09-2016 05:33 AM
The directories you see contain the data as indexed by Solr, that is, Solr's own index files and transaction log (tlog), not your raw log files.
If you want to access this data, you need to query it through Solr using the Solr REST API.
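As a rough sketch of what such a query could look like, the Python snippet below builds a request against Solr's standard `/select` handler and parses the JSON response. The host and port (`localhost:8983`), the query string, and the helper names `build_select_url` and `search` are assumptions for illustration only; `collection4` is the collection from the question. Note that Solr only returns fields that are stored in the schema.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def search(url):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # Solr puts the matching documents under response -> docs.
    return payload["response"]["docs"]


# Example usage (assumes a Solr server reachable at localhost:8983):
# url = build_select_url("http://localhost:8983/solr", "collection4", "*:*", rows=5)
# for doc in search(url):
#     print(doc)
```
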
If you need the "raw" data, you will need to write a second copy of it directly into HDFS yourself, for example by adding a second sink of type "HDFS" to your Flume agent.
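A minimal sketch of such a Flume configuration is shown below. The agent name (`agent`), the component names, the HDFS path, and the morphline file path are all assumptions and need to be adapted; the source definition is omitted because it depends on where your logs come from. Each sink gets its own channel and the source uses a replicating selector, because two sinks draining the same channel would split the events between them rather than each receiving a copy.

```properties
# Hypothetical names throughout; adapt to your own agent.
agent.sources = logSource
agent.channels = solrChannel hdfsChannel
agent.sinks = solrSink hdfsSink

# Copy every event into both channels.
agent.sources.logSource.channels = solrChannel hdfsChannel
agent.sources.logSource.selector.type = replicating

agent.channels.solrChannel.type = memory
agent.channels.hdfsChannel.type = memory

# Sink 1: index events into Solr (searchable copy).
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = solrChannel
agent.sinks.solrSink.morphlineFile = /path/to/morphline.conf

# Sink 2: write the raw events to HDFS (copy for other uses).
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsChannel
agent.sinks.hdfsSink.hdfs.path = /user/flume/raw-logs/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

With `hdfs.fileType = DataStream` the events are written as plain text rather than as a SequenceFile, which is usually easier for other tools to read.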