Explorer
Posts: 11
Registered: 06-09-2016

Where does actual data go when data is ingested in solrSink using Flume

Hi

I am new to Cloudera Search and a bit confused.

Here is my use case: I want to ingest real-time logs into HDFS using Flume, and then a) search these logs, and b) use the ingested logs for other purposes.

But when I index through the solrSink (into a collection called collection4), I notice that the following directories get created in HDFS:

/solr/collection4/core_node1/data/index

/solr/collection4/core_node1/data/tlog

The tlog directory contains a file that appears to hold data, but in some binary form. My question is: where does the actual data go (I want to use it for other purposes), and how do the indexes created by the solrSink point to the actual data?

Thanks

Aniruddh

Posts: 177
Topics: 8
Kudos: 28
Solutions: 19
Registered: 07-16-2015

Re: Where does actual data go when data is ingested in solrSink using Flume

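The folders you see hold the actual data indexed by Solr: the index directory contains the Lucene index files, and the tlog directory contains Solr's transaction log.

If you want to access this data, you need to query it through Solr, using the Solr REST API.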
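As a rough illustration of querying through the REST API, here is a minimal Python sketch against the collection's standard /select handler. The host and port (localhost:8983) and the field name "message" are assumptions for the example, not details from your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder host and port; point this at one of your Solr servers.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(collection, query, rows=10, base=SOLR_BASE):
    """Build the URL for a query against the collection's /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def search(collection, query, rows=10, base=SOLR_BASE):
    """Run the query and return the list of matching documents."""
    with urlopen(build_select_url(collection, query, rows, base)) as resp:
        return json.load(resp)["response"]["docs"]


if __name__ == "__main__":
    # "message" is a hypothetical field name; use a field from your own schema.
    for doc in search("collection4", "message:error", rows=5):
        print(doc)
```

Keep in mind that Solr only returns fields that are marked as stored in the schema, so this gives you back the indexed documents, not necessarily the original log lines.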

 

If you need the "raw" data, you will have to write a copy of it to HDFS yourself, for example by adding a second sink of type "HDFS" to the Flume agent.
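A sketch of what such a Flume agent configuration could look like is below. The agent name, source, channel names, and all paths are placeholders I made up for the example; only the general layout (one source replicated into two channels, each drained by its own sink) is the point.

```
# Hypothetical agent "agent": one source fanned out to a Solr sink and an HDFS sink.
agent.sources = logsrc
agent.channels = solrChannel hdfsChannel
agent.sinks = solrSink hdfsSink

# Assumption: logs are tailed from a local file; replace with your real source.
agent.sources.logsrc.type = exec
agent.sources.logsrc.command = tail -F /var/log/app/app.log
agent.sources.logsrc.channels = solrChannel hdfsChannel
# The replicating selector copies every event into both channels.
agent.sources.logsrc.selector.type = replicating

agent.channels.solrChannel.type = memory
agent.channels.hdfsChannel.type = memory

# Sink 1: index events into Solr through a morphline (path is a placeholder).
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = solrChannel
agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf

# Sink 2: write the raw events to HDFS as plain text (path is a placeholder).
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsChannel
agent.sinks.hdfsSink.hdfs.path = /user/flume/raw_logs/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

With this layout the raw log lines land under the HDFS path for your other processing, while Solr keeps its own index under /solr/collection4 for search.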