04-09-2016 04:13 PM
I want to do a PoC based on "How-to: Do Real-Time Log Analytics with Apache Kafka and Cloudera Search". I have around 200 files of real-time log data spread across 20 different servers. My questions are below:
1) What should the approach be for pulling data from these 20 servers?
2) How can I configure the Cloudera CDH 5 virtual machine to integrate Kafka with these log servers and build a MapReduce task?
04-11-2016 11:08 PM
From some quick searching, this blog seems to do what I think you intend: it takes logs, stores them in Kafka, and distributes them to various consumers, one of which is Cloudera Search (Solr). You could make it simpler and store directly to Solr if you aren't planning to have multiple consumers read the same data. Instead of Logstash, you could also use Flume.
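To make the shipping step a bit more concrete, here is a minimal Python sketch of what an agent on each log server might do: wrap every log line in a small JSON envelope (host, file, message) and hand it to a send function. The names build_record and ship_lines, the envelope fields, and the example path are all my own assumptions for illustration, not something from the blog. The send callable is a stand-in so the sketch runs without a broker; in a real setup it would be a Kafka producer call or an HTTP post to Solr, or you would let Logstash or Flume do this job instead.

```python
import json
import socket
from typing import Callable, Iterable, Optional


def build_record(line: str, source_file: str, host: str) -> bytes:
    """Wrap one raw log line in a JSON envelope, encoded as UTF-8 bytes.

    The field names (host, file, message) are hypothetical; pick whatever
    matches the schema of your Solr collection.
    """
    doc = {"host": host, "file": source_file, "message": line.rstrip("\n")}
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def ship_lines(
    lines: Iterable[str],
    source_file: str,
    send: Callable[[bytes], None],
    host: Optional[str] = None,
) -> int:
    """Send every non-blank line through `send` and return how many were sent.

    `send` is a placeholder for the real transport (for example a Kafka
    producer); here it can be any callable that accepts bytes.
    """
    host = host or socket.gethostname()
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than indexing empty documents
        send(build_record(line, source_file, host))
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in transport: collect records in a list instead of a broker.
    sent = []
    sample = ["GET /index.html 200\n", "\n", "POST /login 302\n"]
    n = ship_lines(sample, "/var/log/app/access.log", sent.append, host="server-01")
    print(n, "records;", sent[0].decode("utf-8"))
```

Keeping the transport as a plain callable means the same loop works whether you go through Kafka (multiple consumers) or write straight to Solr (single consumer), which is the trade-off mentioned above.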