Created on 04-09-2016 04:13 PM - edited 09-16-2022 03:12 AM
I want to do a PoC for "How-to: Do Real-Time Log Analytics with Apache Kafka and Cloudera Search". I have around 200 files of real-time log data spread across 20 different servers. Below are my questions:
1) What approach should I take to pull data from these 20 servers?
2) How can I configure the Cloudera CDH 5 virtual machine to integrate Kafka with these log servers and build a MapReduce task?
Created 04-11-2016 11:08 PM
After some quick searching, this blog post [1] seems to match your intent: taking logs, storing them in Kafka, and distributing them to various consumers, one of which is Cloudera Search (Solr). If you aren't planning to consume the same data from multiple destinations, you could simplify things and write directly to Solr [2]. Instead of Logstash, you could also use Flume [3] [4].
[1]https://www.elastic.co/blog/logstash-kafka-intro
[2]https://github.com/lucidworks/solrlogmanager
[4]http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
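To make the Flume option concrete, here is a minimal sketch of a Flume agent config you could run on each of the 20 log servers to ship log files into Kafka (the Flafka pattern from [4]). The directory path, topic name, and broker host are assumptions; adjust them for your environment:

```properties
# Hypothetical Flume agent "a1" on a log server: watch a spool directory
# of completed log files and publish each line as an event to Kafka.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling-directory source: reads files dropped into /var/log/app (assumed path)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/app
a1.sources.r1.channels = c1

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Kafka sink: publishes to the "logs" topic (assumed name) on your broker
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = logs
a1.sinks.k1.brokerList = kafka-broker:9092
a1.sinks.k1.channel = c1
```

A downstream consumer (Flume with a Kafka source feeding a MorphlineSolrSink, or any Kafka consumer) can then index the events into Cloudera Search, which keeps the collection and indexing stages decoupled.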