Created on 04-09-2016 04:13 PM - edited 09-16-2022 03:12 AM
I want to do a PoC for "How-to: Do Real-Time Log Analytics with Apache Kafka and Cloudera Search". I have around 200 files of real-time log data spread across 20 different servers. Below are my questions:
1) What approach should I take to pull data from these 20 servers?
2) How can I configure the Cloudera CDH 5 virtual machine to integrate Kafka with these log servers and build a MapReduce task?
Created 04-11-2016 11:08 PM
After some quick searching, this blog post [1] seems to match your intent: taking logs, storing them in Kafka, and distributing them to various consumers, one of which is Cloudera Search (Solr). If you aren't planning to consume the same data from multiple destinations, you could simplify things and write directly to Solr [2]. Instead of Logstash, you could also use Flume [3] [4].
[1]https://www.elastic.co/blog/logstash-kafka-intro
[2]https://github.com/lucidworks/solrlogmanager
[4]http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
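To make the Flume option concrete, here is a minimal sketch of a Flume agent config you could run on each of the 20 log servers to ship log files into Kafka (the Flafka pattern from [4]). The directory path, topic name, and broker host are assumptions; adjust them for your environment:

```properties
# Hypothetical Flume agent "a1" on a log server: watch a spool directory
# of completed log files and publish each line as an event to Kafka.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling-directory source: reads files dropped into /var/log/app (assumed path)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/app
a1.sources.r1.channels = c1

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Kafka sink: publishes to the "logs" topic (assumed name) on your broker
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = logs
a1.sinks.k1.brokerList = kafka-broker:9092
a1.sinks.k1.channel = c1
```

A downstream consumer (Flume with a Kafka source feeding a MorphlineSolrSink, or any Kafka consumer) can then index the events into Cloudera Search, which keeps the collection and indexing stages decoupled.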