Configure Cloudera CDH 5 for Real time Web logging dashboard analytics in HUE

I want to do a proof of concept based on "How-to: Do Real-Time Log Analytics with Apache Kafka and Cloudera Search". I have around 200 files of real-time log data spread across 20 different servers. My questions are:

1) What approach should I use to pull the data from these 20 servers?

2) How can I configure the Cloudera CDH 5 virtual machine to integrate Kafka with these log servers and build a MapReduce job?

Accepted Solution

Re: Configure Cloudera CDH 5 for Real time Web logging dashboard analytics in HUE

After some quick searching, this blog post [1] seems to do what I think you intend: take the logs, store them in Kafka, and distribute them to various consumers, one of which is Cloudera Search (Solr). You could make it simpler and store the logs directly in Solr [2] if you don't plan on having multiple consumers read the same data. Instead of Logstash, you could also use Flume [3] [4].

[1] https://www.elastic.co/blog/logstash-kafka-intro

[2] https://github.com/lucidworks/solrlogmanager

[3] http://www.cloudera.com/documentation/archive/search/1-3-0/Cloudera-Search-User-Guide/csug_flume_sol...

[4] http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
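To illustrate question 1: whichever shipper you pick (Logstash or Flume), the usual pattern is an agent on each of the 20 servers that follows the log files, remembers how far it has read, and forwards each new line as a structured event to Kafka (or directly to Solr, as in [2]). The Python sketch below uses only the standard library to show that idea under some assumptions. It assumes the logs use the Apache/Nginx "combined" format, and `poll_once` with its `publish` callback is a hypothetical placeholder where a real Kafka producer send would go. It is not a substitute for Flume or Logstash, which also cover things this sketch ignores, such as rename-based log rotation and delivery retries.

```python
import json
import os
import re

# ASSUMPTION: Apache/Nginx "combined" log format; adjust to your servers' LogFormat.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def read_new_lines(path, offset):
    """Return (complete lines appended since byte `offset`, new offset).

    A trailing partial line (no newline yet) is left for the next call.
    """
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank (e.g. truncated in place): start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # consume only up to the last complete line
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def to_document(line, server):
    """Parse one log line into a flat dict for JSON/Solr; None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    doc["server"] = server  # tag each event with the server it came from
    return doc


def poll_once(paths, offsets, server, publish):
    """One polling pass over `paths` (hypothetical helper).

    `offsets` maps path -> bytes already read; `publish` is a placeholder
    for sending one JSON event, e.g. to a Kafka topic.
    """
    for path in paths:
        lines, offsets[path] = read_new_lines(path, offsets.get(path, 0))
        for line in lines:
            doc = to_document(line, server)
            if doc is not None:
                publish(json.dumps(doc))
```

Running `poll_once` in a loop on each server (with `offsets` persisted to disk between restarts) gives at-least-once forwarding of new log lines; the per-file offset is what lets the agent resume without re-sending everything.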
