New Contributor
Registered: 10-17-2018

Real time campaign


Hi all, I would like to implement a real-time data feed between a web server and a Hadoop server. I plan to use Flume to read the web log files in real time, with HDFS/Hive as the target.
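Flume's Taildir source follows log files and records how far it has read in a position file, so it can resume after a restart. The Python sketch below only illustrates that idea and is not Flume's code: `read_new_lines` is a hypothetical helper that returns the complete lines appended since a saved byte offset, plus the new offset to persist.

```python
import os


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines appended to `path` since byte `offset`, and the new offset.

    A trailing partial line (no newline yet) is left for the next call,
    so a half-written log record is never shipped.
    """
    if os.path.getsize(path) < offset:
        # File is smaller than our saved position: assume it was rotated
        # or truncated and start reading from the beginning again.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n")
    if end == -1:
        # Nothing complete to return yet; keep the old position.
        return [], offset
    complete = chunk[: end + 1]
    lines = complete.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(complete)
```

A caller would loop: read, forward the lines, then persist the returned offset. Real tailers such as Flume's also track which file they are reading, so the size-based rotation check here is a simplification.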

 

Questions are:

 

1. I need a checklist of what to prepare on the security side, such as firewall rules.

2. Are there any Hadoop agents I need to install on the web server?

3. Once the data is available in Hive, a scheduled job will process it using Impala and produce a list of suggestions/messages for a particular web user. How do I send that information back to that specific user's web page?
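For the firewall item in question 1, one concrete check is confirming, from the web server itself, that the cluster-side ports the ingestion path needs are reachable. A minimal Python sketch follows; `check_reachable` is a hypothetical helper that simply attempts a TCP connection to each target.

```python
import socket


def check_reachable(targets, timeout=3.0):
    """Try a TCP connect to each (host, port); return {(host, port): bool}.

    True means the connection was accepted; False covers refused,
    filtered (timed out) and unresolvable hosts alike.
    """
    results = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # socket.timeout and DNS errors are both OSError subclasses.
            results[(host, port)] = False
    return results
```

For example, `check_reachable([("flume-collector.example.com", 4141), ("kafka-broker.example.com", 9092)])` would test two endpoints; those host names and ports are placeholders, so substitute whatever your Flume, Kafka or HDFS services actually listen on.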

 

Thank you

Cloudera Employee
Registered: 01-07-2019

Re: Real time campaign

This question is a bit broad, and simultaneously quite dependent on your exact situation.

I therefore recommend contacting your Cloudera contact person for a more in-depth answer. However, here is what I can say:

Regarding your second question, there is a good answer here: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-without-agents-on-web-server/m-p/...

In short, you will want something to push the data off the web server (for instance a Flume or MiNiFi agent), assuming your web server does not already publish the messages to a bus like Kafka.
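If the web server (or an agent on it) publishes to Kafka, each log line first has to become a record. The sketch below parses an Apache combined-format line and returns a (key, value) pair; keying by user means Kafka's default partitioner sends one user's events to a single partition, which keeps them in order. Which field identifies a web user is an assumption here: it uses the log's authenticated-user field and falls back to the client IP, whereas a real site would more likely use a session or cookie ID. No Kafka client call is shown; the pair is what you would hand to your producer.

```python
import json
import re
from typing import Optional, Tuple

# Apache "combined" log format; the referer/user-agent part is optional so
# plain "common" format lines also parse.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def to_record(line: str) -> Optional[Tuple[bytes, bytes]]:
    """Turn one access-log line into a (key, value) pair, or None if unparseable."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Assumption: authenticated user if present, otherwise client IP.
    user = fields["user"] if fields["user"] != "-" else fields["ip"]
    key = user.encode("utf-8")
    value = json.dumps(fields, sort_keys=True).encode("utf-8")
    return key, value
```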

In general, the solution you use to move data from the web server to the cluster should also work in the opposite direction.
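Applied in reverse, the same channel (for example a Kafka topic the web tier consumes) could carry the Impala results back as (user_id, message) pairs. One possible shape for the web-server side, sketched in Python with hypothetical names: `SuggestionStore` keeps the latest few messages per user, and a small WSGI app returns them as JSON so the user's page can fetch them.

```python
import json
import threading
from collections import defaultdict, deque
from urllib.parse import parse_qs


class SuggestionStore:
    """Latest suggestions per user, fed by whatever carries results back from the cluster."""

    def __init__(self, max_per_user=10):
        self._lock = threading.Lock()
        # Each user keeps only the newest `max_per_user` messages.
        self._by_user = defaultdict(lambda: deque(maxlen=max_per_user))

    def apply(self, records):
        """records: iterable of (user_id, message) pairs, e.g. decoded from a topic."""
        with self._lock:
            for user_id, message in records:
                self._by_user[user_id].append(message)

    def get(self, user_id):
        with self._lock:
            return list(self._by_user.get(user_id, ()))


def make_app(store):
    """Minimal WSGI app: GET ?user=<id> returns that user's suggestions as JSON."""
    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        # Sketch only: real code should take the user ID from the authenticated session.
        user_id = params.get("user", [""])[0]
        body = json.dumps({"user": user_id, "suggestions": store.get(user_id)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_app(store)).serve_forever()` serves the app. Reading the user ID from the query string is only for the sketch; in practice it should come from the session so users cannot read each other's suggestions.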