Hi All, I would like to implement a real time data feed between a webserver and hadoop server. I plan to use flume to read the web log files real time and target is hdfs/Hive,
1. I need a checklist of what to prepare for the security like, firewalls etc.
2. Are there any hadoop agent I need to install in the webser server
3. Once data is available now in hive, I will have a regular job to process the data using Impala then once processed I will have a list of suggestions/messages for a particular web user. How do I send the info back to that specific web users web page?