Real-time campaign
Labels: Apache Flume, Apache Hive, Apache Impala, HDFS
Created on 12-09-2018 11:26 PM - edited 09-16-2022 06:58 AM
Hi All, I would like to implement a real-time data feed between a web server and a Hadoop server. I plan to use Flume to read the web log files in real time; the target is HDFS/Hive. A rough sketch of the Flume configuration I have in mind follows my questions below.
Questions are:
1. I need a checklist of what to prepare for security: firewalls, etc.
2. Is there any Hadoop agent I need to install on the web server?
3. Once the data is available in Hive, I will have a regular job to process it using Impala; once processed, I will have a list of suggestions/messages for a particular web user. How do I send the info back to that specific web user's web page?
Thank you
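Here is that rough sketch; the agent name, file paths, and HDFS URL are just placeholders, not my actual values:

# Illustrative Flume agent: tail the web server access log and land it in HDFS.
# Agent name 'weblog', the log path, and the HDFS URL are placeholder assumptions.
weblog.sources = r1
weblog.channels = c1
weblog.sinks = k1

# TAILDIR source follows the access log as new lines are appended
weblog.sources.r1.type = TAILDIR
weblog.sources.r1.filegroups = f1
weblog.sources.r1.filegroups.f1 = /var/log/httpd/access_log
weblog.sources.r1.channels = c1

# In-memory channel buffers events between source and sink
weblog.channels.c1.type = memory
weblog.channels.c1.capacity = 10000

# HDFS sink writes plain-text events into a date-partitioned directory
weblog.sinks.k1.type = hdfs
weblog.sinks.k1.channel = c1
weblog.sinks.k1.hdfs.path = hdfs://namenode:8020/data/weblogs/%Y-%m-%d
weblog.sinks.k1.hdfs.fileType = DataStream
weblog.sinks.k1.hdfs.useLocalTimeStamp = true

A Hive external table defined over /data/weblogs would then let the Impala job from my third question query the landed files.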
Created 04-09-2019 06:53 AM
I recommend you contact your Cloudera contact person for a more in-depth answer. However, what I can say is the following:
Regarding your second question, there is a nice answer here: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-without-agents-on-web-server/m-p/...
In short, you will want 'something' to push the data off the web server (for instance a Flume or MiNiFi agent), assuming your web server does not already publish the messages to a bus like Kafka; a sketch of that push model follows below.
In general, the solution you use for moving data from the web server to the cluster should also work in the opposite direction.
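To make the push model concrete, a minimal sketch of a web-server-side Flume agent forwarding events over Avro to a collector on the cluster edge could look like this; the host name 'edge-node' and port 4141 are placeholder assumptions:

# Illustrative web-server-side Flume agent: tail the log and push it over Avro.
# Log path, host name 'edge-node', and port 4141 are placeholder assumptions.
push.sources = tail1
push.channels = ch1
push.sinks = avro1

# TAILDIR source follows the access log on the web server
push.sources.tail1.type = TAILDIR
push.sources.tail1.filegroups = f1
push.sources.tail1.filegroups.f1 = /var/log/httpd/access_log
push.sources.tail1.channels = ch1

# Durable file channel survives agent restarts
push.channels.ch1.type = file

# Avro sink ships events to the collector agent on the cluster side
push.sinks.avro1.type = avro
push.sinks.avro1.channel = ch1
push.sinks.avro1.hostname = edge-node
push.sinks.avro1.port = 4141

The collector agent on the cluster side would pair an Avro source on the same port with an HDFS sink, so only that one port needs to be opened in the firewall from the web server towards the edge node, which also narrows the checklist for your first question.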
- Dennis Jaheruddin
If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.
Created 02-27-2020 11:43 PM
Thanks.
Implemented Flume to fetch the web logs that were transferred to the Hadoop edge server, and from there up to HDFS.
Also, due to firewall challenges, security requirements, and the lack of a test environment, we used an alternative solution: the Zena job scheduler transfers the log files from the ATM machines and the mobile web app to the Hadoop edge server.
Kafka came as a big challenge since we are using LDAP, so security and authentication issues quickly cropped up.
Kudos for your suggestion!
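For anyone facing the same LDAP hurdle: the client-side Kafka settings for an LDAP-backed SASL/PLAIN setup typically look roughly like this (the service account, password, and truststore path below are placeholders, not our actual values):

# Illustrative Kafka client properties for SASL/PLAIN over TLS;
# username, password, and truststore location are placeholder assumptions.
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="svc-weblog" \
  password="changeit";
ssl.truststore.location=/opt/certs/truststore.jks
ssl.truststore.password=changeit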
