Data ingestion using Flume - Visualize website clickstream data
- Labels: Apache Flume
Created ‎02-14-2016 02:40 AM
Hi,
How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1
For semi-structured data placed in the Omniture weblog directory, I would like to confirm the source type: should it be the spooling directory source or multiport_syslogtcp? Which one should I use?
Thank you.
Created ‎02-14-2016 03:17 AM
I hope this link helps you.
Created ‎02-14-2016 12:48 PM
The spooling directory source is a good fit when you want to watch a directory for new files; the syslog sources instead listen on a network port. So if your logs land in a directory, use the spooling directory source. For writing to HDFS, use the HDFS sink. Once you have mastered Flume, also check out Apache NiFi.
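To make the spooling-directory-to-HDFS pairing concrete, here is a minimal sketch of a Flume agent configuration. The agent name (`a1`) and all directory paths are placeholder assumptions; adjust them to your environment.

```properties
# Hypothetical agent "a1" with one source, one channel, one sink.
a1.sources = src1
a1.channels = ch1
a1.sinks = snk1

# Spooling directory source: watches a local directory for new,
# fully-written files (files must not be modified after being dropped here).
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/flume/omniture
a1.sources.src1.channels = ch1

# File channel for durability across agent restarts
# (a memory channel is faster but loses events on a crash).
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /var/flume/checkpoint
a1.channels.ch1.dataDirs = /var/flume/data

# HDFS sink writing plain text into a date-partitioned path.
a1.sinks.snk1.type = hdfs
a1.sinks.snk1.channel = ch1
a1.sinks.snk1.hdfs.path = /user/flume/omniture/%Y-%m-%d
a1.sinks.snk1.hdfs.fileType = DataStream
a1.sinks.snk1.hdfs.rollInterval = 300
a1.sinks.snk1.hdfs.useLocalTimeStamp = true
```

The agent would then be started with something like `flume-ng agent --conf conf --conf-file clickstream.conf --name a1`. Note that `useLocalTimeStamp` is needed for the `%Y-%m-%d` escape to resolve unless a timestamp interceptor is configured on the source.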
Created ‎02-14-2016 01:12 PM
Thank you.
