Created 02-14-2016 02:40 AM
Hi,
How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1
The data is semi-structured and lands in the Omniture weblog directory, so I would like to confirm the source type.
Should it be the spooling directory source or multiport_syslogtcp?
Thank you.
Created 02-14-2016 03:17 AM
I hope this link helps.
Created 02-14-2016 12:48 PM
The spooling directory source is the right choice when you want to watch a directory for new files, whereas the syslog sources listen on a network port. Since your logs land in a directory, use the spooling directory source. To write to HDFS, use the HDFS sink. Once you have mastered Flume, check out Apache NiFi.
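As a rough illustration, an agent that wires a spooling directory source to an HDFS sink might look like the sketch below. The agent and component names (agent1, weblogs, mem, hdfsOut), both paths, and the capacity and roll values are assumptions chosen for the example, not values from the tutorial; adjust them to your environment.

```properties
# Hypothetical agent "agent1": one source, one channel, one sink.
agent1.sources = weblogs
agent1.channels = mem
agent1.sinks = hdfsOut

# Spooling directory source: picks up files dropped into spoolDir.
# The directory path is an assumption for this example.
agent1.sources.weblogs.type = spooldir
agent1.sources.weblogs.spoolDir = /var/log/omniture
agent1.sources.weblogs.channels = mem

# In-memory channel: fast, but buffered events are lost if the agent dies.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000
agent1.channels.mem.transactionCapacity = 1000

# HDFS sink: writes plain text files into a dated directory.
# The target path is an assumption for this example.
agent1.sinks.hdfsOut.type = hdfs
agent1.sinks.hdfsOut.channel = mem
agent1.sinks.hdfsOut.hdfs.path = /user/flume/omniture/%Y-%m-%d
agent1.sinks.hdfsOut.hdfs.fileType = DataStream
agent1.sinks.hdfsOut.hdfs.writeFormat = Text
# Use the agent's clock for the date escapes, since spooled events carry no timestamp header.
agent1.sinks.hdfsOut.hdfs.useLocalTimeStamp = true
# Roll a new file every 300 seconds; 0 disables size- and count-based rolling.
agent1.sinks.hdfsOut.hdfs.rollInterval = 300
agent1.sinks.hdfsOut.hdfs.rollSize = 0
agent1.sinks.hdfsOut.hdfs.rollCount = 0
```

If this were saved as omniture.conf (a hypothetical file name), the agent could be started with flume-ng agent --conf conf --conf-file omniture.conf --name agent1. Keep in mind that the spooling directory source expects files to be complete and unchanged once they appear in the directory, so write them elsewhere first and move them in when finished.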
Created 02-14-2016 01:12 PM
Thank you.