Data ingestion using Flume - Visualize website clickstream data

Rising Star

Hi,

How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1

For semi-structured data placed in the Omniture weblog directory, I would like to confirm the source type: would it be the spooling directory source or multiport_syslogtcp? Which one should I use?

Thank you.

Master Mentor
@Revathy Mourouguessane

The spooling directory source is good when you want to watch a directory for new files; the syslog sources listen on a network port. Since your logs land in a directory, you would use the spooling directory source. To write into HDFS, use the HDFS sink. Once you have mastered Flume, also check out Apache NiFi.
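
For reference, a minimal sketch of such an agent configuration (the agent name "agent", the spool directory /var/log/omniture, and the HDFS path are assumptions; adjust them for your environment):

# Name the components of this (hypothetical) agent
agent.sources = weblog-source
agent.channels = mem-channel
agent.sinks = hdfs-sink

# Spooling directory source: watches the directory for new, completed files
agent.sources.weblog-source.type = spooldir
agent.sources.weblog-source.spoolDir = /var/log/omniture
agent.sources.weblog-source.channels = mem-channel

# In-memory channel buffering events between source and sink
agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000
agent.channels.mem-channel.transactionCapacity = 1000

# HDFS sink: writes the events into HDFS instead of NFS
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = mem-channel
agent.sinks.hdfs-sink.hdfs.path = /user/flume/weblogs/%Y-%m-%d
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.rollInterval = 300
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true

You would then start the agent with something like: flume-ng agent --conf conf --conf-file weblog-agent.conf --name agent (weblog-agent.conf is just the assumed file name for the config above).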

Rising Star

Thank you.