Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Data ingestion using flume - Visualize website clickstream data

Rising Star

Hi,

How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1

For semi-structured data placed in the Omniture weblog directory, I would like to confirm the source type:

should it be a spooling directory source or multiport_syslogtcp? Which one should I use?

Thank you.

1 ACCEPTED SOLUTION

Master Mentor
@Revathy Mourouguessane

A spooling directory source is good when you want to watch a directory for new files; a syslog source listens on a network port. So if your logs land in a directory, use the spooling directory source. To write the data into HDFS, use the HDFS sink. Once you have mastered Flume, also check out Apache NiFi.
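As a rough sketch, a Flume agent config wiring a spooling directory source to an HDFS sink might look like the following. The spool directory, HDFS path, and namenode address are placeholders to adapt to your cluster:

```properties
# Name the components on this agent (agent name is "agent1")
agent1.sources = weblog-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

# Spooling directory source: watches a local directory for new files
# (/var/log/omniture is a placeholder for your weblog drop directory)
agent1.sources.weblog-src.type = spooldir
agent1.sources.weblog-src.spoolDir = /var/log/omniture
agent1.sources.weblog-src.channels = mem-ch

# In-memory channel buffering events between source and sink
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# HDFS sink: writes events into date-partitioned HDFS directories
# (namenode host/port and target path are placeholders)
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/user/flume/omniture/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink.channel = mem-ch
```

You would then start the agent with something like `flume-ng agent --conf conf --conf-file agent1.conf --name agent1`. Note that `hdfs.useLocalTimeStamp = true` is needed here because the spooling directory source does not add a timestamp header by default, and the `%Y-%m-%d` escape in the HDFS path requires one.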

Rising Star

Thank you.