
Data ingestion using flume - Visualize website clickstream data

Explorer

Hi,

How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1

For semi-structured data placed in the Omniture weblog directory, I would like to confirm the source type: should it be the spooling directory source or multiport_syslogtcp? Which one should I use?

Thank you.

1 ACCEPTED SOLUTION

3 REPLIES

Mentor
@Revathy Mourouguessane

The spooling directory source is the right choice when you want to watch a directory for new files; the syslog source instead listens on a network port. Since your weblogs land in a directory, use the spooling directory source. To write into HDFS rather than NFS, use the HDFS sink. Once you have mastered Flume, also check out Apache NiFi.
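As a sketch, the setup described above (spooling directory source feeding an HDFS sink) might look like the following Flume agent configuration. The agent name, spool directory, NameNode address, and HDFS path are placeholders for illustration; adjust them to your environment.

```properties
# Minimal sketch of a Flume agent: watch a local directory, write to HDFS.
# "agent1" and all paths below are assumed names -- replace with your own.

agent1.sources = weblog-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

# Spooling directory source: ingests each new file dropped into spoolDir,
# then marks it as done with a .COMPLETED suffix.
agent1.sources.weblog-src.type = spooldir
agent1.sources.weblog-src.spoolDir = /var/log/omniture
agent1.sources.weblog-src.channels = mem-ch

# In-memory channel buffering events between source and sink.
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000
agent1.channels.mem-ch.transactionCapacity = 1000

# HDFS sink: lands the events in HDFS instead of NFS.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem-ch
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/user/flume/omniture/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```

You would then start the agent with something like `flume-ng agent --conf conf --conf-file weblog-agent.conf --name agent1`. Note that the spooling directory source expects files to be complete and immutable once dropped into the directory; it is not suited to files that are still being appended to.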

Explorer

Thank you.