Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Data ingestion using flume - Visualize website clickstream data

Rising Star

Hi,

How can I use Flume to ingest data into HDFS instead of NFS in the use case below? http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/#section_1

For semi-structured data placed in the Omniture weblog directory, I would like to confirm the source type:

should it be a spooling directory source or multiport_syslogtcp? Which one should I use?

Thank you.

1 ACCEPTED SOLUTION

Master Mentor
@Revathy Mourouguessane

A spooling directory source is good when you want to watch a directory for new files; a syslog source listens on a network port. So if your logs land in a directory, use the spooling directory source. To write the data into HDFS, use the HDFS sink. Once you have mastered Flume, also check out Apache NiFi.
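As a rough sketch, a Flume agent config wiring a spooling directory source to an HDFS sink might look like the following. The spool directory, HDFS path, and namenode address are placeholders to adapt to your cluster:

```properties
# Name the components on this agent (agent name is "agent1")
agent1.sources = weblog-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

# Spooling directory source: watches a local directory for new files
# (/var/log/omniture is a placeholder for your weblog drop directory)
agent1.sources.weblog-src.type = spooldir
agent1.sources.weblog-src.spoolDir = /var/log/omniture
agent1.sources.weblog-src.channels = mem-ch

# In-memory channel buffering events between source and sink
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# HDFS sink: writes events into date-partitioned HDFS directories
# (namenode host/port and target path are placeholders)
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/user/flume/omniture/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink.channel = mem-ch
```

You would then start the agent with something like `flume-ng agent --conf conf --conf-file agent1.conf --name agent1`. Note that `hdfs.useLocalTimeStamp = true` is needed here because the spooling directory source does not add a timestamp header by default, and the `%Y-%m-%d` escape in the HDFS path requires one.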

Rising Star

Thank you.