Created 02-05-2016 01:27 PM
Are there any good walkthrough tutorials for Flume? I've seen the two listed here, but after skimming the second one, "Analyzing Social Media and Customer Sentiment," I don't see any use of or reference to Flume in it. I would specifically like something that walks through performing the two Flume objectives documented in the HDP Certified Developer Exam Objectives sheet.
The first tutorial from the link above starts a Flume agent via Ambari, but I assume the exam will require this to be done from the terminal.
Created 02-05-2016 01:31 PM
I wrote a small tutorial on Flume as a how-to, rather than use-case based like the ones you mentioned. I think by far the best resource is the Flume website; it has examples for every possible source, channel, and sink. Please see below.
Flume
# HDP 2.3.2 Sandbox
# Example: single-node Flume configuration using a netcat source, memory channel, and logger sink

# install telnet
yum install -y telnet

# start flume with this configuration
******************************************************************************
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************

# in another terminal
telnet localhost 44444
# type anything

# then in the original terminal
tail -f /var/log/flume/flume-a1.log

# Example: netcat source, HDFS sink as DataStream

# create hdfs flume directory
sudo -u hdfs hdfs dfs -mkdir /flume
sudo -u hdfs hdfs dfs -mkdir /flume/events
sudo -u hdfs hdfs dfs -chown -R flume:hdfs /flume/events

******************************************************************************
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************

# show the output in hdfs
sudo -u flume hdfs dfs -ls /flume/events/
sudo -u flume hdfs dfs -cat /flume/events/*/*/*/*
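For completeness, here is a sketch of the launch step that the instructions above assume. The config file location is just an example; point --conf-file at wherever you saved example.conf:

# start the agent named a1 (works for either example.conf variant above);
# the conf directory and file path are illustrative locations
flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/example.conf --name a1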
Created 02-05-2016 01:28 PM
This is a good tutorial: http://hortonworks.com/blog/configure-elastic-search-hadoop-hdp-2-0/ ("flume example")
Created 02-05-2016 01:41 PM
@Neeraj Sabharwal Thanks! I hadn't run across this one yet.
Created 04-10-2016 03:18 PM
Hi Daniel,
Were you able to run this Flume example? I am trying it now. What would the values be if the sink is HDFS and not Elasticsearch? Any ideas?
Thank you.
Created 02-05-2016 01:42 PM
@Artem Ervits Thanks, this is very helpful.
Created 02-05-2016 02:16 PM
You are correct - you should be able to start a Flume agent from the command line. The docs show how to do this:
https://flume.apache.org/FlumeUserGuide.html#starting-an-agent
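In its generic form, the command just takes the agent name, a configuration directory, and the properties file (the names and paths below are placeholders; substitute your own):

flume-ng agent -n a1 -c conf -f example.conf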
You also need to know how to configure a memory channel, which is also demonstrated in the docs:
https://flume.apache.org/FlumeUserGuide.html#memory-channel
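As a reference point, a minimal memory-channel definition looks like this (the agent and channel names, and the capacity values, are illustrative):

a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100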
I would recommend going through the same tutorial that you found here:
http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/
Within that tutorial, configure a memory channel and try starting it from the command line. Let me know if you have any issues along the way and I'll be glad to assist.
Thanks,
Rich Raposa
Certification Manager
Created 02-05-2016 02:52 PM
@rich This is perfect for an article. Thanks, Rich! @Daniel Hendrix