Flume Tutorials

Contributor

Are there any good walkthrough tutorials for Flume? I've seen the two listed here. However, after skimming through the second one, "Analyzing Social Media and Customer Sentiment," I don't see Flume used or referenced anywhere in it. I would specifically like something that walks through the two Flume objectives in the HDP Certified Developer Exam Objectives sheet:

  1. Given a Flume configuration file, start a Flume agent
  2. Given a configured sink and source, configure a Flume memory channel with a specified capacity

The first tutorial from the link above starts a Flume agent via Ambari, but I assume the exam will require doing this from the terminal.

1 ACCEPTED SOLUTION

Master Mentor
@Daniel Hendrix

I wrote a short Flume tutorial; it is a how-to rather than use-case based like the ones you mentioned. By far the best resource, I think, is the Flume website, which has examples for every source, sink and channel. Please see below.

Flume

# HDP 2.3.2 Sandbox
# Example: single-node Flume configuration using a netcat source, a memory channel and a logger sink

# install telnet
yum install -y telnet

# start a Flume agent named a1 with this configuration (the name must match the a1. prefix of every property)
******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************
# in another terminal
telnet localhost 44444

# type a few lines; each line is sent to the agent as one event
# then, in the original terminal, watch the logger sink's output
tail -f /var/log/flume/flume-a1.log

# Example: netcat source, HDFS sink writing plain text (fileType = DataStream)
# create the HDFS directory and give the flume user ownership of it
sudo -u hdfs hdfs dfs -mkdir /flume
sudo -u hdfs hdfs dfs -mkdir /flume/events
sudo -u hdfs hdfs dfs -chown -R flume:hdfs /flume/events

******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************

# after starting the agent with this configuration and typing a few lines
# into telnet as before, show the output in hdfs
sudo -u flume hdfs dfs -ls -R /flume/events/
sudo -u flume hdfs dfs -cat /flume/events/*/*/*/*
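The HDFS sink above buckets its output by time, which is why the final `-cat` needs four wildcard levels (three directories plus the file). As a minimal sketch, not part of the original post, the helper below computes the directory an event should land in: with `round = true`, `roundValue = 10` and `roundUnit = minute`, the Flume User Guide describes the timestamp as being rounded down to the last 10th minute, so the seconds directory is always `00`. `hdfs_bucket_dir` is a hypothetical name, and the script assumes GNU `date` (as on the CentOS-based sandbox).

```shell
#!/bin/sh
# Sketch: which HDFS directory an event is written to with
#   hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
#   hdfs.round = true, roundValue = 10, roundUnit = minute
#   hdfs.useLocalTimeStamp = true
# hdfs_bucket_dir is a hypothetical helper; it assumes GNU date ("date -d @EPOCH").

# Print the bucket directory for a Unix timestamp, in the local time zone.
hdfs_bucket_dir() {
  epoch=$1
  day=$(date -d "@$epoch" +%y-%m-%d) || return 1   # %y is a two-digit year
  hour=$(date -d "@$epoch" +%H)
  min=$(date -d "@$epoch" +%M)
  min=${min#0}                 # "08" -> "8", so it is not parsed as octal
  min=$(( min - min % 10 ))    # round down to the last 10-minute mark
  # Rounding to minutes zeroes everything smaller, so %S is always 00.
  printf '/flume/events/%s/%s%02d/00\n' "$day" "$hour" "$min"
}

# Usage: where an event arriving right now would be written
hdfs_bucket_dir "$(date +%s)"
```

With `TZ=UTC`, for example, the timestamp 1339502074 (11:54:34 on 12 June 2012) maps to `/flume/events/12-06-12/1150/00`.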


8 REPLIES

Master Mentor

Contributor

@Neeraj Sabharwal Thanks! I hadn't run across this one yet.

Rising Star

Hi Daniel,

Were you able to run this Flume example? I am trying to. What would the values be if the sink were HDFS rather than Elasticsearch? Any ideas?

Thank you.

Master Mentor

Contributor

@Artem Ervits Thanks, this is very helpful.

Guru

Hi @Daniel Hendrix

You are correct: you should be able to start a Flume agent from the command line. The docs show how to do this:

https://flume.apache.org/FlumeUserGuide.html#starting-an-agent
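A minimal sketch of that command wrapped in a small shell function. It assumes `flume-ng` is on the PATH (as on the HDP sandbox) and that `/etc/flume/conf` is the Flume configuration directory; both are assumptions, and `start_flume_agent`, `AGENT_CONF_DIR` and `DRY_RUN` are hypothetical names used only for illustration. The `flume-ng agent --conf ... --conf-file ... --name ...` form follows the "Starting an agent" section of the Flume User Guide; the detail that matters most is that `--name` must match the prefix used in the properties file (`a1` in the accepted answer).

```shell
#!/bin/sh
# Sketch: start a Flume agent from the terminal instead of Ambari.
# Assumptions: flume-ng is on the PATH and /etc/flume/conf is the Flume
# configuration directory (override with the hypothetical AGENT_CONF_DIR).
# Set DRY_RUN=1 to print the command instead of running it.

# Usage: start_flume_agent CONF_FILE AGENT_NAME
start_flume_agent() {
  conf_file=$1   # e.g. example.conf
  agent=$2       # must match the property prefix, e.g. "a1" in a1.sources = r1
  if [ ! -r "$conf_file" ]; then
    echo "cannot read $conf_file" >&2
    return 1
  fi
  # A name that matches no "<agent>.sources" line is an easy mistake to make.
  if ! grep -q "^$agent\.sources[[:space:]]*=" "$conf_file"; then
    echo "$conf_file has no $agent.sources entry" >&2
    return 1
  fi
  # The agent runs in the foreground; -Dflume.root.logger=INFO,console sends
  # its log (including the logger sink's events) to this terminal.
  ${DRY_RUN:+echo} flume-ng agent --conf "${AGENT_CONF_DIR:-/etc/flume/conf}" \
    --conf-file "$conf_file" --name "$agent" -Dflume.root.logger=INFO,console
}
```

For example, `start_flume_agent example.conf a1` would start the netcat/memory/logger agent from the accepted answer; running `telnet localhost 44444` in a second terminal should then make the logged events appear in the first. With `DRY_RUN=1` set, the function only prints the command it would run.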

You also need to know how to configure a memory channel, which is also demonstrated in the docs:

https://flume.apache.org/FlumeUserGuide.html#memory-channel
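To practise the second objective, here is a minimal sketch that prints the properties for a memory channel with a chosen `capacity` and binds an already-configured source and sink to it. `memory_channel_conf` and its argument order are hypothetical. It encodes two details as I understand them from the Memory Channel documentation and Flume's behaviour: `transactionCapacity` defaults to 100 and must not exceed `capacity`, and a source lists `channels` (plural) while a sink takes a single `channel`.

```shell
#!/bin/sh
# Sketch: print a Flume memory channel with a given capacity and bind an
# existing source and sink to it. memory_channel_conf is a hypothetical helper.

# Succeeds only for a positive decimal integer.
is_positive_int() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Usage: memory_channel_conf AGENT CHANNEL SOURCE SINK CAPACITY [TRANSACTION_CAPACITY]
memory_channel_conf() {
  agent=$1 channel=$2 source=$3 sink=$4 capacity=$5
  tx=${6:-100}   # 100 is Flume's default transactionCapacity for this channel

  if ! is_positive_int "$capacity" || ! is_positive_int "$tx"; then
    echo "capacity and transactionCapacity must be positive integers" >&2
    return 1
  fi
  # The memory channel rejects a transaction capacity larger than its capacity.
  if [ "$tx" -gt "$capacity" ]; then
    echo "transactionCapacity ($tx) must not exceed capacity ($capacity)" >&2
    return 1
  fi

  # Sources use "channels" (a list); sinks use "channel" (exactly one).
  cat <<EOF
$agent.channels = $channel
$agent.channels.$channel.type = memory
$agent.channels.$channel.capacity = $capacity
$agent.channels.$channel.transactionCapacity = $tx
$agent.sources.$source.channels = $channel
$agent.sinks.$sink.channel = $channel
EOF
}
```

For example, `memory_channel_conf a1 c1 r1 k1 10000 >> example.conf` would append a 10,000-event memory channel to a file that already defines `a1.sources = r1` and `a1.sinks = k1` but no channel yet.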

I would recommend going through the same tutorial that you found here:

http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/

Within that tutorial, configure a memory channel and try starting it from the command line. Let me know if you have any issues along the way and I'll be glad to assist.

Thanks,

Rich Raposa

Certification Manager

Master Mentor

@rich This would make a perfect article. Thanks, Rich! @Daniel Hendrix