In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs


New Contributor

Hi team,

 

I have created a spooling directory setup with the properties below, but I didn't get any output from this configuration.

Can anyone give me an idea about how to get this working in Cloudera Manager?

Here, agent1 is on one machine and agent2 is on another machine. I start agent2 first; after that, agent1 needs to pick up the data from its spool directory.

 

In Machine2 Configuration file:

--------------------------------------

### Agent2 - Avro Source and File Channel, HDFS Sink
# Name the components on this agent
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink
###
# Describe/configure Source
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.hostname = 192.168.1.206
Agent2.sources.avro-source.port = 7182
# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://192.168.1.201:8020/user/flume/
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollSize = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 10000
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
#Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/hduser/Desktop/testflume/checkpoint
Agent2.channels.file-channel.dataDirs = /home/hduser/Desktop/testflume/data/
# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel

 

In Machine1 Configuration file:

------------------------------------

### Agent1 - Spooling Directory Source and File Channel, Avro Sink
# Name the components on this agent
Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink
###
# Describe/configure Source
Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /home/hduser/Desktop/testflume/spooldir
# Describe the sink
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = 192.168.1.206
Agent1.sinks.avro-sink.port = 7182
#IP Address masked here
#Use a channel which buffers events in file
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /home/hduser/Desktop/testflume/checkpoint
Agent1.channels.file-channel.dataDirs = /home/hduser/Desktop/testflume/data/
# Bind the source and sink to the channel
Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel

7 REPLIES

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Super Collaborator

Do you see the files in /home/hduser/Desktop/testflume/spooldir getting marked with a .COMPLETED suffix?

Do you see the channel size growing in the flume metrics in CM?

Can you please include the log files from the startup on both agents?

 

 

-PD
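
The first check above can be scripted. The following is a minimal sketch, assuming the spooling directory source renames ingested files with the `.COMPLETED` suffix mentioned in the reply; the function name `spool_status` is made up for this illustration.

```python
from pathlib import Path


def spool_status(spool_dir, suffix=".COMPLETED"):
    """Split the files in spool_dir into (completed, pending) name lists."""
    completed, pending = [], []
    for entry in sorted(Path(spool_dir).iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories such as a tracker directory
        if entry.name.endswith(suffix):
            completed.append(entry.name)
        else:
            pending.append(entry.name)
    return completed, pending
```

Running `spool_status("/home/hduser/Desktop/testflume/spooldir")` on Machine1 would list which files are still waiting. If everything stays pending, the source is probably not reading the directory at all, and the agent log is the next place to look.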


Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

New Contributor

Yes, I checked there, but I didn't find any files with the .COMPLETED extension in that folder.


Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

New Contributor

I am trying to implement the same pipeline. Could you please share the statistics you have achieved? What is the transfer rate? What was your input size, and how long did it take to reach the sink?


Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Super Collaborator
The rate depends entirely on the size of the individual events. There isn't really a calculation that can be made; it's more a matter of observation in an environment where you can run a load test to find your maximum throughput per Avro sink -> source pair.

-pd
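
As a rough illustration of turning such a load test into numbers, the sketch below computes an average rate from two readings of a cumulative event counter. It assumes you note a counter (for example, the number of events the sink has delivered) at the start and end of the test; the function names and the sample values in the usage note are made up for illustration.

```python
def events_per_second(count_start, count_end, seconds):
    """Average event rate between two readings of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_end - count_start) / seconds


def bytes_per_second(event_rate, avg_event_bytes):
    """Approximate byte throughput for a given average event size."""
    return event_rate * avg_event_bytes
```

For instance, `events_per_second(0, 120000, 60)` gives 2000.0 events per second, and `bytes_per_second(2000.0, 512)` gives 1024000.0 bytes per second. The same event rate with larger events yields a different byte rate, which is why the result has to be measured for your own data.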

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

New Contributor
How can I tell which part of the pipeline is slow in a Flume agent, i.e. whether it is the source, the sink, or the channel? Is there a tool for this, or how can I get those timings?

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Super Collaborator

You can review the Flume Metrics Details on the service page, or use the Charts library to see the rate at which each Flume agent is ingesting at the source or delivering to the sinks.

 

-pd
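
Outside of the Cloudera Manager charts, the same comparison can be sketched from two metric snapshots. The code below is a minimal sketch, assuming the snapshots come from Flume's JSON metrics output, with keys of the form `TYPE.componentName` and string values. The counter names (`EventAcceptedCount`, `EventTakeSuccessCount`, `EventDrainSuccessCount`, `ChannelSize`) are the ones I believe Flume reports and should be checked against your version; the sample numbers are made up.

```python
# Counter assumed to reflect successful throughput for each component type.
THROUGHPUT_COUNTER = {
    "SOURCE": "EventAcceptedCount",
    "CHANNEL": "EventTakeSuccessCount",
    "SINK": "EventDrainSuccessCount",
}


def stage_rates(before, after, seconds):
    """Events per second for each component present in both snapshots."""
    rates = {}
    for name, metrics in after.items():
        counter = THROUGHPUT_COUNTER.get(name.split(".", 1)[0])
        earlier = before.get(name, {})
        if counter is None or counter not in metrics or counter not in earlier:
            continue
        # Values arrive as strings in the JSON output, so convert first.
        rates[name] = (float(metrics[counter]) - float(earlier[counter])) / seconds
    return rates


def channel_growth(before, after):
    """Change in ChannelSize per channel; positive means a growing backlog."""
    growth = {}
    for name, metrics in after.items():
        earlier = before.get(name, {})
        if name.startswith("CHANNEL.") and "ChannelSize" in metrics and "ChannelSize" in earlier:
            growth[name] = float(metrics["ChannelSize"]) - float(earlier["ChannelSize"])
    return growth


# Made-up snapshots taken 10 seconds apart, for illustration only.
t0 = {
    "SOURCE.spooldir-source": {"EventAcceptedCount": "1000"},
    "CHANNEL.file-channel": {"EventTakeSuccessCount": "800", "ChannelSize": "200"},
    "SINK.avro-sink": {"EventDrainSuccessCount": "800"},
}
t1 = {
    "SOURCE.spooldir-source": {"EventAcceptedCount": "6000"},
    "CHANNEL.file-channel": {"EventTakeSuccessCount": "1800", "ChannelSize": "4200"},
    "SINK.avro-sink": {"EventDrainSuccessCount": "1800"},
}

print(stage_rates(t0, t1, 10))
print(channel_growth(t0, t1))
```

In this made-up case the source accepts events faster than the sink drains them and the channel backlog grows, which would point at the sink side (or the network hop behind it) as the slow stage.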


Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

New Contributor

In the case of channel multiplexing, is it possible to have a default parameter/value/column for selector.header? I don't have one in my data, but I want to use multiple channels and sinks to improve performance.
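
For context, Flume's multiplexing channel selector is configured with `selector.header`, `selector.mapping.*`, and `selector.default`. The toy model below is only a sketch of that routing rule, not Flume code; the header name, mapping values, and channel names are hypothetical.

```python
def select_channels(headers, header_name, mapping, default_channels):
    """Toy model of multiplexing: choose channels by one header's value."""
    value = headers.get(header_name)
    # A missing header or an unmapped value falls back to the defaults.
    return mapping.get(value, default_channels)


# Hypothetical header name, mapping, and channel names for illustration.
mapping = {"US": ["channel-1"], "EU": ["channel-2"]}
print(select_channels({"region": "EU"}, "region", mapping, ["channel-3"]))
print(select_channels({}, "region", mapping, ["channel-3"]))
```

Under this model, events without the header all land in the default channels, so multiplexing on a header that the data does not carry would not by itself spread the load across channels.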
