New Contributor
Posts: 3
Registered: ‎08-02-2015

In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Hi team,

 

I have set up a spooling directory source with the properties below, but I didn't get any output from this configuration.

Can anyone give me an idea of how to get this working in Cloudera Manager?

Here, agent1 is on one machine and agent2 is on another machine. I started agent2 first; after that, agent1 needs to fetch the data from its spool directory.

 

In Machine2 Configuration file:

--------------------------------------

### Agent2 - Avro Source and File Channel, HDFS Sink
# Name the components on this agent
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink
###
# Describe/configure Source
Agent2.sources.avro-source.type = avro
# The Flume Avro source takes its listen address from "bind", not "hostname"
Agent2.sources.avro-source.bind = 192.168.1.206
Agent2.sources.avro-source.port = 7182
# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://192.168.1.201:8020/user/flume/
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollSize = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 10000
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
#Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/hduser/Desktop/testflume/checkpoint
Agent2.channels.file-channel.dataDirs = /home/hduser/Desktop/testflume/data/
# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel

 

In Machine1 Configuration file:

------------------------------------

### Agent1 - Spooling Directory Source and File Channel, Avro Sink
# Name the components on this agent
Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink
###
# Describe/configure Source
Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /home/hduser/Desktop/testflume/spooldir
# Describe the sink
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = 192.168.1.206
Agent1.sinks.avro-sink.port = 7182
#IP Address masked here
#Use a channel which buffers events in file
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /home/hduser/Desktop/testflume/checkpoint
Agent1.channels.file-channel.dataDirs = /home/hduser/Desktop/testflume/data/
# Bind the source and sink to the channel
Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel

Cloudera Employee
Posts: 175
Registered: ‎01-09-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Do you see the files in /home/hduser/Desktop/testflume/spooldir getting marked with a .COMPLETED suffix?

Do you see the channel size growing in the flume metrics in CM?

Can you please include the log files from the startup on both agents?
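
For the first question, a minimal Python sketch (not part of Flume) that lists which files in the spool directory have and have not been renamed. It assumes the spooling directory source's default `.COMPLETED` suffix and the spool path from the posted Agent1 config; `spool_status` is a hypothetical helper name.

```python
import os

def spool_status(spool_dir, suffix=".COMPLETED"):
    """Split regular files in spool_dir into (completed, pending) name lists."""
    completed, pending = [], []
    for name in sorted(os.listdir(spool_dir)):
        if not os.path.isfile(os.path.join(spool_dir, name)):
            continue  # skip subdirectories such as .flumespool
        (completed if name.endswith(suffix) else pending).append(name)
    return completed, pending

if __name__ == "__main__":
    # Path taken from the Agent1 configuration in the original post.
    spool = "/home/hduser/Desktop/testflume/spooldir"
    if os.path.isdir(spool):
        done, todo = spool_status(spool)
        print(f"{len(done)} completed, {len(todo)} pending: {todo}")
    else:
        print(f"{spool} does not exist on this machine")
```

If every file stays in the pending list while agent1 is running, the source is probably not consuming the directory at all, and the agent1 startup log is the next place to look.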

 

 

-PD

New Contributor
Posts: 3
Registered: ‎08-02-2015

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

Yes, I checked there, but none of the files in that folder got the .COMPLETED extension.

New Contributor
Posts: 5
Registered: ‎07-23-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

I am trying to implement the same pipeline. Could you please share the statistics you achieved? What was the transfer rate, what was your input size, and how long did it take to reach the sink?

Cloudera Employee
Posts: 175
Registered: ‎01-09-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

The rate depends entirely on the size of the individual events. There isn't really a calculation that can be made; it's more of an observation in an environment where you can run a load test to see your maximum throughput per Avro sink -> source pair.

-pd
New Contributor
Posts: 5
Registered: ‎07-23-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

How can I tell which part of the pipeline is slow in a Flume agent, i.e. whether the bottleneck is at the source, the channel, or the sink? Is there a tool for this, or some other way to get those timings?
Cloudera Employee
Posts: 175
Registered: ‎01-09-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

You can review the Flume Metrics Details in the service page, or use the Charts library to see the rate at which each flume agent is ingesting to the source, or delivering to the sinks.
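
Outside Cloudera Manager, similar counters can be read from Flume's built-in JSON reporting, provided the agent was started with `-Dflume.monitoring.type=http` and a `-Dflume.monitoring.port`. Below is a rough Python sketch under that assumption (the thread does not say the endpoint is enabled). The helper names `summarize` and `fetch_metrics` are hypothetical, and the sample values are made up for illustration.

```python
import json
from urllib.request import urlopen

def summarize(metrics):
    """Reduce a Flume /metrics JSON dict to the counters that show where events pile up."""
    wanted = {
        "SOURCE": "EventAcceptedCount",     # events the source put on its channel(s)
        "CHANNEL": "ChannelSize",           # events currently buffered in the channel
        "SINK": "EventDrainSuccessCount",   # events the sink delivered successfully
    }
    summary = {}
    for component, counters in metrics.items():
        kind = component.split(".", 1)[0]
        if kind in wanted and wanted[kind] in counters:
            # Flume reports counter values as strings.
            summary[component] = int(counters[wanted[kind]])
    return summary

def fetch_metrics(host, port):
    """Fetch metrics from an agent with HTTP monitoring enabled (assumed setup)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return json.load(resp)

# Made-up sample in the shape Flume reports, using component names from this thread.
sample = {
    "SOURCE.spooldir-source": {"EventAcceptedCount": "1000"},
    "CHANNEL.file-channel": {"ChannelSize": "400"},
    "SINK.avro-sink": {"EventDrainSuccessCount": "600"},
}
print(summarize(sample))
```

As a rule of thumb, a channel size that keeps growing while the source count rises points at the sink (or the downstream agent) as the slow part, while a channel that stays near zero suggests the source is the limit.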

 

-pd

New Contributor
Posts: 5
Registered: ‎07-23-2014

Re: In Flume I have created multi agent setup for spooling Data sinks with avro/ hdfs

With channel multiplexing, is it possible to set a default parameter/value/column for selector.header? I don't have one in my data, but I want to use multiple channels and sinks to improve performance.
