
Flume - Help


New Contributor

I'm working on a use case where I need to stream data from Kafka into HDFS and eventually set up Hive structures on HDFS. I have a Kafka topic created for each HDFS/Hive table (~30 tables). From a design point of view, I have been trying a few options but wanted advice from experts.

 

Option #1: a single Flume agent with all the config (src-1, channel-1, sink-1, src-2, etc.)

Option #2: a separate agent for each configuration.

 

Just wondering, are there any advantages or constraints of one over the other?

 

Thanks.


Re: Flume - Help

Cloudera Employee

Hi,

 

Running multiple agents has the advantage of distributing the load across multiple nodes. If the load is light and one Flume agent is able to consume the messages from all topics, then a single agent should be fine.

 

But if Flume is not able to keep up with the speed of ingestion, you can add Flume agents and distribute the topics across them. Even for a single topic, you can create multiple partitions and have different agents consume from it using the same consumer group, so each agent reads its own share of the partitions. So it all depends on the load and performance constraints.
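
To make the two layouts concrete, here is a rough sketch (not a tested production setup) that generates Flume properties for either case: one agent holding every topic, or the topics spread round-robin over several agents. The helper names `split_topics` and `build_agent_config`, the broker address, the HDFS root path, and the agent and topic names are all hypothetical. The property keys follow the Kafka source and HDFS sink options in the Flume user guide, so check them against the Flume version you run.

```python
def split_topics(topics, n_agents):
    """Assign topics to agents round-robin; n_agents=1 gives the single-agent layout."""
    return [topics[i::n_agents] for i in range(n_agents)]


def build_agent_config(agent, topics, brokers, hdfs_root):
    """Render a Flume properties file with one source/channel/sink chain per topic."""
    lines = [
        f"{agent}.sources = " + " ".join(f"src_{t}" for t in topics),
        f"{agent}.channels = " + " ".join(f"ch_{t}" for t in topics),
        f"{agent}.sinks = " + " ".join(f"sink_{t}" for t in topics),
    ]
    for t in topics:
        src, ch, sink = f"src_{t}", f"ch_{t}", f"sink_{t}"
        lines += [
            f"{agent}.sources.{src}.type = org.apache.flume.source.kafka.KafkaSource",
            f"{agent}.sources.{src}.kafka.bootstrap.servers = {brokers}",
            f"{agent}.sources.{src}.kafka.topics = {t}",
            # Agents sharing a group id on the same topic split its partitions.
            f"{agent}.sources.{src}.kafka.consumer.group.id = flume_{t}",
            f"{agent}.sources.{src}.channels = {ch}",
            # Memory channel keeps the sketch short; a file channel would need
            # its own checkpointDir and dataDirs for every channel.
            f"{agent}.channels.{ch}.type = memory",
            f"{agent}.sinks.{sink}.type = hdfs",
            f"{agent}.sinks.{sink}.hdfs.path = {hdfs_root}/{t}",
            f"{agent}.sinks.{sink}.hdfs.fileType = DataStream",
            f"{agent}.sinks.{sink}.channel = {ch}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical topic names, broker, and HDFS path, for illustration only.
    topics = [f"table{i}" for i in range(1, 31)]
    for n, group in enumerate(split_topics(topics, 3), start=1):
        with open(f"agent{n}.conf", "w") as fh:
            fh.write(build_agent_config(f"agent{n}", group, "broker1:9092", "/data/raw"))
```

Calling `split_topics(topics, 1)` yields the single-agent layout (Option #1), and raising the agent count moves toward Option #2 without changing the per-topic settings, so you can start with one agent and scale out only if it falls behind.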

 

Regards

Bimal


Re: Flume - Help

New Contributor
Thank you