I'm working on a use case where I need to stream data from Kafka into HDFS and eventually set up Hive structures on top of HDFS. I have a Kafka topic created for each HDFS/Hive table (~30 tables). From a design point of view, I was trying a few options but wanted advice from experts.
Option #1: a single Flume agent with all configurations in one file (src-1, channel-1, sink-1, src-2, etc.)
Option #2: a separate agent for each table's configuration.
Just wondering, are there any advantages/constraints of one over the other?
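For context, Option #1 would look something like the following Flume properties sketch. This is a minimal, hypothetical example: the agent name, broker address, topic names, and HDFS paths are placeholders, and only the first of the ~30 source/channel/sink sets is shown in full.

```properties
# Single agent handling all tables: one Kafka source, channel,
# and HDFS sink per table, all declared on the same agent.
agent1.sources = src-1 src-2
agent1.channels = ch-1 ch-2
agent1.sinks = sink-1 sink-2

# --- table 1 ---
agent1.sources.src-1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.src-1.kafka.bootstrap.servers = broker1:9092
agent1.sources.src-1.kafka.topics = table1_topic
agent1.sources.src-1.channels = ch-1

agent1.channels.ch-1.type = memory
agent1.channels.ch-1.capacity = 10000

agent1.sinks.sink-1.type = hdfs
agent1.sinks.sink-1.hdfs.path = /data/warehouse/table1
agent1.sinks.sink-1.channel = ch-1

# --- table 2 (and so on for the remaining tables) ---
agent1.sources.src-2.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.src-2.kafka.bootstrap.servers = broker1:9092
agent1.sources.src-2.kafka.topics = table2_topic
agent1.sources.src-2.channels = ch-2
```

Option #2 would split these per-table sections into separate config files, each started as its own agent process.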
Multiple agents have the advantage of distributing load across multiple nodes. If the load is low and one Flume agent is able to consume the messages from all topics, then a single agent should be fine.
But if Flume is not able to keep up with the speed of ingestion, you can increase the number of Flume agents and distribute the topics across them. Even for a single topic, you can create multiple partitions and have different agents consuming as part of the same consumer group. So it all depends on the load and performance constraints.
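The last point can be sketched as follows: if the same Kafka source config (with the same consumer group id) is deployed on two or more Flume nodes, Kafka balances the topic's partitions across them. The group id and topic name below are hypothetical placeholders.

```properties
# Deployed identically on flume-node-1, flume-node-2, ...
# Because all agents share kafka.consumer.group.id, Kafka assigns
# each agent a subset of table1_topic's partitions, spreading the load.
agentN.sources.src-1.type = org.apache.flume.source.kafka.KafkaSource
agentN.sources.src-1.kafka.bootstrap.servers = broker1:9092
agentN.sources.src-1.kafka.topics = table1_topic
agentN.sources.src-1.kafka.consumer.group.id = table1-ingest
```

Note that this only helps if the topic actually has more than one partition; a single-partition topic is always consumed by at most one member of the group.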