Running Flume agents on Data Nodes - a good/bad idea?

Hi,

I am running multiple Flume agents, each with an HDFS sink that pushes data into HDFS, but I am unsure where these agents should run.
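
For reference, the kind of agent described above (source, channel, HDFS sink) can be sketched as the Python snippet below, which renders a Flume properties file. This is a minimal illustration, not a recommended setup. The agent name `collector`, the Avro port 4141, the file-channel directories, the roll settings and the NameNode URI `hdfs://namenode:8020` are all hypothetical placeholders.

```python
"""Render a minimal Flume agent configuration with an HDFS sink.

All values below are hypothetical placeholders: the agent name, the Avro
port, the file-channel directories, the roll settings and the NameNode URI.
"""


def render_flume_config(agent: str, props: dict[str, str]) -> str:
    """Prefix every key with the agent name and emit Java-properties lines."""
    return "\n".join(f"{agent}.{key} = {value}" for key, value in props.items())


# One source -> one channel -> one HDFS sink.
COLLECTOR_PROPS = {
    "sources": "r1",
    "channels": "c1",
    "sinks": "k1",
    # Avro source: receives events from upstream agents (placeholder bind/port).
    "sources.r1.type": "avro",
    "sources.r1.bind": "0.0.0.0",
    "sources.r1.port": "4141",
    "sources.r1.channels": "c1",
    # File channel: buffers events on local disk if HDFS is slow or unavailable.
    "channels.c1.type": "file",
    "channels.c1.checkpointDir": "/var/lib/flume/checkpoint",
    "channels.c1.dataDirs": "/var/lib/flume/data",
    # HDFS sink: the path names the NameNode, not the host the agent runs on.
    "sinks.k1.type": "hdfs",
    "sinks.k1.channel": "c1",
    "sinks.k1.hdfs.path": "hdfs://namenode:8020/flume/events/%Y-%m-%d",
    "sinks.k1.hdfs.fileType": "DataStream",
    "sinks.k1.hdfs.useLocalTimeStamp": "true",
    # Roll files every 300 s or at 128 MiB; 0 disables rolling by event count.
    "sinks.k1.hdfs.rollInterval": "300",
    "sinks.k1.hdfs.rollSize": "134217728",
    "sinks.k1.hdfs.rollCount": "0",
}

if __name__ == "__main__":
    print(render_flume_config("collector", COLLECTOR_PROPS))
```

The output could be saved as `flume.conf` and started with `flume-ng agent --conf conf --conf-file flume.conf --name collector`. Because the sink reaches HDFS through the NameNode URI in `hdfs.path`, the file is the same whether the agent runs on a DataNode or on a separate host; placement changes only resource contention and the network path of the writes.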

1) Running them on the DataNodes means the TaskTracker on those nodes can no longer use 100% of the CPU (https://wiki.apache.org/hadoop/DataNode).

2) Running them on a separate machine takes them away from HDFS, so the data has to cross the network before it reaches a DataNode.

Can someone share recommended enterprise practices for deploying the Flume tier as close to HDFS as possible?