
Running Flume agents on Data Nodes - a good/bad idea?



Hi,

 

I am running multiple Flume agents, each with an HDFS sink that pushes data to HDFS, but I am unsure where to run these agents.

1) Running them on the data nodes means the TaskTracker on those nodes no longer gets 100% of the CPU (https://wiki.apache.org/hadoop/DataNode).

2) Running them on separate machines moves them farther away from HDFS.

 

Can someone share established enterprise practices for deploying the Flume tier as close to HDFS as possible?
