
Running Flume agents on Data Nodes - a good/bad idea?

I am using multiple Flume agents, each with an HDFS sink that pushes data to HDFS, but I am unsure where to run these agents.
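
For reference, a minimal sketch of one such agent, written as a small Python helper that renders a Flume properties file. Assumptions: the agent name `collector`, the component names `r1`/`c1`/`k1`, the Avro source, the file channel, the port and the HDFS path are all hypothetical choices for illustration; only the property keys (`type`, `bind`, `port`, `channels`, `channel`, `hdfs.path`, `hdfs.fileType`) come from the Flume user guide.

```python
from typing import Dict


def hdfs_sink_agent(agent: str, port: int, hdfs_path: str) -> Dict[str, str]:
    """Build properties for one hypothetical agent: Avro source -> file channel -> HDFS sink."""
    return {
        f"{agent}.sources": "r1",
        f"{agent}.channels": "c1",
        f"{agent}.sinks": "k1",
        # Avro source: upstream agents forward their events to this port.
        f"{agent}.sources.r1.type": "avro",
        f"{agent}.sources.r1.bind": "0.0.0.0",
        f"{agent}.sources.r1.port": str(port),
        f"{agent}.sources.r1.channels": "c1",
        # File channel: buffers events on local disk between source and sink.
        f"{agent}.channels.c1.type": "file",
        # HDFS sink: drains the channel and writes plain (uncompressed) files.
        f"{agent}.sinks.k1.type": "hdfs",
        f"{agent}.sinks.k1.channel": "c1",
        f"{agent}.sinks.k1.hdfs.path": hdfs_path,
        f"{agent}.sinks.k1.hdfs.fileType": "DataStream",
    }


def render(props: Dict[str, str]) -> str:
    """Render the properties as 'key = value' lines, one per property."""
    return "\n".join(f"{k} = {v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    print(render(hdfs_sink_agent("collector", 4141, "hdfs://namenode:8020/flume/events")))
```

The same file works whether the agent runs on a DataNode or on a separate machine (started with something like `flume-ng agent --name collector --conf-file collector.properties`); the placement question is only about which host runs that process.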

1) Running them on the DataNodes means the TaskTracker on those nodes no longer has 100% of the CPU.

2) Running them on a separate machine moves them away from HDFS, so every block they write has to travel over the network to a DataNode.


Can someone share recommended enterprise practices for deploying the Flume tier closest to HDFS?
