New Contributor
Posts: 4
Registered: ‎05-29-2014

Running Flume agents on Data Nodes - a good/bad idea?

Hi,

 

I am using multiple Flume agents, each with an HDFS sink, to push data to HDFS. But I am unsure where to run these agents.

1) Running them on the data nodes means the TaskTracker on those nodes no longer gets 100% of the CPU (https://wiki.apache.org/hadoop/DataNode).

2) Running them on separate machines moves them away from HDFS.

 

Can someone share recommended enterprise practices for deploying the Flume tier as close to HDFS as possible?
