08-18-2014 10:01 PM - edited 08-19-2014 03:51 AM
I am using multiple Flume agents, each with an HDFS sink, to push data to HDFS. But I am unsure where to run these agents.
1) Running them on the DataNodes deprives the TaskTrackers on those nodes of the full (100%) CPU (https://wiki.apache.org/hadoop/DataNode).
2) Running them on separate machines puts them further away from HDFS.
Can someone share recommended enterprise practices for deploying the Flume tier that sits closest to HDFS?
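For reference, here is a minimal sketch of the kind of agent in question: an agent in the tier nearest HDFS that receives events from upstream agents and writes them out through an HDFS sink. It renders a standard Flume properties file from Python. The helper name `render_flume_config`, the agent name, the NameNode address, the port, the local directories, and the choice of an Avro source with a file channel are all illustrative assumptions, not the actual setup described above. The property keys follow the Flume user guide's `<agent>.sources/channels/sinks` layout.

```python
# Hypothetical helper: renders a Flume agent config (Java properties format)
# wiring an Avro source -> file channel -> HDFS sink. The agent name, hosts,
# port and directories are illustrative assumptions.


def render_flume_config(agent: str, namenode: str, port: int = 4141,
                        hdfs_dir: str = "/flume/events") -> str:
    src, ch, sink = "r1", "c1", "k1"
    props = [
        # Component lists for this agent
        (f"{agent}.sources", src),
        (f"{agent}.channels", ch),
        (f"{agent}.sinks", sink),
        # Avro source: receives events from upstream Flume agents
        (f"{agent}.sources.{src}.type", "avro"),
        (f"{agent}.sources.{src}.bind", "0.0.0.0"),
        (f"{agent}.sources.{src}.port", str(port)),
        (f"{agent}.sources.{src}.channels", ch),  # sources take a list: "channels"
        # File channel: buffers events on local disk so they survive a restart
        (f"{agent}.channels.{ch}.type", "file"),
        (f"{agent}.channels.{ch}.checkpointDir", "/var/lib/flume/checkpoint"),
        (f"{agent}.channels.{ch}.dataDirs", "/var/lib/flume/data"),
        # HDFS sink: writes events to HDFS through the HDFS client
        (f"{agent}.sinks.{sink}.type", "hdfs"),
        (f"{agent}.sinks.{sink}.channel", ch),  # a sink reads exactly one: "channel"
        (f"{agent}.sinks.{sink}.hdfs.path",
         f"hdfs://{namenode}{hdfs_dir}/%Y-%m-%d"),
        (f"{agent}.sinks.{sink}.hdfs.fileType", "DataStream"),
        # Bucket by the agent's clock instead of requiring a timestamp header
        (f"{agent}.sinks.{sink}.hdfs.useLocalTimeStamp", "true"),
        # Roll files every 5 minutes only (size- and count-based rolling disabled)
        (f"{agent}.sinks.{sink}.hdfs.rollInterval", "300"),
        (f"{agent}.sinks.{sink}.hdfs.rollSize", "0"),
        (f"{agent}.sinks.{sink}.hdfs.rollCount", "0"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"


if __name__ == "__main__":
    # e.g. save the output as collector.properties and start the agent with:
    #   flume-ng agent -n collector -c conf -f collector.properties
    print(render_flume_config("collector", "namenode.example.com:8020"))
```

Nothing in the sink settings refers to the local host, so the same rendered file could run on a DataNode or on a dedicated machine. Placement only changes which host runs the agent process.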