My cluster receives data only from Flume, and I want to calculate the throughput of Flume writing to HDFS so I can gauge network, I/O, storage, etc. for capacity planning in a new environment.
How can I calculate this?
I have done some work already; please let me know if I am going in the right direction.
Each datanode has 12 disks of 2.0 TB each
Total disk space: 24 TB per datanode
8-core CPU with Hyper-Threading (16 logical cores)
Physical memory: 62 GB per datanode
I see this metric from HDFS: Total bytes written across datanodes (1d): 2.2Mb/sec. Is this the correct metric to report?
I also have a question about which time duration to select for these charts.
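As a rough starting point for the capacity math, a sustained write rate can be turned into a daily volume and a time-to-fill estimate. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the rate is read as megabytes per second in decimal units (the "2.2Mb/sec" figure may mean something else), and `replication` should stay at 1 if the metric already counts replica writes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(rate_mb_per_sec: float) -> float:
    """Convert a sustained write rate in MB/s to GB written per day (decimal units)."""
    return rate_mb_per_sec * SECONDS_PER_DAY / 1000


def days_until_full(
    raw_capacity_tb: float,
    rate_mb_per_sec: float,
    replication: int = 1,
    usable_fraction: float = 1.0,
) -> float:
    """Estimate how many days a given raw capacity lasts at a constant write rate.

    replication: multiply the rate by this factor if the rate excludes replicas
                 (assumption: set to 1 when the metric already sums all datanode writes).
    usable_fraction: share of raw disk actually available to HDFS (assumption).
    """
    daily_tb = daily_volume_gb(rate_mb_per_sec) * replication / 1000
    return raw_capacity_tb * usable_fraction / daily_tb
```

For example, `daily_volume_gb(2.2)` gives roughly 190 GB per day, and `days_until_full(24 * n, 2.2)` estimates the fill time for `n` datanodes of 24 TB each. A constant rate is a simplification, so the peak rate over a longer window is probably the safer input for sizing network and I/O.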
If you are using Cloudera Manager, these metrics can be viewed by clicking on the Flume service from the Cloudera Homepage and then clicking on "Metric Details". This will be populated once you start using Flume to push data.
Also, if you are looking for a more granular view of a particular agent, you can click on the agent link and Cloudera Manager will show you charts that detail the following:
Host Network throughput
You can also build your own custom dashboard with charts that you specify by using the following: