Hello,
I have a cluster that receives data only from Flume, and I want to calculate the throughput of Flume writes to HDFS so that I can gauge network, I/O, storage, etc. for capacity planning in a new environment.
How can I calculate this?
I have done some work on this; please let me know if I am going in the right direction.
Datanodes: 8
Each datanode has 12 disks, 2.0T each
Total disk space: 24T per datanode
CPU: 8 cores with Hyper-Threading (16 logical cores)
Physical Memory : 62GB per datanode
I see this metric from HDFS: Total bytes written across datanodes (1d): 2.2Mb/sec. Is this the correct metric to report?
I also have a question about which time duration to select from these charts.
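
For reference, the arithmetic I have in mind can be sketched roughly as follows. This is only a sketch under assumptions: it treats the 2.2 figure as megabytes per second, uses decimal units (1T = 1000G), and assumes the HDFS default replication factor of 3 (dfs.replication). The function names are made up for illustration. I am not sure whether the datanode metric already counts replica writes, so that is left as a flag.

```python
# Rough capacity-planning sketch. Values marked ASSUMPTION are not confirmed.

DATANODES = 8
DISKS_PER_NODE = 12
DISK_TB = 2.0                # decimal terabytes per disk
WRITE_RATE_MB_PER_S = 2.2    # ASSUMPTION: chart value is megabytes/sec, not megabits/sec
REPLICATION = 3              # ASSUMPTION: HDFS default dfs.replication
SECONDS_PER_DAY = 86_400


def raw_capacity_tb(nodes, disks_per_node, disk_tb):
    """Raw disk space across the cluster, before replication or overhead."""
    return nodes * disks_per_node * disk_tb


def daily_volume_gb(rate_mb_per_s):
    """Convert a sustained write rate in MB/s to GB written per day."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000


def days_to_fill(raw_tb, daily_gb, replication, rate_includes_replicas):
    """Days until raw disk space is used up at a constant write rate.

    If the measured rate does not already count replica writes,
    multiply by the replication factor to get the on-disk growth.
    """
    on_disk_daily_gb = daily_gb if rate_includes_replicas else daily_gb * replication
    return raw_tb * 1000 / on_disk_daily_gb


raw_tb = raw_capacity_tb(DATANODES, DISKS_PER_NODE, DISK_TB)
daily_gb = daily_volume_gb(WRITE_RATE_MB_PER_S)

print(f"raw capacity: {raw_tb:.1f} TB")
print(f"daily volume: {daily_gb:.2f} GB/day")
for includes in (True, False):
    days = days_to_fill(raw_tb, daily_gb, REPLICATION, includes)
    print(f"metric includes replicas={includes}: about {days:.0f} days to fill")
```

This ignores non-HDFS reserved space, compression, and growth in the ingest rate, so it would only give a first-order estimate.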