I ran the TestDFSIO benchmark on a newly created 4-node basic cluster (all 4 nodes are DataNodes and all master services are co-located). The output is below. The throughput and average IO rate for "write" are very low compared to "read". What could be the possible reasons, and is there any way to improve them?
----- TestDFSIO ----- : read
Date & time: Wed Jan 18 16:50:33 PST 2017
Number of files: 10
Total MBytes processed: 1000000.0
Throughput mb/sec: 82.24845564131054
Average IO rate mb/sec: 89.47832489013672
IO rate std deviation: 26.504227991353375
Test exec time sec: 1569.043

----- TestDFSIO ----- : write
Date & time: Wed Jan 18 16:24:20 PST 2017
Number of files: 10
Total MBytes processed: 1000000.0
Throughput mb/sec: 12.744570863790308
Average IO rate mb/sec: 12.745065689086914
IO rate std deviation: 0.07952523184580733
Test exec time sec: 7922.071
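For context when reading these numbers: as far as I understand TestDFSIO, the reported "Throughput mb/sec" is averaged per map task (one task per file), so the cluster-wide rate is better estimated as total MB divided by the test execution time. The sketch below does that arithmetic with the figures from the report. The replication factor of 3 is an assumption (the usual `dfs.replication` default) and is not stated in the post; the function name is just illustrative.

```python
# Rough cluster-wide throughput from the TestDFSIO report above.
# ASSUMPTION: dfs.replication = 3 (common default, not stated in the post).

def aggregate_mb_per_sec(total_mb, exec_time_sec):
    """Cluster-wide MB/s: total data moved divided by wall-clock test time."""
    return total_mb / exec_time_sec

TOTAL_MB = 1000000.0          # "Total MBytes processed" for both runs
READ_TIME_SEC = 1569.043      # "Test exec time sec" for the read run
WRITE_TIME_SEC = 7922.071     # "Test exec time sec" for the write run
ASSUMED_REPLICATION = 3       # hypothetical, check hdfs-site.xml

read_rate = aggregate_mb_per_sec(TOTAL_MB, READ_TIME_SEC)
write_rate = aggregate_mb_per_sec(TOTAL_MB, WRITE_TIME_SEC)
# Each logical MB written lands on ASSUMED_REPLICATION DataNodes.
physical_write_rate = write_rate * ASSUMED_REPLICATION

if __name__ == "__main__":
    print(f"aggregate read:  {read_rate:.1f} MB/s")
    print(f"aggregate write: {write_rate:.1f} MB/s")
    print(f"physical write (x{ASSUMED_REPLICATION}): {physical_write_rate:.1f} MB/s")
    print(f"read/write ratio: {read_rate / write_rate:.2f}")
```

If the replication assumption holds, part of the read/write gap is simply that every written block is stored several times, while a read touches only one replica.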
Can you check whether you have these set in sysctl: net.core.somaxconn = 4096 and net.ipv4.tcp_fin_timeout = 10? We also have suggested values for the dirty ratio (vm.dirty_ratio = 50) and the background ratio (vm.dirty_background_ratio = 20).
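One way to compare the current values on each node against these suggestions is a small script that reads them from /proc/sys. This is only a sketch: it assumes a Linux host, and it assumes the "dirty ratio" and "background ratio" mentioned above map to the standard `vm.dirty_ratio` and `vm.dirty_background_ratio` keys. The helper names are illustrative.

```python
from pathlib import Path

# Suggested values from this reply. ASSUMPTION: the dirty/background ratios
# refer to the standard vm.dirty_ratio and vm.dirty_background_ratio keys.
SUGGESTED = {
    "net.core.somaxconn": 4096,
    "net.ipv4.tcp_fin_timeout": 10,
    "vm.dirty_ratio": 50,
    "vm.dirty_background_ratio": 20,
}

def read_sysctl(key, root="/proc/sys"):
    """Return the integer value of a sysctl key, or None if it cannot be read."""
    path = Path(root, *key.split("."))
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None

def compare(suggested=SUGGESTED, root="/proc/sys"):
    """Map each key to a (current, suggested) pair."""
    return {key: (read_sysctl(key, root), want) for key, want in suggested.items()}

if __name__ == "__main__":
    for key, (have, want) in compare().items():
        status = "ok" if have == want else "differs"
        print(f"{key}: current={have} suggested={want} [{status}]")
```

Run it on every node; the values can differ between hosts if they were set by hand rather than through /etc/sysctl.conf.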
You can also run other performance tests directly against the local disks, instead of HDFS, using sysbench or fio, to check whether the disks themselves are the bottleneck.
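If neither tool is installed yet, a very rough first check of sequential write speed on a DataNode data directory can be sketched as below. This is not a substitute for sysbench or fio: it writes a single file of zeros (which some filesystems compress), uses one thread, and the default size of 256 MB is an arbitrary choice. The function name and parameters are hypothetical.

```python
import os
import tempfile
import time

def sequential_write_mb_per_sec(directory, total_mb=256, block_mb=1):
    """Write total_mb of zeros into a temp file in `directory`, fsync it,
    and return the observed MB/s. A crude single-threaded estimate only."""
    block = b"\0" * (block_mb * 1024 * 1024)
    blocks = total_mb // block_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to disk before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_mb) / elapsed

if __name__ == "__main__":
    # Point this at a directory on the disk you want to test.
    target = tempfile.gettempdir()
    print(f"{sequential_write_mb_per_sec(target):.1f} MB/s in {target}")
```

Use a size well above the page cache effects you care about, and compare the result per disk with the per-node write rate HDFS is achieving.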