Hadoop TestDFSIO - Cluster network I/O slows down after running for about 5~15 minutes

Hello guys,


I am a student and this is my first time posting a question here.

If my question should be posted somewhere else, please let me know.

Thanks, all!



Here is my question:


I am doing research on Hadoop running on VMs (using VMware vSphere).


My environment is:

A Hadoop cluster of 7 VMs, acting as 1 NameNode + 6 DataNodes.

Each VM has the following virtual hardware: 4 CPU cores, 16GB memory, 2.7TB disk space,

and the operating system is CentOS 6.6 64-bit.


Previously, I tested the same configuration on another cluster, on different physical hosts running VMware ESXi.

But this time I ran into the problem shown in the following picture:


[Inline image 1]


In this test, as in my previous test, I used the TestDFSIO benchmark in Hadoop.

I used TestDFSIO to write 1TB of data (1000 x 1GB files) across the VMs.

About 5~15 minutes after the test starts, the network I/O speed drops from 300~400M/s to about 50M/s.

I have run the test 10 or more times, and every run looked similar to the picture:

the high network I/O speed lasts for somewhere between 5 and 15 minutes, and after that it becomes slow (only 50M/s).
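
To put these numbers in perspective, here is a rough Python sketch of how much data is moved before the slowdown and how long the rest of the run would take at the slow rate. It assumes that "M/s" means MB/s, that 1 GB = 1000 MB, and that the observed network rate is the rate at which the 1TB is written, which ignores any extra HDFS replication traffic. The function names are only illustrative.

```python
def data_moved_gb(rate_mb_per_s, minutes):
    """Data moved in GB at a constant rate (assumes 1 GB = 1000 MB)."""
    return rate_mb_per_s * minutes * 60 / 1000


def remaining_hours(total_gb, moved_gb, slow_rate_mb_per_s):
    """Hours needed to move the rest of the data at the slow rate."""
    remaining_mb = max(total_gb - moved_gb, 0) * 1000
    return remaining_mb / slow_rate_mb_per_s / 3600


TOTAL_GB = 1000  # 1000 files x 1 GB each, as in the test above

# Bounds of the fast phase: 5 min at 300 MB/s and 15 min at 400 MB/s
low_gb = data_moved_gb(300, 5)
high_gb = data_moved_gb(400, 15)

print("moved before slowdown: %.0f to %.0f GB" % (low_gb, high_gb))
print("remaining time at 50 MB/s: %.1f to %.1f hours" % (
    remaining_hours(TOTAL_GB, high_gb, 50),
    remaining_hours(TOTAL_GB, low_gb, 50),
))
```

Under these assumptions the slowdown starts somewhere between 90 GB and 360 GB into the run, so most of the 1TB ends up being written at the slow rate.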

I also ran a test in which I replaced the DataNodes with other VMs on the other physical hosts (they behaved normally in my previous test),

and the problem never happened.


I also used the scp command in CentOS to transfer a file (4.4GB) from the NameNode to a DataNode.

I found that until the transfer reached 70% (that is, about 3GB of the file had been sent to the other VM),

the speed stayed at about 110M/s,

but after 70% the speed started dropping: 100M/s... 90M/s... 80M/s..., finally falling to 15M/s,

and it stayed at that speed until the transfer was done.
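
To pin down exactly when the drop begins, rather than reading it off the scp progress line, the interface byte counters can be sampled during the transfer. Below is a minimal Python sketch that reads /proc/net/dev on Linux and prints the receive and transmit rates per interval. The interface name "eth0" and the 5-second interval are assumptions, and the helper names are only illustrative.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        # Split on ":" first; some kernels print "eth0:12345" without a space
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mb_per_s(before, after, seconds):
    """Average rate in MB/s (1 MB = 10**6 bytes) between two counter values."""
    return (after - before) / seconds / 1e6


def watch(interface="eth0", interval=5.0):
    """Print rx/tx rates for one interface until interrupted with Ctrl-C."""
    prev = None
    while True:
        with open("/proc/net/dev") as f:
            rx, tx = parse_net_dev(f.read())[interface]
        if prev is not None:
            print("%s rx %.1f MB/s tx %.1f MB/s" % (
                time.strftime("%H:%M:%S"),
                rate_mb_per_s(prev[0], rx, interval),
                rate_mb_per_s(prev[1], tx, interval),
            ))
        prev = (rx, tx)
        time.sleep(interval)
```

Calling watch("eth0") on both the sending and the receiving VM while scp or TestDFSIO runs gives a timestamped log of the rate, which can then be compared with the picture above.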


I cannot figure out the cause of this problem; it is a really strange situation.

Have you encountered the same or a similar problem before?


Thank you guys :)