I am a student and it is my first time to post question here.
If my question should be post at other place please give me a suggest.
Thanks all !
Here is my question:
I am doing a research about Hadoop with VM (using VMWare vSphere),
The environment I have is :
A hadoop cluster with 7 VMs, act as 1 NameNode + 6 DataNodes.
Each VM has following virtual haedware: 4 core CPU, 16GB memory, 2.7TB disk space,
and Operating System is CentOS 6.6 64bits.
Before this time, I had testing the same configuration on the other Cluster on different physical hosts with VMWare ESXi.
But this time I faced a problem like following picture:
In this test, like my previous test, I used TestDFSIO benchmark in Hadoop,
I used TestDFSIO to write 1TB files (1000 x 1G files) between VMs,
after starting test about 5~15 minutes, the Network IO speed will slow down from 300~400M/s to about 50M/s.
I had run the test 10 or more times, every test had similar situation like the picture,
sometimes high speed of Network IO will keep 5~15 minutes, after that, it will be slow(only 50M/s).
I had try a test that I replaced DataNodes to other VMs on the other physical hosts( they were normal in my previous test ),
the problem never happened.
I tried to use scp command in CentOS to transmit a file(4.4GB) from NameNode to a DataNode,
and I found that before the file transmitted to 70% (that is, the file has been transmitted about 3GB to other VM),
the speed is keeping about 110M/s,
but after 70% is done, the speed will getting slow, 100M/s... 90M/s... 80M/s... , and finally slow down to 15M/s,
and keeping this speed until the job is done.
I cannot find out the reason of this problem, it is really a strange situation.
Did you meet the same or similar problem before?
Thank you guys
Not sure if you have got any thoughts, But i am also hitting this same behavior. The network admins are not able to find any abnormalities as well.
Please share if you have got any resolution and cause of this issue