
Hadoop TestDFSIO - Cluster Network IO getting slow after running about 5~15 minutes



Hello guys,

 

I am a student, and this is my first time posting a question here.

If this question belongs somewhere else, please point me to the right place.

Thanks, all!

 

--------------------------------------------

Here is my question:

 

I am doing research on Hadoop running on VMs (using VMware vSphere).

 

My environment is:

A Hadoop cluster of 7 VMs, acting as 1 NameNode + 6 DataNodes.

Each VM has the following virtual hardware: 4 CPU cores, 16GB memory, 2.7TB disk space,

and the operating system is CentOS 6.6 64-bit.

 

Before this, I tested the same configuration on another cluster, on different physical hosts running VMware ESXi.

But this time I ran into the problem shown in the following picture:

 

[Inline image 1]

 

In this test, as in my previous test, I used the TestDFSIO benchmark in Hadoop.

I used TestDFSIO to write 1TB of data (1000 x 1GB files) between the VMs.

About 5~15 minutes after the test starts, the network IO speed drops from 300~400M/s to about 50M/s.

I have run the test 10 or more times, and every run looked similar to the picture:

the high network IO speed holds for 5~15 minutes, and after that it becomes slow (only 50M/s).

I also tried replacing the DataNodes with VMs on the other physical hosts (the ones that behaved normally in my previous test),

and there the problem never happened.
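
To pin down exactly when the drop starts on each VM, the interface byte counters could be logged while TestDFSIO runs. The following is only a minimal sketch, not part of my original test: it assumes the Linux /proc/net/dev counters (present on CentOS 6), and the interface name eth0 and the 5-second interval are placeholders to adjust.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mb_per_s(prev, curr, seconds):
    """Turn two (rx, tx) byte samples taken `seconds` apart into (rx, tx) MB/s."""
    return tuple((c - p) / float(seconds) / 1e6 for p, c in zip(prev, curr))


def read_counters(interface):
    """Read the current (rx_bytes, tx_bytes) pair for one interface."""
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())[interface]


def monitor(interface="eth0", interval=5.0):
    """Print receive/transmit throughput every `interval` seconds until interrupted."""
    prev = read_counters(interface)  # "eth0" is an assumed interface name
    while True:
        time.sleep(interval)
        curr = read_counters(interface)
        rx, tx = rate_mb_per_s(prev, curr, interval)
        print("%s rx=%.1f MB/s tx=%.1f MB/s" % (time.strftime("%H:%M:%S"), rx, tx))
        prev = curr
```

Calling monitor("eth0") on each DataNode during a run and stopping it with Ctrl+C would give a timestamped log that can be compared across VMs and physical hosts.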

 

I also used the scp command in CentOS to transfer a file (4.4GB) from the NameNode to a DataNode.

Until about 70% of the file had been transferred (that is, about 3GB had been sent to the other VM),

the speed stayed at about 110M/s,

but after 70% the speed kept dropping, 100M/s... 90M/s... 80M/s..., and finally fell to 15M/s,

and it stayed at that speed until the transfer finished.
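
To put the scp numbers in perspective, here is a rough back-of-the-envelope calculation. It is a simplification under stated assumptions: "M/s" is read as MB/s, 1GB is taken as 1000MB, and the gradual ramp-down is ignored, so the last 30% is treated as running at 15MB/s throughout.

```python
def transfer_time(total_mb, fast_fraction, fast_rate, slow_rate):
    """Return (fast_seconds, slow_seconds) for a transfer that runs at
    fast_rate MB/s for the first fast_fraction of the data and at
    slow_rate MB/s for the rest."""
    fast_mb = total_mb * fast_fraction
    slow_mb = total_mb - fast_mb
    return fast_mb / fast_rate, slow_mb / slow_rate


# Figures from the scp test: 4.4GB file, 70% at ~110MB/s, remainder at ~15MB/s.
fast_s, slow_s = transfer_time(4400.0, 0.70, 110.0, 15.0)
total_s = fast_s + slow_s
ideal_s = 4400.0 / 110.0           # time if the full speed had held
average_rate = 4400.0 / total_s    # effective MB/s over the whole copy

print("fast part: %.0f s, slow part: %.0f s, total: %.0f s" % (fast_s, slow_s, total_s))
print("ideal: %.0f s, effective average: %.1f MB/s" % (ideal_s, average_rate))
```

Under these assumptions the first 70% takes roughly 28 seconds and the last 30% roughly 88 seconds, so the whole copy takes close to three times the 40 seconds it would need at full speed.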

 

I cannot figure out the cause of this problem. It is a really strange situation.

Have you run into the same or a similar problem before?

 

Thank you guys :)

 

 

Tony