
Hadoop TestDFSIO - Cluster Network IO getting slow after running about 5~15 minutes



Hello guys,

 

I am a student, and this is my first time posting a question here.

If my question should be posted somewhere else, please let me know.

Thanks all !

 

--------------------------------------------

Here is my question:

 

I am doing research on Hadoop running on VMs (using VMware vSphere).

 

The environment I have is:

A Hadoop cluster of 7 VMs, acting as 1 NameNode + 6 DataNodes.

Each VM has the following virtual hardware: 4 CPU cores, 16 GB memory, 2.7 TB disk space,

and the operating system is CentOS 6.6 (64-bit).

 

Previously, I tested the same configuration on another cluster running on different physical hosts with VMware ESXi.

But this time I ran into the problem shown in the following picture:

 

[Inline image 1]

 

In this test, as in my previous one, I used the TestDFSIO benchmark in Hadoop.

I used TestDFSIO to write 1 TB of data (1000 x 1 GB files) across the VMs.

About 5~15 minutes after the test starts, the network IO speed drops from 300~400 M/s to about 50 M/s.

I have run the test 10 or more times, and every run looked similar to the picture:

the high network IO speed lasts for 5~15 minutes, and after that it becomes slow (only 50 M/s).

I also ran a test in which I replaced the DataNodes with other VMs on the other physical hosts (they behaved normally in my previous test),

and the problem never happened.
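
As a rough sketch of what these figures imply, the amount of data moved before the slowdown can be estimated as below. This assumes that "M/s" means megabytes per second and that the rate stays constant during the fast phase; neither is confirmed by the post, and the function name `transferred_gb` is only illustrative.

```python
# Back-of-the-envelope estimate of the data moved during the fast phase.
# Assumption: "M/s" in the post means MB/s, and 1 GB = 1000 MB.

def transferred_gb(rate_mb_per_s, minutes):
    """Data moved at a constant rate over the given time, in GB."""
    return rate_mb_per_s * minutes * 60 / 1000

low = transferred_gb(300, 5)    # lowest rate, shortest fast phase
high = transferred_gb(400, 15)  # highest rate, longest fast phase
print(f"fast phase moves roughly {low:.0f} to {high:.0f} GB")
```

Under those assumptions, the slowdown starts after roughly 90 to 360 GB of the 1 TB has crossed the network.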

 

I also used the scp command in CentOS to transfer a 4.4 GB file from the NameNode to a DataNode.

Until about 70% of the file had been transferred (that is, about 3 GB had reached the other VM),

the speed stayed at about 110 M/s,

but after 70% the speed kept dropping, 100 M/s... 90 M/s... 80 M/s..., and finally fell to 15 M/s,

where it stayed until the transfer was done.
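
One possible way to narrow this down is to check whether the receiving VM's disk, rather than the network, is what limits the speed. The sketch below times synced writes in fixed-size chunks and reports where the rate first drops. It is only a suggested diagnostic under my own assumptions, not part of the test described above; the function names, the 64 MB chunk size, and the 0.5 threshold are hypothetical choices.

```python
import os
import tempfile
import time

def measure_write_throughput(directory, total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Write total_bytes to a temporary file in `directory`, chunk by chunk.

    Returns a list of (bytes_written_so_far, MB_per_second) per chunk.
    The temporary file is deleted when the measurement ends.
    """
    samples = []
    chunk = b"\0" * chunk_bytes
    written = 0
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.perf_counter()
            f.write(chunk[:n])
            f.flush()
            os.fsync(f.fileno())  # push data to disk so caching does not hide slow writes
            elapsed = time.perf_counter() - start
            written += n
            samples.append((written, n / 1e6 / max(elapsed, 1e-9)))
    return samples

def first_slowdown(samples, ratio=0.5):
    """Return bytes written at the first sample slower than ratio * first rate, or None."""
    if not samples:
        return None
    baseline = samples[0][1]
    for written, rate in samples:
        if rate < ratio * baseline:
            return written
    return None
```

For example, `first_slowdown(measure_write_throughput("/data", 8 * 1024**3))` would write 8 GB under `/data` (a placeholder for a DataNode data directory) and return the offset at which the write rate first halves. If the local write rate also collapses after a few GB, the disk is a more likely suspect than the network.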

 

I cannot find the cause of this problem; it is a really strange situation.

Have you run into the same or a similar problem before?

 

Thank you guys :)

 

 

Tony