Expert Contributor
Posts: 61
Registered: ‎03-06-2015
Accepted Solution

Need to get maximum network performance with Cloudera CDH5 from a 40G network on RedHat 6.6

I'm trying to get maximum throughput with Cloudera on RedHat 6.6 on six Dell R730s running kernel 3.18.1, using 2 SSD drives (850MB, 3G transfer per second) with modified drivers which have been tested. So far I've tried decommissioning the MapReduce TaskTracker on every node except one, as suggested, but it didn't make any real difference in NIC speed. I want to max out the connection speed on all nodes if possible.

I've tried:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 100000 -fileSize 50

and

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 500 -fileSize 10GB

without good results.

I've already tested throughput with netperf, but with TestDFSIO I can't get Cloudera to push the network to the same maximum level that netperf reaches.

Any suggestions would help greatly.
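
For reference, here is a rough sketch of how much data each of the two TestDFSIO runs above asks the cluster to write. It assumes that a bare `-fileSize 50` means 50 MB and that TestDFSIO counts MB and GB in powers of 1024; both are assumptions worth checking against the tool's usage output. The function name is just for illustration.

```python
# Assumption: TestDFSIO treats MB and GB as binary units (powers of 1024).
MB = 1024 ** 2
GB = 1024 ** 3
TIB = 1024 ** 4


def total_bytes(nr_files: int, file_size_bytes: int) -> int:
    """Logical bytes written by one -write run, before HDFS replication."""
    return nr_files * file_size_bytes


small_files = total_bytes(100_000, 50 * MB)  # -nrFiles 100000 -fileSize 50
large_files = total_bytes(500, 10 * GB)      # -nrFiles 500 -fileSize 10GB

for label, size in (("100000 x 50MB", small_files), ("500 x 10GB", large_files)):
    print(f"{label}: {size / TIB:.2f} TiB")
```

Both runs come out at a little under 5 TiB of logical data, so the main difference between them is the number of files (and, as far as I know, one map task per file), not the volume written.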

 

 

Expert Contributor
Posts: 61
Registered: ‎03-06-2015

Re: Need to get maximum network performance with Cloudera CDH5 from a 40G network on RedHat 6.6

I would also like to cut down on local HDD writes, so that data is distributed evenly across all machines and generates more network traffic.
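
To make the local-versus-network point concrete, here is a minimal sketch of how many bytes cross the network for a given write. It assumes the default HDFS block placement behaviour (the first replica lands on the writer's own node when that node is a DataNode, and the remaining replicas are sent to other nodes) and a replication factor of 3, which is the usual default but may not match this cluster. The function name is hypothetical.

```python
def network_bytes(data_bytes: int, replication: int, writer_on_datanode: bool) -> int:
    """Estimate bytes sent over the network for one HDFS write.

    Assumption: default placement policy, where a writer running on a
    DataNode keeps the first replica on its local disks.
    """
    remote_copies = replication - 1 if writer_on_datanode else replication
    return data_bytes * max(remote_copies, 0)


file_size = 50 * 1024 ** 2  # one 50 MB test file

# Writer runs on a DataNode: one copy stays local, two go over the wire.
print(network_bytes(file_size, replication=3, writer_on_datanode=True))

# Writer runs on a host with no DataNode: all three copies go over the wire.
print(network_bytes(file_size, replication=3, writer_on_datanode=False))
```

Under these assumptions, running the writing tasks on a host that is not a DataNode is one way to avoid the local copy entirely.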

 


Expert Contributor
Posts: 61
Registered: ‎03-06-2015

Re: Need to get maximum network performance with Cloudera CDH5 from a 40G network on RedHat 6.6

I found a way to increase network performance, but only for writes. When I run a TestDFSIO read, it only seems to use the local drive on one system instead of reading from multiple systems. I need the reads to go over the network rather than locally. Can anybody help with how to force network reads using TestDFSIO?

Expert Contributor
Posts: 61
Registered: ‎03-06-2015

Re: Need to get maximum network performance with Cloudera CDH5 from a 40G network on RedHat 6.6

OK, the setup is simple: create the DataNodes on the other nodes and run a single TaskTracker (TT) on the NameNode. That took the network traffic to the other nodes up to 3500MB, which worked.
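
As a quick sanity check on that figure, here is a small sketch comparing it with the theoretical line rate of a 40G link. It assumes the 3500MB figure means megabytes per second in decimal units and ignores protocol overhead, so treat the result as a rough upper-bound comparison only.

```python
def line_rate_mb_per_s(link_gbit: float) -> float:
    """Theoretical link rate in decimal MB/s, ignoring protocol overhead."""
    return link_gbit * 1e9 / 8 / 1e6


link_rate = line_rate_mb_per_s(40)  # 40G network from the thread title
observed = 3500                     # assumption: reported figure is MB/s

utilization = observed / link_rate
print(f"line rate: {link_rate:.0f} MB/s, observed: {observed} MB/s, "
      f"utilization: {utilization:.0%}")
```

With these assumptions the reported number works out to about 70% of the raw 40G line rate.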
