Member since: 03-06-2015
Posts: 61
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2697 | 04-30-2015 09:04 AM
 | 45236 | 03-27-2015 10:31 AM
 | 2671 | 03-24-2015 12:27 PM
 | 1424 | 03-06-2015 01:37 PM
03-27-2015
10:31 AM
Found why: a typo on my part. I had left the 1 TB size argument (the row count that teragen uses) in the terasort command line. 🙂 This command worked:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort /home/ssd/hdfs-input /home/ssd/hdfs-output

Works perfectly now.
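For anyone hitting the same thing, here is a minimal sketch of the full sequence (the row count and HDFS paths are just the examples from this thread; teravalidate is the checker that ships in the same examples jar):

# Generate 10,000,000,000 rows of input -- teragen is the step that takes the row count
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar teragen 10000000000 /home/ssd/hdfs-input

# Sort it -- terasort takes only <input dir> <output dir>, no row count
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort /home/ssd/hdfs-input /home/ssd/hdfs-output

# Optionally confirm the output really is sorted
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar teravalidate /home/ssd/hdfs-output /home/ssd/hdfs-validate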
03-27-2015
10:14 AM
Not sure what I'm doing wrong here, but I keep getting the same error when I run terasort. Teragen works perfectly, but terasort fails with:

Input path does not exist: hdfs://node0:8020/user/hdfs/10000000000

Command line used:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort 10000000000 /home/ssd/hdfs-input /home/ssd/hdfs-output
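The error message is the clue here: terasort takes only <input dir> <output dir>, so the 10000000000 is being parsed as the input path, which is why HDFS goes looking for /user/hdfs/10000000000. A quick sanity check that the real input exists (a sketch; the path is the one from the command above):

sudo -u hdfs hadoop fs -ls /home/ssd/hdfs-input   # should list the part-* files teragen created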
03-24-2015
12:27 PM
OK, the setup is simple: create DataNodes on all machines and keep a single TaskTracker on the NameNode. That pushed network transfer to the other nodes up to 3500 MB, and it worked.
03-17-2015
02:11 PM
I found a way to increase network performance, but only for writes. When I run a DFSIO read test, it only seems to read from the local drive on one system instead of reading from multiple systems. I need the reads to go over the network rather than stay local. Can anybody help with how to force network reads using DFSIO?
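One idea, sketched under the assumption that the single-TaskTracker layout from earlier in this thread is still available: TestDFSIO's read maps get scheduled next to the block replicas when possible, which is why the reads stay local. If all the maps run on one node, most replicas live elsewhere and the reads have to cross the network:

# -read reuses the files a previous -write run left under /benchmarks/TestDFSIO
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -read -nrFiles 500 -fileSize 10GB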
03-17-2015
08:24 AM
I found this article which was super helpful. http://www.linuxsecrets.com/categories/linux-performance/performance-tips/redhat-prerequisites-benchmarking-and-stress-testing-on-hadoop-cluster
03-16-2015
03:18 PM
I would also like to cut down on local HDD writes so I can distribute data evenly across all machines and drive more network traffic as well.

@nauseous wrote: I'm trying to get maximum throughput with Cloudera on RedHat 6.6 on six Dell R730s running kernel 3.18.1, each with two 850 MB/s (3 Gb/s) SSDs using modified drivers that have been tested. So far I've tried decommissioning the MapReduce TaskTracker on all nodes except one, as suggested, but it didn't really make any difference in NIC speed. I want to max out the connection speed on all nodes if possible. I've tried:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 100000 -fileSize 50

and

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 500 -fileSize 10GB

without good results. I've already tested throughput with netperf, but I can't get DFSIO to push the network anywhere near the levels netperf reaches. Any suggestions would help greatly.
03-16-2015
02:13 PM
I'm trying to get maximum throughput with Cloudera on RedHat 6.6 on six Dell R730s running kernel 3.18.1, each with two 850 MB/s (3 Gb/s) SSDs using modified drivers that have been tested. So far I've tried decommissioning the MapReduce TaskTracker on all nodes except one, as suggested, but it didn't really make any difference in NIC speed. I want to max out the connection speed on all nodes if possible. I've tried:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 100000 -fileSize 50

and

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 500 -fileSize 10GB

without good results. I've already tested throughput with netperf, but I can't get DFSIO to push the network anywhere near the levels netperf reaches. Any suggestions would help greatly.
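A couple of hedged suggestions (the parameter values below are illustrative guesses, not a tuned recipe): 100,000 files of 50 MB means 100,000 tiny map tasks, which mostly measures task-launch overhead rather than the NICs, while fewer, larger files keep the write pipelines busy longer. It also helps to clear old benchmark data between runs:

# TestDFSIO keeps its data under /benchmarks/TestDFSIO; clean it out between runs
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -clean

# A middle ground between the two runs above: enough concurrent writers to keep
# every node busy without drowning the JobTracker in map tasks
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 48 -fileSize 10GB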
03-09-2015
08:38 AM
I wasn't able to find my post since I don't have my machine, but here is a great article: http://tinyurl.com/lr3gfx4 Hope this helps 🙂
03-09-2015
08:28 AM
Great job. I try to keep the names as simple as possible so I can run thousands of scripts. My hosts file looks like:

127.0.0.1 localhost

# Cloudera machines
192.168.2.1 n1
192.168.2.2 n2
192.168.2.3 n3
192.168.2.4 n4
192.168.2.5 n5
192.168.2.6 n6
192.168.2.7 n7

and so on, to make changes across machines easier. For the interfaces on multiple machines (on AWS or Linode, for example) you would just use the internal interface; it's fast and easy to manage. With short names you can do things like:

for i in {1..300}; do ssh n$i date; done   # checks the date on every machine to confirm the clocks are in sync

Keeping it simple makes life easier.
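If checking 300 machines one at a time gets slow, here is a small variant sketch that runs the same clock check in parallel (it assumes the n1..n300 names above and passwordless SSH):

printf 'n%d\n' {1..300} | xargs -P 32 -I{} sh -c 'echo "{}: $(ssh -o BatchMode=yes {} date)"'   # 32 hosts at a time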