Member since: 03-06-2015
Posts: 61
Kudos Received: 5
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2720 | 04-30-2015 09:04 AM |
| | 45376 | 03-27-2015 10:31 AM |
| | 2700 | 03-24-2015 12:27 PM |
| | 1448 | 03-06-2015 01:37 PM |
03-27-2015
10:31 AM
Found why. The typo was my mistake: I had left in the 1TB row-count argument that belongs to teragen, and terasort treated it as an input path. The corrected command works:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort /home/ssd/hdfs-input /home/ssd/hdfs-output

Works perfectly now. 🙂
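For anyone who hits the same error, a minimal sketch of the full sequence, assuming the same CDH 5.3.1 parcel path: teragen takes the row count, while terasort takes only the input and output paths.

```bash
# Generate 10 billion 100-byte rows (about 1 TB) into the input directory.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar teragen 10000000000 /home/ssd/hdfs-input

# Sort it; terasort takes only <input dir> <output dir>, no row count.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort /home/ssd/hdfs-input /home/ssd/hdfs-output
```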
03-27-2015
10:14 AM
Not sure what I'm doing wrong here, but I keep getting the same error when I run terasort. Teragen works perfectly, but terasort fails with:

Input path does not exist: hdfs://node0:8020/user/hdfs/10000000000

Command line used:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop-0.20-mapreduce/hadoop-examples-2.5.0-mr1-cdh5.3.1.jar terasort 10000000000 /home/ssd/hdfs-input /home/ssd/hdfs-output
03-24-2015
12:27 PM
OK, the setup is simple: you just create datanodes with a single TaskTracker on the namenode, which took the network to 3500 MB to the other nodes, and that worked.
03-17-2015
02:11 PM
I found a way to increase network performance, but only for writes. When I run a read with DFSIO, it only seems to read from the local drive on one system rather than from multiple systems. I need it to read over the network, not locally. Can anybody help with how to force network reads using DFSIO?
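For context, a minimal sketch of the read pass, assuming the same jar path as the write runs and that a prior -write pass already created the files; the -nrFiles and -fileSize values here must match that write pass.

```bash
# Read back the files produced by an earlier TestDFSIO -write run.
# -nrFiles and -fileSize must match the parameters of the write pass.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -read -nrFiles 500 -fileSize 10GB
```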
03-17-2015
08:24 AM
I found this article, which was super helpful: http://www.linuxsecrets.com/categories/linux-performance/performance-tips/redhat-prerequisites-benchmarking-and-stress-testing-on-hadoop-cluster
03-16-2015
03:18 PM
I would also like to cut down on local HDD writes so I can evenly distribute data to all machines and drive more network traffic as well.

@nauseous wrote: I'm trying to get maximum throughput with Cloudera on RedHat 6.6 on six Dell R730s with kernel 3.18.1, using two SSDs rated at 850 MB per second (3G) transfer with modified drivers that have been tested. Currently I've tried decommissioning the MapReduce TaskTracker on all nodes except a single node, as suggested, but it didn't really make any difference in NIC speed. I want to max out the connection speed on all nodes if possible. I've tried: sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 100000 -fileSize 50 and sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 500 -fileSize 10GB without good results. I've already tested throughput with netperf, but I can't seem to get Cloudera to drive the network to the levels I saw with netperf when using DFSIO. Any suggestions would help greatly.
03-16-2015
02:13 PM
I'm trying to get maximum throughput with Cloudera on RedHat 6.6 on six Dell R730s with kernel 3.18.1, using two SSDs rated at 850 MB per second (3G) transfer with modified drivers that have been tested. Currently I've tried decommissioning the MapReduce TaskTracker on all nodes except a single node, as suggested, but it didn't really make any difference in NIC speed. I want to max out the connection speed on all nodes if possible. I've tried:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 100000 -fileSize 50

and

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar TestDFSIO -write -nrFiles 500 -fileSize 10GB

without good results. I've already tested throughput with netperf, but I can't seem to get Cloudera to drive the network to the levels I saw with netperf when using DFSIO. Any suggestions would help greatly.
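As a point of reference, a minimal sketch of a repeatable TestDFSIO cycle under the same jar-path assumption; -clean removes the generated files so successive runs with different parameters start from the same state.

```bash
# Convenience variable for the benchmark jar (path from the CDH 5.3.1 parcel).
JAR=/opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-test-2.5.0-mr1-cdh5.3.1.jar

# Write pass: 500 files of 10 GB each.
sudo -u hdfs hadoop jar "$JAR" TestDFSIO -write -nrFiles 500 -fileSize 10GB

# Remove the generated benchmark files before trying different parameters.
sudo -u hdfs hadoop jar "$JAR" TestDFSIO -clean
```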
03-09-2015
08:38 AM
I wasn't able to find my post since I don't have my machine, but here is a great article: http://tinyurl.com/lr3gfx4. Hope this helps 🙂
03-09-2015
08:28 AM
Great job. I try to keep the names as simple as possible so I can run thousands of scripts. My hosts file looks like:

127.0.0.1 localhost

(For loopback interfaces across multiple machines, on AWS or Linode for example, you would just use the internal loopback device. Fast and easily managed.)

# Cloudera machines
192.168.2.1 n1
192.168.2.2 n2
192.168.2.3 n3
192.168.2.4 n4
192.168.2.5 n5
192.168.2.6 n6
192.168.2.7 n7

and so on, to make changes across machines easier. Such as:

for i in {1..300}; do ssh n$i date; done <-- checks the date on all machines to make sure each machine is synced

Keeping it simple makes life easier. A sketch of the kind of loop this naming enables follows below.
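In this sketch the n1..n7 host range and passwordless root SSH are assumptions, and the scp target path is illustrative:

```bash
# Push an updated hosts file to every node (assumes root SSH keys are in place).
for i in {1..7}; do
  scp /etc/hosts root@n$i:/etc/hosts
done

# Verify clocks are in sync across the cluster.
for i in {1..7}; do
  ssh n$i date
done
```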