Member since 01-20-2014
578 Posts
102 Kudos Received
94 Solutions
My Accepted Solutions

Views | Posted |
---|---|
5721 | 10-28-2015 10:28 PM |
2717 | 10-10-2015 08:30 PM |
4743 | 10-10-2015 08:02 PM |
3537 | 10-07-2015 02:38 PM |
2327 | 10-06-2015 01:24 AM |
09-08-2014 09:32 PM
Thank you for the feedback.
09-08-2014 05:49 AM
As Hadoop: The Definitive Guide puts it, "Hadoop allows the user to specify a combiner function to be run on the map output, and the combiner function’s output forms the input to the reduce function." The code in the reducer and the combiner is frequently similar, but it doesn't have to be. Your question is unclear, though; can you elaborate a bit?
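To illustrate, here is a toy, in-memory Python model of the map, combine and reduce flow. It is not Hadoop's Java API; `run_job` and the sample mapper, combiner and reducer functions are hypothetical names. For word count the same function can serve as both combiner and reducer, but for a per-key mean the combiner has to emit partial (sum, count) pairs while the reducer emits the final average, so the two differ.

```python
from collections import defaultdict

# Toy model of the MapReduce data flow, NOT Hadoop's Java API.
# Each inner list of `splits` stands in for the input of one map task.

def group(pairs):
    """Group (key, value) pairs by key, preserving first-seen key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def run_job(splits, mapper, reducer, combiner=None):
    shuffled = []
    for split in splits:
        map_out = [kv for record in split for kv in mapper(record)]
        if combiner is not None:
            # The combiner sees only this map task's output, and its output
            # (not the raw map output) is what gets shuffled to the reducer.
            map_out = [kv for key, values in group(map_out).items()
                       for kv in combiner(key, values)]
        shuffled.extend(map_out)
    return {key: value
            for k, values in sorted(group(shuffled).items())
            for key, value in reducer(k, values)}

# Word count: summing is associative and commutative, so one function does both jobs.
def wc_map(line):
    for word in line.split():
        yield word, 1

def wc_sum(word, counts):
    yield word, sum(counts)

# Mean temperature per sensor: combiner and reducer must differ.
def temp_map(line):
    sensor, temp = line.split(",")
    yield sensor, (float(temp), 1)

def temp_combine(sensor, partials):
    # Emits a partial (sum, count), the same shape as its input.
    yield sensor, (sum(s for s, _ in partials), sum(c for _, c in partials))

def temp_reduce(sensor, partials):
    # Emits the final mean; reusing this as the combiner would hand
    # averages, not (sum, count) pairs, to the reducer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    yield sensor, total / count
```

For example, `run_job([["s1,10", "s1,20", "s2,5"], ["s1,30"]], temp_map, temp_reduce, combiner=temp_combine)` returns `{'s1': 20.0, 's2': 5.0}`, the same result as running without the combiner.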
09-08-2014 05:16 AM
If you are only looking to learn, running multiple VMs on the same host is fine. Performance will be poor, though, if the VMs are starved for CPU or share disks. It looks like you are just beginning with Hadoop, so I would suggest getting up to speed with installation and configuration first, rather than performance. Get yourself a copy of these two books:
- Hadoop Operations, Eric Sammer: http://shop.oreilly.com/product/0636920025085.do
- Hadoop: The Definitive Guide: http://shop.oreilly.com/product/9780596521981.do
09-08-2014 05:11 AM
We see a lot of these in the JobTracker jstack, so the NameNode is responding:

    "DataStreamer for file /tmp/hadoop-hadoop-user/7418759843_pipe_1371547789813_7CC40A5EC84074F51068D326FE4B44CD/_logs/history/job_201409040312_85799_1409897033005_hadoop-user_%5B3529B6C5248F26FE0B927AADBA7BDA41%2F7E4BD3F9FCBCBE4B block BP-2096330913-10.250.195.101-1373872395153:blk_468657822786954548_993063000" daemon prio=10 tid=0x00007f1f2a96f000 nid=0x7b56 in Object.wait() [0x00007f1ebc9e7000]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
            - locked <0x0000000625121b00> (a java.util.LinkedList)

Have you noticed a large spike in the number of blocks, and have you tuned your NameNode heap to deal with this rise? Did the JobTracker pauses only begin when you turned on fsimage compression?
09-08-2014 01:54 AM
> Installed: hadoop-2.3.0+cdh5.1.2+816-1.cdh5.1.2.p0.3.el6.x86_64 (@cloudera-cdh5)
> Installed: hadoop-hdfs-2.3.0+cdh5.1.2+816-1.cdh5.1.2.p0.3.el6.x86_64 (@cloudera-cdh5)

Remove those two packages first, then make sure your CDH5 repo points to 5.0.3 rather than 5 (which resolves to the latest 5.x release):

    # yum remove hadoop hadoop-hdfs
    # yum clean all
    # yum makecache
    # yum list | grep cdh5.1.2    (should not list anything)
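Pinning the repo can be scripted. This Python sketch assumes the stock CDH5 repo file (typically /etc/yum.repos.d/cloudera-cdh5.repo) whose `baseurl` ends in `/cdh/5/`; check your own file before relying on it. `pin_cdh_release` is a hypothetical helper that rewrites that floating `5/` to a fixed release such as `5.0.3/` and leaves every other line untouched.

```python
import re
import sys

# Matches a baseurl ending in ".../cdh/5/" (the floating "latest 5.x" channel).
# [ \t]* is used instead of \s* so the pattern never swallows newlines.
FLOATING_BASEURL = re.compile(
    r"^([ \t]*baseurl[ \t]*=[ \t]*\S*/cdh/)5/[ \t]*$", re.MULTILINE)

def pin_cdh_release(repo_text, release="5.0.3"):
    """Return repo_text with a floating .../cdh/5/ baseurl pinned to .../cdh/<release>/."""
    return FLOATING_BASEURL.sub(lambda m: f"{m.group(1)}{release}/", repo_text)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Prints the pinned file; review it, then copy it over the original as root.
    with open(sys.argv[1]) as f:
        print(pin_cdh_release(f.read()), end="")
```

The function is idempotent: an already pinned `baseurl` no longer ends in `/cdh/5/`, so running it twice changes nothing. Run `yum clean all` afterwards so yum drops its cached 5.x metadata.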
09-08-2014 12:14 AM
1 Kudo
The map task's local (intermediate) output is not stored in HDFS. It is written with standard file I/O to temporary directories on the node that ran the task (see the property mapreduce.cluster.local.dir): https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
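To work out where those directories end up on a given node, you can resolve the property roughly the way Hadoop's Configuration does: later files override earlier ones, and `${var}` references are looked up first in Java system properties (here just `user.name`) and then in other configuration properties. This Python sketch uses hypothetical helper names and only reads the XML you pass it. It does not load the shipped defaults, whose stock values are `hadoop.tmp.dir` = `/tmp/hadoop-${user.name}` and `mapreduce.cluster.local.dir` = `${hadoop.tmp.dir}/mapred/local`.

```python
import re
import xml.etree.ElementTree as ET

VAR = re.compile(r"\$\{([^}]+)\}")

def load_props(*xml_texts):
    """Merge Hadoop-style <configuration> XML; later documents override earlier ones."""
    props = {}
    for text in xml_texts:
        for prop in ET.fromstring(text).iter("property"):
            name = prop.findtext("name")
            if name is not None:
                props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

def expand(value, props, sysprops, max_depth=20):
    """Expand ${var}, checking sysprops before props; stop at the first unbound var."""
    for _ in range(max_depth):
        match = VAR.search(value)
        if match is None:
            return value
        key = match.group(1)
        replacement = sysprops.get(key, props.get(key))
        if replacement is None:
            return value  # leave the literal ${var} in place
        value = value[:match.start()] + replacement + value[match.end():]
    return value

def local_dirs(props, sysprops):
    """mapreduce.cluster.local.dir may hold a comma-separated list of directories."""
    raw = expand(props.get("mapreduce.cluster.local.dir", ""), props, sysprops)
    return [d.strip() for d in raw.split(",") if d.strip()]
```

With the stock defaults and `user.name` set to `alice`, `local_dirs` yields `["/tmp/hadoop-alice/mapred/local"]`, which is why map output lands on local disk under /tmp unless you point the property at dedicated disks.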
09-07-2014 11:14 PM
The error is originating from ZooKeeper. Which directory did you move: /var/log or /var/log/hbase? Did you try restarting the entire cluster?
09-04-2014 04:43 AM
How long does this pause normally last? If you are able to, capture 3-5 jstack dumps of the JobTracker spaced a few seconds apart and upload them here (via pastebin or a gist).
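Capturing the dumps can be scripted so they are evenly spaced. This is a minimal Python sketch that shells out to `jstack <pid>`. Run it as the user that owns the JobTracker process, otherwise jstack usually cannot attach. `capture_stacks`, the file-name pattern and the 5-second default interval are assumptions for illustration, not anything Hadoop provides.

```python
import subprocess
import sys
import time
from pathlib import Path

def capture_stacks(pid, count=5, interval=5.0, out_dir=".", command=None):
    """Run jstack `count` times, `interval` seconds apart; return the files written."""
    # `command` builds the argv to run; it defaults to plain `jstack <pid>`.
    command = command or (lambda p: ["jstack", str(p)])
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        if i:
            time.sleep(interval)  # spacing between consecutive dumps
        stamp = time.strftime("%Y%m%d-%H%M%S")
        path = out / f"jstack-{pid}-{i + 1}-{stamp}.txt"
        # check=True raises CalledProcessError if jstack fails (e.g. wrong user).
        result = subprocess.run(command(pid), capture_output=True, text=True, check=True)
        path.write_text(result.stdout)
        written.append(path)
    return written

if __name__ == "__main__" and len(sys.argv) > 1:
    for written_path in capture_stacks(int(sys.argv[1])):
        print(written_path)
```

Saved as a script and run with the JobTracker's PID as its only argument, it writes five numbered, timestamped files that can go straight into a gist.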
09-04-2014 04:15 AM
Hello Charles, as a beginner you would find it easier to experiment with Hadoop on AWS instances before buying hardware. You can begin by building a simple 3-4 node cluster. The hardware requirements depend on your planned work, but you can start with nodes that have 8GB of RAM and storage sized to your data set. Get familiar with the software first, and then look to scale upward.
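As a back-of-the-envelope check on storage, remember that HDFS keeps three copies of each block by default (`dfs.replication` = 3), and MapReduce also needs local disk for intermediate data. The Python sketch below turns that into a per-node figure. The 25% headroom is an assumed allowance for intermediate data, logs and growth, not a Hadoop rule, so adjust it to your workload.

```python
import math

def per_node_storage_gb(dataset_gb, nodes, replication=3, headroom=0.25):
    """Rough raw-disk estimate per DataNode, in GB (rounded up).

    replication: HDFS default is 3 copies of every block (dfs.replication).
    headroom: ASSUMED extra fraction for intermediate map output, logs, growth.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    raw_total = dataset_gb * replication * (1 + headroom)
    return math.ceil(raw_total / nodes)
```

For example, a 200 GB data set on a 4-node cluster gives `per_node_storage_gb(200, 4)` = 188 GB of raw disk per node. Note that on a 3-node cluster with replication 3, every node ends up holding a copy of every block.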