About GautamG

GautamG · ‎09-08-2014

Thank you for the feedback.

GautamG · ‎09-08-2014

As Hadoop: The Definitive Guide says, "Hadoop allows the user to specify a combiner function to be run on the map output, and the combiner function’s output forms the input to the reduce function." The code in the reducer and the combiner is frequently similar, but it doesn't have to be. Your question is unclear. Can you elaborate a bit?
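To illustrate the quoted sentence, here is a minimal plain-Python sketch of a word count in which a combiner pre-aggregates each map task's output before the reducer sees it. This is a simulation, not Hadoop API code: the names map_words, sum_by_key, and run_job and the sample input are made up for illustration. The combiner here happens to reuse the reducer's summing logic, which is common but not required.

```python
from collections import defaultdict

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def sum_by_key(pairs):
    """Group pairs by key and sum their values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# Assumption for this sketch: combiner and reducer share the same logic.
combine = sum_by_key
reduce_counts = sum_by_key

def run_job(splits, use_combiner=True):
    """Simulate one map task per input split, followed by a single reducer.

    Returns the final counts and the number of records sent to the reducer.
    """
    shuffled = []
    for split in splits:
        map_output = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            # The combiner runs on the map output of this one task only.
            map_output = combine(map_output)
        shuffled.extend(map_output)
    return dict(reduce_counts(shuffled)), len(shuffled)

# Hypothetical input: two splits, each handled by its own map task.
splits = [["a b a", "a c"], ["b b", "c a"]]
with_combiner, records_with = run_job(splits, use_combiner=True)
without_combiner, records_without = run_job(splits, use_combiner=False)
```

Both runs produce the same final counts; the only difference is how many records reach the reducer, which is the point of using a combiner.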

GautamG · ‎09-08-2014

If you are only looking to learn, you are fine with using multiple VMs on the same host, but performance will be poor if the VMs are starved for CPU or if they share disks. It looks like you are just beginning to use Hadoop, so I would suggest first getting up to speed with installation and configuration rather than performance. Get yourself a copy of these two books:

- Hadoop Operations / Eric Sammer http://shop.oreilly.com/product/0636920025085.do
- Hadoop: The Definitive Guide http://shop.oreilly.com/product/9780596521981.do

GautamG · ‎09-08-2014

We see a lot of these in the JobTracker jstack, so the namenode is responding.

"DataStreamer for file /tmp/hadoop-hadoop-user/7418759843_pipe_1371547789813_7CC40A5EC84074F51068D326FE4B44CD/_logs/history/job_201409040312_85799_1409897033005_hadoop-user_%5B3529B6C5248F26FE0B927AADBA7BDA41%2F7E4BD3F9FCBCBE4B block BP-2096330913-10.250.195.101-1373872395153:blk_468657822786954548_993063000" daemon prio=10 tid=0x00007f1f2a96f000 nid=0x7b56 in Object.wait() [0x00007f1ebc9e7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
    - locked <0x0000000625121b00> (a java.util.LinkedList)

Have you noticed a large spike in the number of blocks, and have you tuned your NN heap to deal with this rise? Did the JT pauses begin only when you turned on compression of the fsimage?
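To put a number on "a lot of these", a saved jstack dump can be tallied by thread name. The following is a rough sketch, assuming the usual jstack layout in which each thread entry starts with its name in double quotes; the function name count_thread_groups and the abbreviated sample dump are invented for illustration and are not taken from the real cluster.

```python
import re
from collections import Counter

# Matches the quoted thread name at the start of a jstack thread entry.
THREAD_HEADER = re.compile(r'^"(?P<name>.*)" ')

def count_thread_groups(dump_text):
    """Count threads by the first word of their name (e.g. 'DataStreamer')."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if not match:
            continue
        words = match.group("name").split()
        counts[words[0] if words else "<unnamed>"] += 1
    return counts

# Made-up, heavily shortened sample; real dumps carry full paths and stacks.
SAMPLE_DUMP = '''"DataStreamer for file /tmp/example-1 block blk_1" daemon prio=10 tid=0x1 nid=0x1 in Object.wait() [0x1]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
"DataStreamer for file /tmp/example-2 block blk_2" daemon prio=10 tid=0x2 nid=0x2 in Object.wait() [0x2]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
"IPC Server handler 0 on 8021" daemon prio=10 tid=0x3 nid=0x3 waiting on condition [0x3]
   java.lang.Thread.State: WAITING (parking)
'''

thread_counts = count_thread_groups(SAMPLE_DUMP)
```

Comparing the DataStreamer count across several dumps taken a few seconds apart gives a quick sense of whether those threads are piling up.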

GautamG · ‎09-08-2014

> Installed: hadoop-2.3.0+cdh5.1.2+816-1.cdh5.1.2.p0.3.el6.x86_64 (@cloudera-cdh5)
> Installed: hadoop-hdfs-2.3.0+cdh5.1.2+816-1.cdh5.1.2.p0.3.el6.x86_64 (@cloudera-cdh5)

Remove those two packages first, then ensure your CDH5 repo points to 5.0.3, not 5 (which will be the latest in 5.x):

# yum remove hadoop hadoop-hdfs
# yum clean all
# yum makecache
# yum list | grep cdh5.1.2 (should not list anything)

GautamG · ‎09-08-2014

The map task's local output is not stored in HDFS. It is written, using standard file I/O, to temporary directories on that specific node (see the property mapreduce.cluster.local.dir): https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
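To see which local directories a node would use, the property can be read out of a Hadoop-style configuration file. Below is a minimal sketch using Python's standard XML parser; the helper name get_property and the sample paths under /data are hypothetical, and on a real node you would parse the contents of your own mapred-site.xml instead of the inline sample.

```python
import xml.etree.ElementTree as ET

def get_property(config_xml, name):
    """Return the value of a named property from Hadoop-style config XML."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Hypothetical configuration fragment; the directory paths are made up.
SAMPLE_CONFIG = """<configuration>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local</value>
  </property>
</configuration>"""

value = get_property(SAMPLE_CONFIG, "mapreduce.cluster.local.dir")
# The property may hold a comma-separated list of directories.
local_dirs = [d.strip() for d in value.split(",")] if value else []
```

If the property is not set in the site file, the value from mapred-default.xml applies, so a result of None here only means the file does not override it.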

GautamG · ‎09-07-2014

The error is originating from ZooKeeper. Which directory did you move: /var/log or /var/log/hbase? Did you try restarting the entire cluster?

GautamG · ‎09-04-2014

Do your name nodes and jobtracker run on the same host?

GautamG · ‎09-04-2014

How long does this pause normally last? If you are able to, capture 3-5 jstack dumps of the JobTracker, spaced a few seconds apart, and upload them here (pastebin or gist).
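One way to collect those dumps is a small script that calls the JDK's jstack tool in a loop. This is only a sketch: the function names, the output file names, and the default count and interval are arbitrary choices, and it assumes jstack is on the PATH and that the script runs as the same user as the JobTracker process.

```python
import subprocess
import time
from pathlib import Path

def dump_paths(out_dir, count):
    """Build the output file paths jstack_1.txt .. jstack_<count>.txt."""
    return [Path(out_dir) / f"jstack_{i}.txt" for i in range(1, count + 1)]

def capture_jstacks(pid, count=5, interval_seconds=3, out_dir="."):
    """Run `jstack <pid>` `count` times, pausing between runs."""
    paths = dump_paths(out_dir, count)
    for index, path in enumerate(paths):
        result = subprocess.run(
            ["jstack", str(pid)],
            capture_output=True,
            text=True,
            check=True,  # raise if jstack fails, e.g. wrong user or PID
        )
        path.write_text(result.stdout)
        if index < len(paths) - 1:
            time.sleep(interval_seconds)
    return paths
```

Calling capture_jstacks(jobtracker_pid), where jobtracker_pid is the JobTracker's process ID on your host, writes the dumps to the current directory so they can be pasted into a gist.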

GautamG · ‎09-04-2014

Hello Charles, as a beginner it would be easier if you experimented with Hadoop on AWS instances before buying hardware. You can begin by building a simple 3-4 node cluster. The hardware requirements depend on your planned work, but you can begin with nodes that have 8GB of RAM and storage sized for your data set. Get familiar with the software and then look to scale upward.


Member Since	‎01-20-2014 12:14 AM
Last Visited	‎01-18-2026 09:06 PM
Posts	578
Kudos received	102

Cloudera Community

Re: BDR Throwing Error : Hive Table does not match...

Re: parcel usages (Active, 0)

Re: Upgrading to CentOS 6.7... what version of CDH...

Re: 1 of the 3 node Zookeeper quorum failed, how t...

Re: Parcel distro suffixes

Re: Re-install a recently uninstalled Hosts fail -...

Re: Question about combiners

Re: Advice on Hardware Specification

Re: HDFS Checkpoint problem

Re: Re-install a recently uninstalled Hosts fail -...

Re: Question about map output

Re: HBASE Master failed to restart (related to zoo...

Re: HDFS Checkpoint problem

Re: HDFS Checkpoint problem

Re: Advice on Hardware Specification