HBase/Zookeeper error and job deadlock

New Contributor

Hi,

 

I have recently started working with HBase, Hadoop and ZooKeeper. I was able to set up a 20-node cluster on Amazon EC2 using Cloudera Manager (CM), and I have installed Hadoop, HBase, MapReduce and ZooKeeper on the cluster through CM. Now I am trying to run a map-reduce job on it.

 

If I start a ZooKeeper instance on EACH node (i.e. 20 instances) before running the job, the job runs fine. However, CM warns that ZK should not run on more than 5 nodes. If I run ZK on only 5 of the 20 nodes, the job hangs in the reduce phase forever, and I see the following error in the TaskTracker logs:

 

2014-04-02 22:31:36,467 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:124)
	at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:83)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
	at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166)
	at com.ancestry.jermline.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109)
	at com.ancestry.jermline.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-04-02 22:31:36,468 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2014-04-02 22:31:36,468 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:36,468 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2014-04-02 22:31:37,569 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip6-localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:37,569 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2014-04-02 22:31:37,670 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:37,671 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)

I googled the error and, based on what I found, modified the /etc/hosts file on all the nodes to something like this:

 

ubuntu@ip-10-254-140-2:~$ cat /etc/hosts
#127.0.0.1 localhost
10.254.140.2 ip-10-254-140-2.us-west-2.compute.internal

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

 

(I also tried with '127.0.0.1 localhost' uncommented.) But it is not working. All other Hadoop, HBase and ZooKeeper settings are unchanged, except that there are now 5 ZK instances instead of the default 1. If you want me to share any of the config files, please let me know and I will update the description.

 

Any help is much appreciated. Thanks.

5 REPLIES

Re: HBase/Zookeeper error and job deadlock

Master Collaborator

It appears that the ZooKeeper settings for your MapReduce service are not correct. The task is trying to connect to ZK on the loopback address of its local machine, and there is no ZK server listening there. It is true that you do not want more than 5 ZK servers for your cluster; 3 is usually fine, in fact. But you need to deploy your HBase client configs to the nodes in the cluster so that any service trying to use HBase knows how to find your 3 or 5 ZK servers. Try deploying client configs for HBase to the cluster and any node where you are running MR jobs from and this should help.
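
One way to check for this symptom is to test whether any entry of the quorum string your job actually sees resolves to a loopback address. The following is only a minimal sketch using the Java standard library, not an HBase API. The class name `QuorumCheck` and its method are hypothetical, and it assumes the quorum is a comma-separated host list with optional `:port` suffixes, as in `hbase.zookeeper.quorum`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase or ZooKeeper.
class QuorumCheck {
    /** Returns the quorum entries that resolve to loopback or cannot be resolved. */
    static List<String> loopbackEntries(String quorum) {
        List<String> suspicious = new ArrayList<>();
        for (String raw : quorum.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            String host = entry;
            // Drop an optional ":port" suffix; entries with several colons
            // (IPv6 literals such as ::1) are left untouched.
            int colon = host.indexOf(':');
            if (colon >= 0 && colon == host.lastIndexOf(':')) {
                host = host.substring(0, colon);
            }
            try {
                if (InetAddress.getByName(host).isLoopbackAddress()) {
                    suspicious.add(entry);
                }
            } catch (UnknownHostException e) {
                // An unresolvable host is a different problem, but worth reporting.
                suspicious.add(entry + " (unresolvable)");
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // "localhost" is used here only as an example input.
        String quorum = args.length > 0 ? args[0] : "localhost";
        System.out.println("Suspicious quorum entries: " + loopbackEntries(quorum));
    }
}
```

If the value seen by the task turns out to be `localhost` or `127.0.0.1`, that is consistent with the task not picking up the cluster's HBase client configuration.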

Re: HBase/Zookeeper error and job deadlock

New Contributor

@Clint: Thank you so much for the reply. I am pretty new to the HBase/ZooKeeper world. Could you please explain "Try deploying client configs for HBase to the cluster and any node where you are running MR jobs from and this should help." in a bit more detail?

 

Thanks again.

Re: HBase/Zookeeper error and job deadlock

1. Click on your HBase service.

2. Click on the Instances tab.

3. For every host where you run MR jobs, make sure that there is an HBase role on that host.

3.a) If a host where you run MR jobs does not have an HBase role, click Add, then add a Gateway role to all appropriate hosts.

4. Go back to the main page (click the Cloudera Manager logo in the upper left)

5. In the dropdown menu near your cluster name, select Deploy Client Configuration.

 

If the problem still persists, make sure that MapReduce also has a role on the desired hosts, and deploy client configuration again.
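
After deploying, it can help to confirm that the client configuration on a host really lists your ZK servers. Below is a rough sketch that reads `hbase.zookeeper.quorum` from an `hbase-site.xml` using only the JDK's XML parser. The class name `HBaseSiteReader` is made up for illustration, and the default path `/etc/hbase/conf/hbase-site.xml` is an assumption about where the client config lands on your hosts; pass the real path as an argument if it differs.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; it reads the XML directly and does not use HBase classes.
class HBaseSiteReader {
    /** Returns the value of the named property in a Hadoop-style site file, or null if absent. */
    static String readProperty(InputStream xml, String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with the first argument.
        String path = args.length > 0 ? args[0] : "/etc/hbase/conf/hbase-site.xml";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println("hbase.zookeeper.quorum = "
                    + readProperty(in, "hbase.zookeeper.quorum"));
        }
    }
}
```

A `null` result, or a value that still points at `localhost`, would suggest the client configuration on that host was not deployed or is not the one being read.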

Re: HBase/Zookeeper error and job deadlock

New Contributor

Hi dlo,

 

Thanks for the reply. I have solved the problem by passing the hbase quorum property as a command line argument to the hadoop command. But this is still good info, as I am new and trying to learn about Cloudera Manager and HBase/ZooKeeper.
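
For anyone finding this thread later: Hadoop's generic options accept `-D property=value`, and they take effect only when the job's driver parses them (for example by running through `ToolRunner`). The sketch below is not Hadoop's parser. It is a standard-library imitation (hypothetical class `DashDArgs`) that shows how such an argument ends up as a configuration property. The host names `zk1,zk2,zk3` are placeholders, and `hbase.zookeeper.quorum` is an assumption about which property was passed, since the post does not name it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: imitates "-D key=value" handling, not Hadoop's own implementation.
class DashDArgs {
    /** Collects properties given as "-D key=value" or "-Dkey=value". */
    static Map<String, String> parse(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String pair = null;
            if (args[i].equals("-D") && i + 1 < args.length) {
                pair = args[++i];              // form: -D key=value
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                pair = args[i].substring(2);   // form: -Dkey=value
            }
            if (pair != null) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    props.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder arguments standing in for a real job's command line.
        String[] example = {"-D", "hbase.zookeeper.quorum=zk1,zk2,zk3", "input", "output"};
        Map<String, String> props = parse(args.length > 0 ? args : example);
        System.out.println("quorum = " + props.get("hbase.zookeeper.quorum"));
    }
}
```

The point of the illustration is that the quorum value travels with the job itself, so the tasks no longer depend on a client config file being present on every node.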

 

Thanks,

Bhushan

Re: HBase/Zookeeper error and job deadlock

New Contributor

Hi Darren,

 

I tried the suggestion from your last post, i.e. deploying the client configuration on all the hosts. It doesn't seem to work: the job still stalls in the reduce phase and makes no progress.

 

Anything else I should try?

 

Thanks again for all your help.

 

-Bhushan

 

 

EDIT:

 


After several minutes without any progress, the job printed some more error logs:

 

 

14/04/18 19:02:09 INFO mapred.JobClient: Task Id : attempt_201404181806_0007_m_000001_0, Status : FAILED
Task attempt_201404181806_0007_m_000001_0 failed to report status for 600 seconds. Killing!
attempt_201404181806_0007_m_000001_0: 2014-04-18 19:02:02
attempt_201404181806_0007_m_000001_0: Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode):
attempt_201404181806_0007_m_000001_0: "main-EventThread" daemon prio=10 tid=0x00007f22dc9fc000 nid=0x3f89 waiting on condition [0x00007f22bba71000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (parking)
attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f0fb0078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494)
attempt_201404181806_0007_m_000001_0: "main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007f22dc9d4800 nid=0x3f88 waiting on condition [0x00007f22bbb72000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:940)
attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1000)
attempt_201404181806_0007_m_000001_0: "SpillThread" daemon prio=10 tid=0x00007f22dc1ef800 nid=0x3f85 waiting on condition [0x00007f22bbefd000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (parking)
attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f3c9cda0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1218)
attempt_201404181806_0007_m_000001_0: "org.apache.hadoop.hdfs.PeerCache@20700fb1" daemon prio=10 tid=0x00007f22dc948000 nid=0x3f84 waiting on condition [0x00007f22bbffe000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:252)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:39)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:135)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000001_0: "communication thread" daemon prio=10 tid=0x00007f22dc8c7800 nid=0x3f79 in Object.wait() [0x00007f22d82b9000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3c61610> (a java.lang.Object)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:659)
attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3c61610> (a java.lang.Object)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000001_0: "Timer thread for monitoring jvm" daemon prio=10 tid=0x00007f22dc720800 nid=0x3f75 in Object.wait() [0x00007f22d83ba000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3bb6e50> (a java.util.TaskQueue)
attempt_201404181806_0007_m_000001_0: at java.util.TimerThread.mainLoop(Timer.java:552)
attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3bb6e50> (a java.util.TaskQueue)
attempt_201404181806_0007_m_000001_0: at java.util.TimerThread.run(Timer.java:505)
attempt_201404181806_0007_m_000001_0: "IPC Parameter Sending Thread #0" daemon prio=10 tid=0x00007f22dc7be800 nid=0x3f64 waiting on condition [0x00007f22d84bb000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (parking)
attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f3b0ede8> (a java.util.concurrent.SynchronousQueue$TransferStack)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000001_0: "IPC Client (1334156684) connection to /127.0.0.1:38066 from job_201404181806_0007" daemon prio=10 tid=0x00007f22dc7c0800 nid=0x3f63 in Object.wait() [0x00007f22d85bc000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3b0ee78> (a org.apache.hadoop.ipc.Client$Connection)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:803)
attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3b0ee78> (a org.apache.hadoop.ipc.Client$Connection)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846)
attempt_201404181806_0007_m_000001_0: "Thread for syncLogs" daemon prio=10 tid=0x00007f22dc78b800 nid=0x3f62 waiting on condition [0x00007f22e01eb000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child$3.run(Child.java:156)
attempt_201404181806_0007_m_000001_0: "Service Thread" daemon prio=10 tid=0x00007f22dc093000 nid=0x3f5c runnable [0x0000000000000000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000001_0: "C2 CompilerThread1" daemon prio=10 tid=0x00007f22dc090800 nid=0x3f5b waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000001_0: "C2 CompilerThread0" daemon prio=10 tid=0x00007f22dc08e000 nid=0x3f5a waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000001_0: "Signal Dispatcher" daemon prio=10 tid=0x00007f22dc084000 nid=0x3f59 waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000001_0: "Finalizer" daemon prio=10 tid=0x00007f22dc06e800 nid=0x3f57 in Object.wait() [0x00007f22e1449000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (on object monitor)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3981c18> (a java.lang.ref.ReferenceQueue$Lock)
attempt_201404181806_0007_m_000001_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3981c18> (a java.lang.ref.ReferenceQueue$Lock)
attempt_201404181806_0007_m_000001_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
attempt_201404181806_0007_m_000001_0: at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
attempt_201404181806_0007_m_000001_0: "Reference Handler" daemon prio=10 tid=0x00007f22dc06a800 nid=0x3f56 in Object.wait() [0x00007f22e154a000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (on object monitor)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3981cb0> (a java.lang.ref.Reference$Lock)
attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Object.java:503)
attempt_201404181806_0007_m_000001_0: at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3981cb0> (a java.lang.ref.Reference$Lock)
attempt_201404181806_0007_m_000001_0: "main" prio=10 tid=0x00007f22dc012000 nid=0x3f54 waiting on condition [0x00007f22e4812000]
attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Thread.java:340)
attempt_201404181806_0007_m_000001_0: at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:360)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:185)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.checkIfBaseNodeAvailable(ZooKeeperNodeTracker.java:208)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:77)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166)
attempt_201404181806_0007_m_000001_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109)
attempt_201404181806_0007_m_000001_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
attempt_201404181806_0007_m_000001_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201404181806_0007_m_000001_0: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201404181806_0007_m_000001_0: "VM Thread" prio=10 tid=0x00007f22dc068000 nid=0x3f55 runnable
attempt_201404181806_0007_m_000001_0: "VM Periodic Task Thread" prio=10 tid=0x00007f22dc09e800 nid=0x3f5d waiting on condition
attempt_201404181806_0007_m_000001_0: JNI global references: 286
attempt_201404181806_0007_m_000001_0: Heap
attempt_201404181806_0007_m_000001_0: def new generation total 18112K, used 14672K [0x00000000efe00000, 0x00000000f11a0000, 0x00000000f38a0000)
attempt_201404181806_0007_m_000001_0: eden space 16128K, 88% used [0x00000000efe00000, 0x00000000f0bf11c8, 0x00000000f0dc0000)
attempt_201404181806_0007_m_000001_0: from space 1984K, 19% used [0x00000000f0fb0000, 0x00000000f1013108, 0x00000000f11a0000)
attempt_201404181806_0007_m_000001_0: to space 1984K, 0% used [0x00000000f0dc0000, 0x00000000f0dc0000, 0x00000000f0fb0000)
attempt_201404181806_0007_m_000001_0: tenured generation total 88708K, used 57707K [0x00000000f38a0000, 0x00000000f8f41000, 0x00000000fae00000)
attempt_201404181806_0007_m_000001_0: the space 88708K, 65% used [0x00000000f38a0000, 0x00000000f70fae38, 0x00000000f70fb000, 0x00000000f8f41000)
attempt_201404181806_0007_m_000001_0: compacting perm gen total 21248K, used 20414K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
attempt_201404181806_0007_m_000001_0: the space 21248K, 96% used [0x00000000fae00000, 0x00000000fc1ef8a8, 0x00000000fc1efa00, 0x00000000fc2c0000)
attempt_201404181806_0007_m_000001_0: No shared spaces configured.
attempt_201404181806_0007_m_000001_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201404181806_0007_m_000001_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201404181806_0007_m_000001_0: SLF4J: Found binding in [jar:file:/mnt/mapred/local/taskTracker/ubuntu/jobcache/job_201404181806_0007/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201404181806_0007_m_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/04/18 19:02:09 INFO mapred.JobClient: Task Id : attempt_201404181806_0007_m_000004_0, Status : FAILED
Task attempt_201404181806_0007_m_000004_0 failed to report status for 600 seconds. Killing!
attempt_201404181806_0007_m_000004_0: 2014-04-18 19:02:02
attempt_201404181806_0007_m_000004_0: Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode):
attempt_201404181806_0007_m_000004_0: "main-EventThread" daemon prio=10 tid=0x00007f97649dc000 nid=0x2f3b waiting on condition [0x00007f974366c000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (parking)
attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f0fb0078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494)
attempt_201404181806_0007_m_000004_0: "main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007f97649b7800 nid=0x2f3a waiting on condition [0x00007f974376d000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:940)
attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1000)
attempt_201404181806_0007_m_000004_0: "SpillThread" daemon prio=10 tid=0x00007f97641f7800 nid=0x2f38 waiting on condition [0x00007f9743af9000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (parking)
attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f3ca1f48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1218)
attempt_201404181806_0007_m_000004_0: "org.apache.hadoop.hdfs.PeerCache@1f472be1" daemon prio=10 tid=0x00007f9764935800 nid=0x2f37 waiting on condition [0x00007f9743bfa000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:252)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:39)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:135)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000004_0: "communication thread" daemon prio=10 tid=0x00007f97648b7800 nid=0x2f2e in Object.wait() [0x00007f9743dfc000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3c65328> (a java.lang.Object)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:659)
attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3c65328> (a java.lang.Object)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000004_0: "Timer thread for monitoring jvm" daemon prio=10 tid=0x00007f97640e6000 nid=0x2f2d in Object.wait() [0x00007f9743efd000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3bb3508> (a java.util.TaskQueue)
attempt_201404181806_0007_m_000004_0: at java.util.TimerThread.mainLoop(Timer.java:552)
attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3bb3508> (a java.util.TaskQueue)
attempt_201404181806_0007_m_000004_0: at java.util.TimerThread.run(Timer.java:505)
attempt_201404181806_0007_m_000004_0: "IPC Parameter Sending Thread #0" daemon prio=10 tid=0x00007f97647d6800 nid=0x2f2c waiting on condition [0x00007f9743ffe000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (parking)
attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method)
attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f3b0ee00> (a java.util.concurrent.SynchronousQueue$TransferStack)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745)
attempt_201404181806_0007_m_000004_0: "IPC Client (1294629661) connection to /127.0.0.1:48262 from job_201404181806_0007" daemon prio=10 tid=0x00007f97647d8800 nid=0x2f2b in Object.wait() [0x00007f9760123000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (on object monitor)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3b0ee90> (a org.apache.hadoop.ipc.Client$Connection)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:803)
attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3b0ee90> (a org.apache.hadoop.ipc.Client$Connection)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846)
attempt_201404181806_0007_m_000004_0: "Thread for syncLogs" daemon prio=10 tid=0x00007f97647a3800 nid=0x2f2a waiting on condition [0x00007f9760224000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child$3.run(Child.java:156)
attempt_201404181806_0007_m_000004_0: "Service Thread" daemon prio=10 tid=0x00007f9764093000 nid=0x2f1b runnable [0x0000000000000000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000004_0: "C2 CompilerThread1" daemon prio=10 tid=0x00007f9764090800 nid=0x2f1a waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000004_0: "C2 CompilerThread0" daemon prio=10 tid=0x00007f976408e000 nid=0x2f19 waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000004_0: "Signal Dispatcher" daemon prio=10 tid=0x00007f9764084000 nid=0x2f18 waiting on condition [0x0000000000000000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE
attempt_201404181806_0007_m_000004_0: "Finalizer" daemon prio=10 tid=0x00007f976406e800 nid=0x2f17 in Object.wait() [0x00007f9768f46000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (on object monitor)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3981c20> (a java.lang.ref.ReferenceQueue$Lock)
attempt_201404181806_0007_m_000004_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3981c20> (a java.lang.ref.ReferenceQueue$Lock)
attempt_201404181806_0007_m_000004_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
attempt_201404181806_0007_m_000004_0: at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
attempt_201404181806_0007_m_000004_0: "Reference Handler" daemon prio=10 tid=0x00007f976406a800 nid=0x2f16 in Object.wait() [0x00007f9769047000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (on object monitor)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method)
attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3981cb8> (a java.lang.ref.Reference$Lock)
attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Object.java:503)
attempt_201404181806_0007_m_000004_0: at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3981cb8> (a java.lang.ref.Reference$Lock)
attempt_201404181806_0007_m_000004_0: "main" prio=10 tid=0x00007f9764012000 nid=0x2f14 waiting on condition [0x00007f976c30f000]
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method)
attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Thread.java:340)
attempt_201404181806_0007_m_000004_0: at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:360)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:185)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.checkIfBaseNodeAvailable(ZooKeeperNodeTracker.java:208)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:77)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166)
attempt_201404181806_0007_m_000004_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109)
attempt_201404181806_0007_m_000004_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
attempt_201404181806_0007_m_000004_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201404181806_0007_m_000004_0: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201404181806_0007_m_000004_0: "VM Thread" prio=10 tid=0x00007f9764068000 nid=0x2f15 runnable
attempt_201404181806_0007_m_000004_0: "VM Periodic Task Thread" prio=10 tid=0x00007f976409e800 nid=0x2f1c waiting on condition
attempt_201404181806_0007_m_000004_0: JNI global references: 282
attempt_201404181806_0007_m_000004_0: Heap
attempt_201404181806_0007_m_000004_0: def new generation total 18112K, used 12448K [0x00000000efe00000, 0x00000000f11a0000, 0x00000000f38a0000)
attempt_201404181806_0007_m_000004_0: eden space 16128K, 75% used [0x00000000efe00000, 0x00000000f09d7628, 0x00000000f0dc0000)
attempt_201404181806_0007_m_000004_0: from space 1984K, 16% used [0x00000000f0fb0000, 0x00000000f1000c58, 0x00000000f11a0000)
attempt_201404181806_0007_m_000004_0: to space 1984K, 0% used [0x00000000f0dc0000, 0x00000000f0dc0000, 0x00000000f0fb0000)
attempt_201404181806_0007_m_000004_0: tenured generation total 88708K, used 57600K [0x00000000f38a0000, 0x00000000f8f41000, 0x00000000fae00000)
attempt_201404181806_0007_m_000004_0: the space 88708K, 64% used [0x00000000f38a0000, 0x00000000f70e0060, 0x00000000f70e0200, 0x00000000f8f41000)
attempt_201404181806_0007_m_000004_0: compacting perm gen total 21248K, used 20413K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
attempt_201404181806_0007_m_000004_0: the space 21248K, 96% used [0x00000000fae00000, 0x00000000fc1ef590, 0x00000000fc1ef600, 0x00000000fc2c0000)
attempt_201404181806_0007_m_000004_0: No shared spaces configured.
attempt_201404181806_0007_m_000004_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201404181806_0007_m_000004_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201404181806_0007_m_000004_0: SLF4J: Found binding in [jar:file:/mnt/mapred/local/taskTracker/ubuntu/jobcache/job_201404181806_0007/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201404181806_0007_m_000004_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
