Member since 03-31-2014 | 14 Posts | 0 Kudos Received | 0 Solutions
04-21-2014 11:23 AM
I am not using YARN 😞 Also, where is the "safety valve for MR", _client_ environment?
04-18-2014 12:17 PM
Hi Darren,

Yes, under the Processes tab of the DataNode, the stdout shows JAVA_HOME to be set correctly:

Fri Apr 18 19:01:06 UTC 2014
JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera
using /usr/lib/jvm/java-7-oracle-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/7-hdfs-DATANODE as CONF_DIR
using as SECURE_USER
using as SECURE_GROUP
unlimited

Looks like Hadoop itself is picking up the right Java version, but the submitted job is not. Anything you can suggest to fix this?

Thanks,
Bhushan
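(For anyone hitting the same thing later, a quick, non-authoritative spot check is to compare the JVM the managed daemons report against whatever 'java' resolves to on the node's PATH. The java-7 path below is the one from the stdout above; adjust it to your install.)

# Compare the JVM the CDH daemons use (path from the stdout above)
# with the JVM a plain 'java' on this node's PATH resolves to.
/usr/lib/jvm/java-7-oracle-cloudera/bin/java -version
java -version
readlink -f "$(which java)"   # shows which binary the default 'java' really is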
04-18-2014 11:56 AM
Hi Darren, I tried your suggestion in your last post, i.e. deploy the client configuration on all the hosts. It doesn't seem to work. It is still getting stalled in the reduce phase and does not make any progress. Anything else I should try? Thanks again for all your help. -Bhushan EDIT: After waiting for several minutes while the job was not progressing, it spit out some more error logs: 14/04/18 19:02:09 INFO mapred.JobClient: Task Id : attempt_201404181806_0007_m_000001_0, Status : FAILED Task attempt_201404181806_0007_m_000001_0 failed to report status for 600 seconds. Killing! attempt_201404181806_0007_m_000001_0: 2014-04-18 19:02:02 attempt_201404181806_0007_m_000001_0: Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode): attempt_201404181806_0007_m_000001_0: "main-EventThread" daemon prio=10 tid=0x00007f22dc9fc000 nid=0x3f89 waiting on condition [0x00007f22bba71000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (parking) attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f0fb0078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494) attempt_201404181806_0007_m_000001_0: "main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007f22dc9d4800 nid=0x3f88 waiting on condition [0x00007f22bbb72000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86) attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:940) attempt_201404181806_0007_m_000001_0: at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1000) attempt_201404181806_0007_m_000001_0: "SpillThread" daemon prio=10 tid=0x00007f22dc1ef800 nid=0x3f85 waiting on condition [0x00007f22bbefd000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (parking) attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f3c9cda0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1218) attempt_201404181806_0007_m_000001_0: "org.apache.hadoop.hdfs.PeerCache@20700fb1" daemon prio=10 tid=0x00007f22dc948000 nid=0x3f84 waiting on condition [0x00007f22bbffe000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000001_0: at 
java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:252) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:39) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:135) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000001_0: "communication thread" daemon prio=10 tid=0x00007f22dc8c7800 nid=0x3f79 in Object.wait() [0x00007f22d82b9000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3c61610> (a java.lang.Object) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:659) attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3c61610> (a java.lang.Object) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000001_0: "Timer thread for monitoring jvm" daemon prio=10 tid=0x00007f22dc720800 nid=0x3f75 in Object.wait() [0x00007f22d83ba000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3bb6e50> (a java.util.TaskQueue) attempt_201404181806_0007_m_000001_0: at java.util.TimerThread.mainLoop(Timer.java:552) attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3bb6e50> (a java.util.TaskQueue) attempt_201404181806_0007_m_000001_0: at java.util.TimerThread.run(Timer.java:505) attempt_201404181806_0007_m_000001_0: "IPC Parameter Sending Thread #0" daemon prio=10 tid=0x00007f22dc7be800 nid=0x3f64 waiting on condition [0x00007f22d84bb000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (parking) attempt_201404181806_0007_m_000001_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000001_0: - parking to wait for <0x00000000f3b0ede8> (a java.util.concurrent.SynchronousQueue$TransferStack) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000001_0: "IPC Client (1334156684) connection to /127.0.0.1:38066 from job_201404181806_0007" daemon prio=10 tid=0x00007f22dc7c0800 nid=0x3f63 in Object.wait() [0x00007f22d85bc000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000001_0: at 
java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3b0ee78> (a org.apache.hadoop.ipc.Client$Connection) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:803) attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3b0ee78> (a org.apache.hadoop.ipc.Client$Connection) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846) attempt_201404181806_0007_m_000001_0: "Thread for syncLogs" daemon prio=10 tid=0x00007f22dc78b800 nid=0x3f62 waiting on condition [0x00007f22e01eb000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child$3.run(Child.java:156) attempt_201404181806_0007_m_000001_0: "Service Thread" daemon prio=10 tid=0x00007f22dc093000 nid=0x3f5c runnable [0x0000000000000000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000001_0: "C2 CompilerThread1" daemon prio=10 tid=0x00007f22dc090800 nid=0x3f5b waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000001_0: "C2 CompilerThread0" daemon prio=10 tid=0x00007f22dc08e000 nid=0x3f5a waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000001_0: "Signal Dispatcher" daemon prio=10 tid=0x00007f22dc084000 nid=0x3f59 waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000001_0: "Finalizer" daemon prio=10 tid=0x00007f22dc06e800 nid=0x3f57 in Object.wait() [0x00007f22e1449000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (on object monitor) attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3981c18> (a java.lang.ref.ReferenceQueue$Lock) attempt_201404181806_0007_m_000001_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3981c18> (a java.lang.ref.ReferenceQueue$Lock) attempt_201404181806_0007_m_000001_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) attempt_201404181806_0007_m_000001_0: at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189) attempt_201404181806_0007_m_000001_0: "Reference Handler" daemon prio=10 tid=0x00007f22dc06a800 nid=0x3f56 in Object.wait() [0x00007f22e154a000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: WAITING (on object monitor) attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000001_0: - waiting on <0x00000000f3981cb0> (a java.lang.ref.Reference$Lock) attempt_201404181806_0007_m_000001_0: at java.lang.Object.wait(Object.java:503) attempt_201404181806_0007_m_000001_0: at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) attempt_201404181806_0007_m_000001_0: - locked <0x00000000f3981cb0> (a java.lang.ref.Reference$Lock) attempt_201404181806_0007_m_000001_0: "main" prio=10 tid=0x00007f22dc012000 nid=0x3f54 waiting on condition [0x00007f22e4812000] attempt_201404181806_0007_m_000001_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000001_0: at 
java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000001_0: at java.lang.Thread.sleep(Thread.java:340) attempt_201404181806_0007_m_000001_0: at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:360) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:185) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.checkIfBaseNodeAvailable(ZooKeeperNodeTracker.java:208) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:77) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166) attempt_201404181806_0007_m_000001_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109) attempt_201404181806_0007_m_000001_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201404181806_0007_m_000001_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201404181806_0007_m_000001_0: at javax.security.auth.Subject.doAs(Subject.java:415) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) attempt_201404181806_0007_m_000001_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) attempt_201404181806_0007_m_000001_0: "VM Thread" prio=10 tid=0x00007f22dc068000 nid=0x3f55 runnable attempt_201404181806_0007_m_000001_0: "VM Periodic Task Thread" prio=10 tid=0x00007f22dc09e800 nid=0x3f5d waiting on condition attempt_201404181806_0007_m_000001_0: JNI global references: 286 attempt_201404181806_0007_m_000001_0: 
Heap attempt_201404181806_0007_m_000001_0: def new generation total 18112K, used 14672K [0x00000000efe00000, 0x00000000f11a0000, 0x00000000f38a0000) attempt_201404181806_0007_m_000001_0: eden space 16128K, 88% used [0x00000000efe00000, 0x00000000f0bf11c8, 0x00000000f0dc0000) attempt_201404181806_0007_m_000001_0: from space 1984K, 19% used [0x00000000f0fb0000, 0x00000000f1013108, 0x00000000f11a0000) attempt_201404181806_0007_m_000001_0: to space 1984K, 0% used [0x00000000f0dc0000, 0x00000000f0dc0000, 0x00000000f0fb0000) attempt_201404181806_0007_m_000001_0: tenured generation total 88708K, used 57707K [0x00000000f38a0000, 0x00000000f8f41000, 0x00000000fae00000) attempt_201404181806_0007_m_000001_0: the space 88708K, 65% used [0x00000000f38a0000, 0x00000000f70fae38, 0x00000000f70fb000, 0x00000000f8f41000) attempt_201404181806_0007_m_000001_0: compacting perm gen total 21248K, used 20414K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000) attempt_201404181806_0007_m_000001_0: the space 21248K, 96% used [0x00000000fae00000, 0x00000000fc1ef8a8, 0x00000000fc1efa00, 0x00000000fc2c0000) attempt_201404181806_0007_m_000001_0: No shared spaces configured. attempt_201404181806_0007_m_000001_0: SLF4J: Class path contains multiple SLF4J bindings. attempt_201404181806_0007_m_000001_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201404181806_0007_m_000001_0: SLF4J: Found binding in [jar:file:/mnt/mapred/local/taskTracker/ubuntu/jobcache/job_201404181806_0007/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201404181806_0007_m_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 14/04/18 19:02:09 INFO mapred.JobClient: Task Id : attempt_201404181806_0007_m_000004_0, Status : FAILED Task attempt_201404181806_0007_m_000004_0 failed to report status for 600 seconds. Killing! 
attempt_201404181806_0007_m_000004_0: 2014-04-18 19:02:02 attempt_201404181806_0007_m_000004_0: Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode): attempt_201404181806_0007_m_000004_0: "main-EventThread" daemon prio=10 tid=0x00007f97649dc000 nid=0x2f3b waiting on condition [0x00007f974366c000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (parking) attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f0fb0078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494) attempt_201404181806_0007_m_000004_0: "main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007f97649b7800 nid=0x2f3a waiting on condition [0x00007f974376d000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86) attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:940) attempt_201404181806_0007_m_000004_0: at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1000) attempt_201404181806_0007_m_000004_0: "SpillThread" daemon prio=10 tid=0x00007f97641f7800 nid=0x2f38 waiting on condition [0x00007f9743af9000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (parking) attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f3ca1f48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1218) attempt_201404181806_0007_m_000004_0: "org.apache.hadoop.hdfs.PeerCache@1f472be1" daemon prio=10 tid=0x00007f9764935800 nid=0x2f37 waiting on condition [0x00007f9743bfa000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:252) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:39) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:135) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000004_0: "communication thread" daemon prio=10 tid=0x00007f97648b7800 nid=0x2f2e in Object.wait() [0x00007f9743dfc000] attempt_201404181806_0007_m_000004_0: 
java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3c65328> (a java.lang.Object) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:659) attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3c65328> (a java.lang.Object) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000004_0: "Timer thread for monitoring jvm" daemon prio=10 tid=0x00007f97640e6000 nid=0x2f2d in Object.wait() [0x00007f9743efd000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3bb3508> (a java.util.TaskQueue) attempt_201404181806_0007_m_000004_0: at java.util.TimerThread.mainLoop(Timer.java:552) attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3bb3508> (a java.util.TaskQueue) attempt_201404181806_0007_m_000004_0: at java.util.TimerThread.run(Timer.java:505) attempt_201404181806_0007_m_000004_0: "IPC Parameter Sending Thread #0" daemon prio=10 tid=0x00007f97647d6800 nid=0x2f2c waiting on condition [0x00007f9743ffe000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (parking) attempt_201404181806_0007_m_000004_0: at sun.misc.Unsafe.park(Native Method) attempt_201404181806_0007_m_000004_0: - parking to wait for <0x00000000f3b0ee00> (a java.util.concurrent.SynchronousQueue$TransferStack) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.run(Thread.java:745) attempt_201404181806_0007_m_000004_0: "IPC Client (1294629661) connection to /127.0.0.1:48262 from job_201404181806_0007" daemon prio=10 tid=0x00007f97647d8800 nid=0x2f2b in Object.wait() [0x00007f9760123000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (on object monitor) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3b0ee90> (a org.apache.hadoop.ipc.Client$Connection) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:803) attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3b0ee90> (a org.apache.hadoop.ipc.Client$Connection) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846) attempt_201404181806_0007_m_000004_0: "Thread for syncLogs" daemon prio=10 tid=0x00007f97647a3800 nid=0x2f2a waiting on condition [0x00007f9760224000] 
attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child$3.run(Child.java:156) attempt_201404181806_0007_m_000004_0: "Service Thread" daemon prio=10 tid=0x00007f9764093000 nid=0x2f1b runnable [0x0000000000000000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000004_0: "C2 CompilerThread1" daemon prio=10 tid=0x00007f9764090800 nid=0x2f1a waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000004_0: "C2 CompilerThread0" daemon prio=10 tid=0x00007f976408e000 nid=0x2f19 waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000004_0: "Signal Dispatcher" daemon prio=10 tid=0x00007f9764084000 nid=0x2f18 waiting on condition [0x0000000000000000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: RUNNABLE attempt_201404181806_0007_m_000004_0: "Finalizer" daemon prio=10 tid=0x00007f976406e800 nid=0x2f17 in Object.wait() [0x00007f9768f46000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (on object monitor) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3981c20> (a java.lang.ref.ReferenceQueue$Lock) attempt_201404181806_0007_m_000004_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3981c20> (a java.lang.ref.ReferenceQueue$Lock) attempt_201404181806_0007_m_000004_0: at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) attempt_201404181806_0007_m_000004_0: at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189) attempt_201404181806_0007_m_000004_0: "Reference Handler" daemon prio=10 tid=0x00007f976406a800 nid=0x2f16 in Object.wait() [0x00007f9769047000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: WAITING (on object monitor) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Native Method) attempt_201404181806_0007_m_000004_0: - waiting on <0x00000000f3981cb8> (a java.lang.ref.Reference$Lock) attempt_201404181806_0007_m_000004_0: at java.lang.Object.wait(Object.java:503) attempt_201404181806_0007_m_000004_0: at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) attempt_201404181806_0007_m_000004_0: - locked <0x00000000f3981cb8> (a java.lang.ref.Reference$Lock) attempt_201404181806_0007_m_000004_0: "main" prio=10 tid=0x00007f9764012000 nid=0x2f14 waiting on condition [0x00007f976c30f000] attempt_201404181806_0007_m_000004_0: java.lang.Thread.State: TIMED_WAITING (sleeping) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Native Method) attempt_201404181806_0007_m_000004_0: at java.lang.Thread.sleep(Thread.java:340) attempt_201404181806_0007_m_000004_0: at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:360) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:185) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450) attempt_201404181806_0007_m_000004_0: at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.checkIfBaseNodeAvailable(ZooKeeperNodeTracker.java:208) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:77) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166) attempt_201404181806_0007_m_000004_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109) attempt_201404181806_0007_m_000004_0: at com.mycompany.myprog.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201404181806_0007_m_000004_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201404181806_0007_m_000004_0: at javax.security.auth.Subject.doAs(Subject.java:415) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) attempt_201404181806_0007_m_000004_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) attempt_201404181806_0007_m_000004_0: "VM Thread" prio=10 tid=0x00007f9764068000 nid=0x2f15 runnable attempt_201404181806_0007_m_000004_0: "VM Periodic Task Thread" prio=10 tid=0x00007f976409e800 nid=0x2f1c waiting on condition attempt_201404181806_0007_m_000004_0: JNI global references: 282 attempt_201404181806_0007_m_000004_0: Heap attempt_201404181806_0007_m_000004_0: def new generation total 18112K, used 12448K [0x00000000efe00000, 0x00000000f11a0000, 0x00000000f38a0000) attempt_201404181806_0007_m_000004_0: eden space 16128K, 75% used [0x00000000efe00000, 0x00000000f09d7628, 0x00000000f0dc0000) attempt_201404181806_0007_m_000004_0: from space 1984K, 16% used [0x00000000f0fb0000, 0x00000000f1000c58, 0x00000000f11a0000) attempt_201404181806_0007_m_000004_0: to space 1984K, 0% used [0x00000000f0dc0000, 0x00000000f0dc0000, 0x00000000f0fb0000) attempt_201404181806_0007_m_000004_0: tenured generation total 88708K, used 57600K [0x00000000f38a0000, 
0x00000000f8f41000, 0x00000000fae00000) attempt_201404181806_0007_m_000004_0: the space 88708K, 64% used [0x00000000f38a0000, 0x00000000f70e0060, 0x00000000f70e0200, 0x00000000f8f41000) attempt_201404181806_0007_m_000004_0: compacting perm gen total 21248K, used 20413K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000) attempt_201404181806_0007_m_000004_0: the space 21248K, 96% used [0x00000000fae00000, 0x00000000fc1ef590, 0x00000000fc1ef600, 0x00000000fc2c0000) attempt_201404181806_0007_m_000004_0: No shared spaces configured. attempt_201404181806_0007_m_000004_0: SLF4J: Class path contains multiple SLF4J bindings. attempt_201404181806_0007_m_000004_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201404181806_0007_m_000004_0: SLF4J: Found binding in [jar:file:/mnt/mapred/local/taskTracker/ubuntu/jobcache/job_201404181806_0007/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201404181806_0007_m_000004_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
04-18-2014 10:28 AM
Hi, I am trying to collect all the Hadoop, HBase and ZooKeeper logs from each node in my cluster to some central location, let's say my local machine. What is the best way to achieve this? Does Cloudera or Cloudera Manager offer any such facility? Thanks.
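(In case it is useful while waiting for an answer, here is a rough sketch of what I mean by collecting to a central location. It assumes passwordless SSH as the 'ubuntu' user and the usual CDH log directories under /var/log; the host names and directory list are placeholders to adjust for your own cluster and roles.)

# Rough sketch: pull the usual CDH log directories from each node into
# per-host folders on the machine running this script.
for host in node01 node02 node03; do            # replace with the cluster hostnames
  for dir in hadoop-hdfs hadoop-0.20-mapreduce hbase zookeeper; do
    mkdir -p "cluster-logs/$host/$dir"
    rsync -az "ubuntu@$host:/var/log/$dir/" "cluster-logs/$host/$dir/" || true
  done
done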
04-16-2014 04:19 PM
Hi Darren, Looks like it is picking up JDK 1.6:

+ locate_cdh_java_home
+ '[' -z '' ']'
+ '[' -z /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/bigtop-utils ']'
+ local BIGTOP_DETECT_JAVAHOME=
+ for candidate in '"${JSVC_HOME}"' '"${JSVC_HOME}/.."' '"/usr/lib/bigtop-utils"' '"/usr/libexec"'
+ '[' -e /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/bigtop-utils/bigtop-detect-javahome ']'
+ BIGTOP_DETECT_JAVAHOME=/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/bigtop-utils/bigtop-detect-javahome
+ break
+ '[' -z /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/bigtop-utils/bigtop-detect-javahome ']'
+ . /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/bigtop-utils/bigtop-detect-javahome
++ '[' -r /etc/default/bigtop-utils ']'
++ JAVA6_HOME_CANDIDATES='\
/usr/lib/j2sdk1.6-sun \
/usr/lib/jvm/java-6-sun \
/usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
/usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \
/usr/lib/jvm/j2sdk1.6-oracle \
/usr/lib/jvm/j2sdk1.6-oracle/jre \
/usr/java/jdk1.6* \
/usr/java/jre1.6*'
++ OPENJAVA6_HOME_CANDIDATES='\
/usr/lib/jvm/java-1.6.0-openjdk \
/usr/lib/jvm/java-1.6.0-openjdk-* \
/usr/lib/jvm/jre-1.6.0-openjdk*'
++ JAVA7_HOME_CANDIDATES='\
/usr/java/jdk1.7* \
/usr/java/jre1.7* \
/usr/lib/jvm/j2sdk1.7-oracle \
/usr/lib/jvm/j2sdk1.7-oracle/jre \
/usr/lib/jvm/java-7-oracle*'
++ OPENJAVA7_HOME_CANDIDATES='\
/usr/lib/jvm/java-1.7.0-openjdk* \
/usr/lib/jvm/java-7-openjdk*'
++ MISCJAVA_HOME_CANDIDATES='\
/Library/Java/Home \
/usr/java/default \
/usr/lib/jvm/default-java \
/usr/lib/jvm/java-openjdk \
/usr/lib/jvm/jre-openjdk'
++ case $BIGTOP_JAVA_MAJOR in
++ JAVA_HOME_CANDIDATES='\
/usr/lib/j2sdk1.6-sun \
/usr/lib/jvm/java-6-sun \
/usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
/usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \
/usr/lib/jvm/j2sdk1.6-oracle \
/usr/lib/jvm/j2sdk1.6-oracle/jre \
/usr/java/jdk1.6* \
/usr/java/jre1.6* \
/usr/java/jdk1.7* \
/usr/java/jre1.7* \
/usr/lib/jvm/j2sdk1.7-oracle \
/usr/lib/jvm/j2sdk1.7-oracle/jre \
/usr/lib/jvm/java-7-oracle* \
/Library/Java/Home \
/usr/java/default \
/usr/lib/jvm/default-java \
/usr/lib/jvm/java-openjdk \
/usr/lib/jvm/jre-openjdk \
/usr/lib/jvm/java-1.7.0-openjdk* \
/usr/lib/jvm/java-7-openjdk* \
/usr/lib/jvm/java-1.6.0-openjdk \
/usr/lib/jvm/java-1.6.0-openjdk-* \
/usr/lib/jvm/jre-1.6.0-openjdk*'
++ '[' -z '' ']'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '\'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd /usr/lib/j2sdk1.6-sun
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '\'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd /usr/lib/jvm/java-6-sun
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '\'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '/usr/lib/jvm/java-1.6.0-sun-1.6.0.*'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '\'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '/usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd '\'
++ for candidate_regex in '$JAVA_HOME_CANDIDATES'
+++ ls -rd /usr/lib/jvm/j2sdk1.6-oracle
++ for candidate in '`ls -rd $candidate_regex 2>/dev/null`'
++ '[' -e /usr/lib/jvm/j2sdk1.6-oracle/bin/java ']'
++ export JAVA_HOME=/usr/lib/jvm/j2sdk1.6-oracle
++ JAVA_HOME=/usr/lib/jvm/j2sdk1.6-oracle
++ break 2
+ verify_java_home
+ '[' -z /usr/lib/jvm/j2sdk1.6-oracle ']'
+ echo JAVA_HOME=/usr/lib/jvm/j2sdk1.6-oracle
+ . /usr/lib/cmf/service/common/cdh-default-hadoop

I notice that the other JDK (1.7) is not even listed in the options it is searching for. Also, when I try to list the installed JVMs, 1.7 is not listed, but it is there:

ubuntu@ip-10-36-46-92:~$ update-java-alternatives -l
j2sdk1.6-oracle 315 /usr/lib/jvm/j2sdk1.6-oracle
ubuntu@ip-10-36-46-92:~$ ls -ll /usr/lib/jvm/
total 8
drwxr-xr-x 11 root root 4096 Apr 16 18:30 j2sdk1.6-oracle
drwxr-xr-x 8 root root 4096 Apr 16 18:23 java-7-oracle-cloudera

Thanks,
Bhushan
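(A sketch of one workaround, not an official fix: the trace above shows bigtop-detect-javahome sourcing /etc/default/bigtop-utils before it scans the candidate list, and it only scans when JAVA_HOME is still empty, so exporting JAVA_HOME there on every node should short-circuit the JDK 1.6 pick. The path is the java-7-oracle-cloudera directory from my 'ls' output above; substitute your own.)

# Run on every node: pin JAVA_HOME so bigtop-detect-javahome skips its
# candidate scan (it sources this file first and only searches if
# JAVA_HOME is still empty -- see the trace above).
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera' | sudo tee -a /etc/default/bigtop-utils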
04-16-2014 01:11 PM
Hi Darren, I changed the Java Home from CM, but I am still getting an error when I try to run my Hadoop job compiled on JDK 1.7:

Exception in thread "main" java.lang.UnsupportedClassVersionError: com/mycomp/control/Main : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

I also restarted the whole cluster after changing the property in CM. Not sure what I am missing.
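(One build-side workaround, assuming the job code sticks to Java 6 language features and APIs: "Unsupported major.minor version 51.0" means the class files target Java 7 but are being loaded by a Java 6 JVM, so recompiling with a 1.6 target makes the jar loadable on either runtime. The source layout and jar name below are placeholders.)

# Recompile the job for a Java 6 class-file target so the JDK 6 JVMs on the
# cluster can load it (only works if the code avoids Java 7-only features).
mkdir -p classes
javac -source 1.6 -target 1.6 -cp "$(hadoop classpath)" -d classes $(find src -name '*.java')
jar cf myjob.jar -C classes .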
04-16-2014 09:50 AM
So you mean if the Hadoop job requires JDK1.7, there is no way to run it directly? One must do some environment variable changes on each node?
04-15-2014 01:06 PM
Hi Darren, So, by "CDH 4 generally prefers JDK 6" you mean it will pick up JDK 1.6, and if my Hadoop job requires 1.7, the job will fail, right? In that case I guess I will have to start using CDH 5? Thanks, Bhushan
04-15-2014 12:00 PM
Hi Darren, I tried CM 5 and it installed JDK 1.7 while installing CM itself. But during the cluster installation, it installed both JDK 1.6 and 1.7 on each node: first 1.6, then 1.7. (I selected CDH 4.x.) After the installation, if I check the Java version on each node, it shows only JDK 1.6:

root@ip-10-226-176-29:/home/ubuntu# java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

ubuntu@ip-10-36-39-11:~$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Where is JDK 1.7? Did I do something wrong?

Thanks,
Bhushan
04-15-2014 10:57 AM
Hello, I am using Cloudera Manager to install CDH 4 on an Amazon EC2 cluster. I am using the latest version of CM; this is the command I am using to download the installer:

$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin

During cluster installation using CM, it installs JDK 1.6. My question is: is there a way to install JDK 1.7 by default? Is there a version of CM which does that? Or is there a way to do this using the latest CM version? I have already written a script which can replace JDK 1.6 with JDK 1.7, but I am hoping to find a simpler solution. Thanks.
04-15-2014 10:39 AM
Hi dlo, Thanks for the reply. I have solved the problem by passing the hbase quorum property as a command line argument to the hadoop command. But this is still good info, as I am new and trying to learn about Cloudera Manager and HBase/ZooKeeper. Thanks, Bhushan
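(For anyone finding this later, this is roughly the shape of the command that worked for me; the jar, driver class, ZooKeeper hostnames and paths are placeholders, and passing -D properties like this relies on the driver going through ToolRunner/GenericOptionsParser.)

# Point the HBase client at the real ZooKeeper quorum instead of localhost.
hadoop jar myjob.jar com.example.MyDriver \
  -Dhbase.zookeeper.quorum=zk-host-1,zk-host-2,zk-host-3 \
  -Dhbase.zookeeper.property.clientPort=2181 \
  input_path output_path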
04-11-2014 09:50 AM
@Clint: Thank you so much for the reply. I am pretty new to the HBase/ZooKeeper world. Can you please explain "Try deploying client configs for HBase to the cluster and any node where you are running MR jobs from and this should help" in a bit more detail? Thanks again.
04-02-2014 04:31 PM
Hi, I have recently started working on HBase, Hadoop and ZooKeeper. I have set up a 20-node cluster on Amazon EC2 using Cloudera Manager, and installed Hadoop, HBase, MapReduce and ZooKeeper on the cluster using CM. Now I am trying to run a MapReduce job on it. If I start a ZooKeeper instance on EACH node before running the job (i.e. 20 instances), the job runs fine, but I get a warning from CM that ZK should not run on more than 5 nodes. If I run ZK on only 5 of the 20 nodes, the job hangs in the reduce phase forever, and I see the following error in the TaskTracker logs:

2014-04-02 22:31:36,467 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:124)
at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:986)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:997)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1099)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:288)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:166)
at com.ancestry.jermline.mapreduce.matchwords.WordMapper.flushDictionaries(WordMapper.java:109)
at com.ancestry.jermline.mapreduce.matchwords.WordMapper.cleanup(WordMapper.java:99)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-04-02 22:31:36,468 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2014-04-02 22:31:36,468 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:36,468 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2014-04-02 22:31:37,569 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip6-localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:37,569 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2014-04-02 22:31:37,670 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-02 22:31:37,671 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)

I tried to google the error and modified the /etc/hosts file on all the nodes, something like this:

ubuntu@ip-10-254-140-2:~$ cat /etc/hosts
#127.0.0.1 localhost
10.254.140.2 ip-10-254-140-2.us-west-2.compute.internal
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(I also tried with '127.0.0.1 localhost' uncommented.) But it is not working. All other Hadoop, HBase and ZooKeeper settings are unchanged (except that by default there was only 1 ZK instance, and now there are 5 instances of ZK). If you want me to share any of the config files, please let me know and I will update the description. Any help is much appreciated. Thanks.
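(Since the log above shows the task's HBase client dialing 127.0.0.1:2181, a first check on the nodes running the map tasks would be which quorum their HBase client configuration actually names; the path below assumes the standard client-config location and may differ on your setup.)

# On a worker node: see which ZooKeeper quorum the HBase client config points at.
grep -A1 'hbase.zookeeper.quorum' /etc/hbase/conf/hbase-site.xml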
03-31-2014 11:06 AM
I am pretty new to Cloudera and the Big Data world, so please pardon me if this is a very naive question. I am using Cloudera Manager to install CDH 4 on a 20-node Amazon EC2 cluster. When the installation of "Cloudera Standard (Free license)" begins (i.e. "Cluster Installation"), I notice that it proceeds only 10 nodes at a time: there is installation progress for only 10 nodes at once, and when the first node is done the 11th starts, when the 2nd node is done the 12th starts, and so on. For 20 nodes this is still fine, but soon we are going to use 500 nodes for production. Is there a way to install Cloudera on the entire cluster at once? Please let me know if any more info is needed from me. Thanks.