Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1921 | 06-15-2020 05:23 AM |
| | 15482 | 01-30-2020 08:04 PM |
| | 2074 | 07-07-2019 09:06 PM |
| | 8121 | 01-27-2018 10:17 PM |
| | 4571 | 12-31-2017 10:12 PM |
12-04-2017 10:25 PM

```
[zk: localhost:2181(CONNECTED) 6] ls /hadoop-ha/hdfsha/ActiveStandbyElectorLock
Node does not exist: /hadoop-ha/hdfsha/ActiveStandbyElectorLock
[zk: localhost:2181(CONNECTED) 7] get /hadoop-ha/hdfsha/ActiveStandbyElectorLock
Node does not exist: /hadoop-ha/hdfsha/ActiveStandbyElectorLock
[zk: localhost:2181(CONNECTED) 8] ls /hadoop-ha/hdfsha
[]
[zk: localhost:2181(CONNECTED) 9]
```
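The missing `ActiveStandbyElectorLock` znode suggests the ZKFC election state in ZooKeeper was never initialized or was lost. Assuming an otherwise healthy ZooKeeper ensemble, one common recovery is to re-initialize that state and restart the failover controllers — a sketch only, to be run as the hdfs user on one NameNode host:

```shell
# Sketch: re-initialize the ZKFC election state in ZooKeeper so that one
# NameNode can win the election and become active.
# Run as the hdfs user, on ONE NameNode host only.
hdfs zkfc -formatZK   # recreates the /hadoop-ha/<nameservice> election znodes

# Afterwards, restart the ZKFC daemons on both NameNode hosts (e.g. via Ambari)
# so they re-register and hold a fresh election.
```

This does not touch edit logs or fsimage; it only recreates the ZooKeeper coordination data used for automatic failover.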
12-04-2017 10:23 PM

Regarding the hosts file: we don't use it. We have a DNS server, and all hosts resolve. We already checked, and all IPs point to the right hostnames.
12-04-2017 10:14 PM

My feeling is that no matter which NameNode we start, every NameNode comes up as standby, and that is the big problem.
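When both NameNodes stay in standby, the HA state can be inspected and, as a last resort, forced by hand. A sketch, assuming `nn1`/`nn2` are the NameNode IDs from `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml (placeholder names here):

```shell
# Sketch: inspect the HA state of each NameNode.
# nn1/nn2 are placeholder NameNode IDs; use the ones from your hdfs-site.xml.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# As a last resort, force one node active. --forcemanual is required when
# automatic failover is configured; use it only while the ZKFCs are stopped,
# otherwise ZKFC and the manual transition can fight each other.
hdfs haadmin -transitionToActive --forcemanual nn1
```

Note that in this case the underlying cause appears to be the JournalNode quorum timeout in the log below, so the quorum problem should be fixed first; otherwise the transition will fail the same way.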
12-04-2017 10:07 PM

I ran it on the first machine, master01 (this is the standby machine):

```
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[hdfsha]
```
12-04-2017 07:00 PM

We start the services in our Ambari cluster in the following order (after a reboot):

1. Start ZooKeeper
2. Start the JournalNodes
3. Start the NameNodes (on the master01 machine and on the master02 machine)

We noticed that both NameNodes are standby. How can we force one of the nodes to become active?

From the log (`tail -200 hadoop-hdfs-namenode-master03.sys65.com.log`):
```
rics to be sent will be discarded. This message will be skipped for the next 20 times.
2017-12-04 18:56:03,649 WARN namenode.FSEditLog (JournalSet.java:selectInputStreams(280)) - Unable to determine input streams from QJM to [152.87.28.153:8485, 152.87.28.152:8485, 152.87.27.162:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1590)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1614)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:251)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:402)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:355)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:372)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:476)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:368)
2017-12-04 18:56:03,650 INFO namenode.FSNamesystem (FSNamesystem.java:writeUnlock(1658)) - FSNamesystem write lock held for 20005 ms via
java.lang.Thread.getStackTrace(Thread.java:1556)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1658)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:285)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:402)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:355)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:372)
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:476)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:368)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 20005
2017-12-04 19:03:43,792 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(323)) - Triggering log roll on remote NameNode
2017-12-04 19:03:43,820 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(334)) - Skipping log roll. Remote node is not in Active state: Operation category JOURNAL is not supported in state standby
2017-12-04 19:03:49,824 INFO client.QuorumJournalManager (QuorumCall.java:waitFor(136)) - Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far:
2017-12-04 19:03:50,825 INFO client.QuorumJournalManager (QuorumCall.java:waitFor(136)) - Waited 7003 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far:
```
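The "Timed out waiting … for a quorum of nodes to respond" errors mean the NameNode cannot reach a majority of the JournalNodes, which is exactly why neither NameNode can finish becoming active. A quick reachability check might look like this — a sketch; the addresses are the ones from the log above, and the port is the default JournalNode RPC port:

```shell
#!/bin/sh
# Sketch: verify each JournalNode's RPC port (8485 by default) is reachable
# from the NameNode host. Addresses are taken from the log output above;
# adjust for your cluster.
for jn in 152.87.28.153 152.87.28.152 152.87.27.162; do
  if nc -z -w 5 "$jn" 8485; then
    echo "JournalNode $jn:8485 reachable"
  else
    echo "JournalNode $jn:8485 NOT reachable"
  fi
done
```

If a majority of the ports are unreachable, check that the JournalNode daemons are actually running on those hosts and that no firewall is blocking 8485 before touching anything on the NameNode side.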
Labels:
- Apache Ambari
- Apache Hadoop
12-04-2017 04:36 PM

In our Ambari cluster, when we stop several services we see that the first service's progress is "stopping" while the other services stay pending. How can we make all of the services stop in parallel?
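One way around stopping services one at a time in the UI is to issue a single bulk request through the Ambari REST API, which puts every service into the INSTALLED (stopped) state in one operation. A sketch — the host, cluster name, and `admin:admin` credentials below are placeholders:

```shell
# Sketch: stop ALL services in one Ambari request instead of one at a time.
# ambari-host, mycluster, and admin:admin are placeholders for your setup.
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo": {"context": "Stop All Services"},
       "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/mycluster/services
```

Ambari still honors service dependency ordering within the request, but independent services are stopped concurrently rather than queued behind each other.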
Labels:
- Apache Ambari
- Apache Hadoop
12-04-2017 02:25 PM

On the `namenode -format` we get:

```
17/12/04 14:23:34 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Timed out waiting for response from loggers
```
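The format fails for the same underlying reason: the NameNode cannot get responses from a quorum of JournalNodes ("loggers"). Before re-running the format, it may help to confirm on each JournalNode host that the daemon is actually up and listening — a sketch, assuming the default RPC port:

```shell
# Sketch: run on each JournalNode host to confirm the daemon is alive
# and listening on the JournalNode RPC port (8485 by default).
jps | grep -i JournalNode
ss -ltn | grep 8485     # or: netstat -ltn | grep 8485
```

Only once a majority of JournalNodes are up and reachable is it worth retrying the NameNode-side operations.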
12-04-2017 12:46 PM

We also see this in the logs:

```
2017-12-04 12:37:33,544 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [100.14.28.153:8485, 100.14.28.152:8485, 100.14.27.162:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
```
12-04-2017 12:22 PM

@Geoffrey, given this bad status, what else can we do? (Yesterday we restarted all the machines in the cluster and started the services, from ZooKeeper to HDFS and so on.) I am really stuck here.
12-04-2017 12:15 PM

@Geoffrey, I already restarted HDFS, but without good results; after the restart the status is the same as before. So I am wondering what the next steps are to resolve this problem.