Member since: 12-23-2019 · 7 Posts · 0 Kudos Received · 0 Solutions
09-22-2021
06:05 AM
Note: why did the JNs appear as healthy and running in CM if they had these problems? Now I'm positive that JN 2 was already broken, and as soon as the server that contains JN 3 died, the whole cluster was doomed. If CM had told me a while ago that JN 2 was bad, I would have worked on it and fixed it, so that if JN 3 died nothing would happen to the cluster, right?
09-22-2021
06:00 AM
As soon as I saw the logs you asked for, I realized where the problem was on JN 2 and 3. Yesterday I did the exact same procedure between JN 2 and 3 without looking at those logs. If I had, I would have copied the files from JN 1 to JN 2 and 3 and the problem would have been long gone. That's why you need to sleep: after so many hours working straight on this, I skipped that simple but important step. Thanks for your help.
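For anyone who lands here with the same problem, this is a minimal sketch of the copy step I'm describing (the host names are hypothetical; the journal path is the one from the logs below, and it assumes the JournalNode role on the broken host is already stopped in CM and that SSH between the hosts is available):

```
# Sketch only: replace a broken journal with a copy from the healthy JournalNode.
JN_DIR=/datos3/dfs/jn/nameservice1    # journal directory seen in the logs
HEALTHY=jn1.example.com               # hypothetical host names
BROKEN=jn2.example.com                # repeat for the third JournalNode

# Back up the broken journal before touching it
ssh "$BROKEN" "mv $JN_DIR ${JN_DIR}.bak.$(date +%F)"

# Push the edits from the healthy JournalNode to the broken one
ssh "$HEALTHY" "rsync -a $JN_DIR/ $BROKEN:$JN_DIR/"

# Restore the ownership the JournalNode expects (typically hdfs:hadoop)
ssh "$BROKEN" "chown -R hdfs:hadoop $JN_DIR"
```

After that, start the JournalNode role again and let it catch up.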
09-22-2021
05:11 AM
There are three JNs. The first one doesn't look that bad; the other two look really bad.

JN 1:
2021-09-22 09:00:17,953 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2021-09-22 09:00:17,963 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8485
2021-09-22 09:00:18,001 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-09-22 09:00:18,002 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8485: starting
2021-09-22 09:00:20,121 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: Initializing journal in directory /datos3/dfs/jn/nameservice1
2021-09-22 09:00:20,158 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /datos3/dfs/jn/nameservice1/in_use.lock acquired by nodename 26126@ada-4.novalocal
2021-09-22 09:00:20,179 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Scanning storage FileJournalManager(root=/datos3/dfs/jn/nameservice1)
2021-09-22 09:00:20,331 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Latest log is EditLogFile(file=/datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000013015981,first=0000000000013015981,last=0000000000013061158,inProgress=true,hasCorruptHeader=false)
2021-09-22 09:00:20,338 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not construct Shared Edits Uri
2021-09-22 09:00:20,338 WARN org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Other JournalNode addresses not available. Journal Syncing cannot be done

JN 2:
2021-09-22 09:00:51,400 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186 while determining its valid length. Position was 1048576
java.io.IOException: Can't scan a pre-transactional edit log.
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:5264)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:243)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1248)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:327)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:566)
at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:210)
at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:162)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:145)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:216)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:228)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:27411)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-22 09:00:51,401 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1048576

JN 3:
2021-09-22 09:02:47,518 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186 while determining its valid length. Position was 1048576
java.io.IOException: Can't scan a pre-transactional edit log.
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:5264)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:243)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1248)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:327)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:566)
at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:210)
at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:162)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:145)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:216)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:228)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:27411)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-22 09:02:47,518 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1048576

On the JNs, after that last line, nothing else was logged until I stopped the service.
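For the record, position 1048576 is exactly 1 MiB, so my reading (an assumption on my part) is that the in-progress segment on JN 2 and 3 is only preallocated filler with no valid header, which is why the scan fails immediately. A quick way to eyeball that, using the path from the warnings:

```
# Sketch only: inspect the suspicious in-progress segment on JN 2 / JN 3.
F=/datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186

ls -l "$F"                   # compare the size against the 1048576 position in the warning
hexdump -C "$F" | head -n 5  # a valid segment has a real header; all-zero/0xff filler means junk
```

Comparing that with the current segment on JN 1 makes the difference obvious.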
09-21-2021
09:47 AM
Two nights ago, a server that provides iSCSI/LVM volumes crashed. That server held three volumes for one Cloudera node that runs the HDFS NameNode, DataNode and JournalNode roles, besides other Cloudera/Hadoop roles, and those volumes contain all the data for all the roles on that node. I figured out how to fix the volume problem and Kudu came back up (the data is in Kudu). But HDFS didn't, so Impala and the other services are not running either. I've tried a lot of possible solutions with no luck at all. Now CM says all three JournalNodes are up, but none of the NameNodes will come up because they can't connect to the journal. This is the error I see:

2021-09-21 11:28:04,945 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.241.109.57:8485, 10.241.109.81:8485, 10.241.109.76:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:197)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:616)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1603)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1210)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1898)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1756)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1700)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-21 11:28:04,949 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.241.109.57:8485, 10.241.109.81:8485, 10.241.109.76:8485], stream=null))
2021-09-21 11:28:04,952 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ada-3.novalocal/10.241.109.51
************************************************************/

Any ideas on how to solve it?
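While the NameNodes are timing out waiting for a quorum, it's worth confirming that at least two of the three JournalNodes actually answer on the QJM port. A rough sketch, using the addresses from the error above (assumes nc is available on the NameNode host):

```
# Sketch only: the NameNode needs 2 of these 3 JournalNodes to respond on port 8485.
for JN in 10.241.109.57 10.241.109.81 10.241.109.76; do
  if nc -z -w 5 "$JN" 8485; then
    echo "$JN:8485 reachable"
  else
    echo "$JN:8485 NOT reachable"
  fi
done
```

If the ports answer but the timeout persists, the JournalNode logs themselves are the next thing to look at.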
Labels:
- Cloudera Data Platform (CDP)
- HDFS
12-23-2019
06:36 PM
Hi @Harsh J, I just deleted around 80% of my data with "DELETE FROM table_name WHERE register <= '2018-12-31'". My disks are pretty full (around 90%). After the deletion, nothing happened in terms of freed space. I restarted Cloudera (Kudu, Impala, HDFS, etc.) and still nothing. I added these two lines to the Kudu configuration (in both "Master Advanced Configuration Snippet (Safety Valve) for gflagfile" and "Tablet Server Advanced Configuration Snippet (Safety Valve) for gflagfile"):

```
unlock_experimental_flags=true
flush_threshold_secs=120
```

After restarting Kudu and waiting for the 120 seconds... still nothing.
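To see whether anything is actually being reclaimed, I'm watching the tablet server data directories over time. A rough sketch (the path is a placeholder; use whatever is configured in fs_data_dirs / fs_wal_dir):

```
# Sketch only: watch on-disk usage of a Kudu tablet server while compactions run.
watch -n 60 'du -sh /data/kudu/tserver'   # placeholder path; substitute your fs_data_dirs
df -h /data                               # overall filesystem usage on the node
```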
... View more