Member since: 12-23-2019 · 7 Posts · 0 Kudos Received · 0 Solutions
09-22-2021
06:05 AM
Note: why did the JNs appear as healthy and running in CM if they had these problems? Now I'm positive that JN 2 was already broken, and as soon as the server that contains JN 3 died, the whole cluster was doomed. If CM had told me a while ago that JN 2 was bad, I would have worked on it and fixed it, so that if JN 3 died nothing would happen to the cluster, right?
09-22-2021
06:00 AM
As soon as I saw the logs you asked for, I realized where the problem was on JN 2 and 3. Yesterday I did the exact same procedure between JN 2 and 3 without looking at those logs. If I had, I would have copied the files from JN 1 to JN 2 and 3 and the problem would have been long gone. That's why you need to sleep: after so many hours working straight on this, I skipped that simple but important step. Thanks for your help.
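For anyone who lands here with the same problem, this is a minimal sketch of the copy step I'm describing (the host names are hypothetical; the journal path is the one from the logs below, and it assumes the JournalNode role on the broken host is already stopped in CM and that SSH between the hosts is available):

```
# Sketch only: replace a broken journal with a copy from the healthy JournalNode.
JN_DIR=/datos3/dfs/jn/nameservice1    # journal directory seen in the logs
HEALTHY=jn1.example.com               # hypothetical host names
BROKEN=jn2.example.com                # repeat for the third JournalNode

# Back up the broken journal before touching it
ssh "$BROKEN" "mv $JN_DIR ${JN_DIR}.bak.$(date +%F)"

# Push the edits from the healthy JournalNode to the broken one
ssh "$HEALTHY" "rsync -a $JN_DIR/ $BROKEN:$JN_DIR/"

# Restore the ownership the JournalNode expects (typically hdfs:hadoop)
ssh "$BROKEN" "chown -R hdfs:hadoop $JN_DIR"
```

After that, start the JournalNode role again and let it catch up.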
09-22-2021
05:11 AM
There are three JNs. The first one doesn't look that bad; the other two look really bad.

JN 1:
2021-09-22 09:00:17,953 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2021-09-22 09:00:17,963 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8485
2021-09-22 09:00:18,001 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-09-22 09:00:18,002 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8485: starting
2021-09-22 09:00:20,121 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: Initializing journal in directory /datos3/dfs/jn/nameservice1
2021-09-22 09:00:20,158 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /datos3/dfs/jn/nameservice1/in_use.lock acquired by nodename 26126@ada-4.novalocal
2021-09-22 09:00:20,179 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Scanning storage FileJournalManager(root=/datos3/dfs/jn/nameservice1)
2021-09-22 09:00:20,331 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Latest log is EditLogFile(file=/datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000013015981,first=0000000000013015981,last=0000000000013061158,inProgress=true,hasCorruptHeader=false)
2021-09-22 09:00:20,338 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not construct Shared Edits Uri
2021-09-22 09:00:20,338 WARN org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Other JournalNode addresses not available. Journal Syncing cannot be done

JN 2:
2021-09-22 09:00:51,400 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186 while determining its valid length. Position was 1048576
java.io.IOException: Can't scan a pre-transactional edit log.
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:5264)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:243)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1248)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:327)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:566)
at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:210)
at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:162)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:145)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:216)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:228)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:27411)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-22 09:00:51,401 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1048576

JN 3:
2021-09-22 09:02:47,518 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186 while determining its valid length. Position was 1048576
java.io.IOException: Can't scan a pre-transactional edit log.
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:5264)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:243)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1248)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:327)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:566)
at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:210)
at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:162)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:145)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:216)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:228)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:27411)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-22 09:02:47,518 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1048576

On the JNs, after that last line, nothing else was logged until I stopped the service.
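For the record, position 1048576 is exactly 1 MiB, so my reading (an assumption on my part) is that the in-progress segment on JN 2 and 3 is only preallocated filler with no valid header, which is why the scan fails immediately. A quick way to eyeball that, using the path from the warnings:

```
# Sketch only: inspect the suspicious in-progress segment on JN 2 / JN 3.
F=/datos3/dfs/jn/nameservice1/current/edits_inprogress_0000000000010993186

ls -l "$F"                   # compare the size against the 1048576 position in the warning
hexdump -C "$F" | head -n 5  # a valid segment has a real header; all-zero/0xff filler means junk
```

Comparing that with the current segment on JN 1 makes the difference obvious.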
09-21-2021
09:47 AM
Two nights ago, a server that provides iSCSI/LVM volumes crashed. That server held three volumes for one Cloudera node that runs the HDFS NameNode, DataNode and JournalNode roles, besides other Cloudera/Hadoop roles, and those volumes contain all the data for all the roles on that node. I figured out how to fix the volume problem and Kudu came back up (the data is in Kudu). But HDFS didn't, so Impala and the other services are not running either. I've tried a lot of possible solutions with no luck at all. Now CM says all three JournalNodes are up, but none of the NameNodes will come up because they can't connect to the journal. This is the error I see:

2021-09-21 11:28:04,945 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.241.109.57:8485, 10.241.109.81:8485, 10.241.109.76:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:197)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:616)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1603)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1210)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1898)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1756)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1700)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2021-09-21 11:28:04,949 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.241.109.57:8485, 10.241.109.81:8485, 10.241.109.76:8485], stream=null))
2021-09-21 11:28:04,952 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ada-3.novalocal/10.241.109.51
************************************************************/

Any ideas on how to solve it?
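While the NameNodes are timing out waiting for a quorum, it's worth confirming that at least two of the three JournalNodes actually answer on the QJM port. A rough sketch, using the addresses from the error above (assumes nc is available on the NameNode host):

```
# Sketch only: the NameNode needs 2 of these 3 JournalNodes to respond on port 8485.
for JN in 10.241.109.57 10.241.109.81 10.241.109.76; do
  if nc -z -w 5 "$JN" 8485; then
    echo "$JN:8485 reachable"
  else
    echo "$JN:8485 NOT reachable"
  fi
done
```

If the ports answer but the timeout persists, the JournalNode logs themselves are the next thing to look at.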
Labels:
- Cloudera Data Platform (CDP)
- HDFS
12-23-2019
06:36 PM
Hi @Harsh J, I just deleted around 80% of my data with "DELETE FROM table_name WHERE register <= '2018-12-31'". My disks are pretty full (around 90%). After the deletion, nothing happened in terms of freed space. I restarted Cloudera (Kudu, Impala, HDFS, etc.) and still nothing. I added these two lines to the Kudu configuration (in both "Master Advanced Configuration Snippet (Safety Valve) for gflagfile" and "Tablet Server Advanced Configuration Snippet (Safety Valve) for gflagfile"):

```
unlock_experimental_flags=true
flush_threshold_secs=120
```

After restarting Kudu and waiting for the 120 seconds... still nothing.
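To see whether anything is actually being reclaimed, I'm watching the tablet server data directories over time. A rough sketch (the path is a placeholder; use whatever is configured in fs_data_dirs / fs_wal_dir):

```
# Sketch only: watch on-disk usage of a Kudu tablet server while compactions run.
watch -n 60 'du -sh /data/kudu/tserver'   # placeholder path; substitute your fs_data_dirs
df -h /data                               # overall filesystem usage on the node
```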
... View more