
As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

New Contributor

Please help me solve this issue. @Jay SenSharma

6 REPLIES

Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

Super Mentor

@kotesh banoth

Which files have you deleted?

What kind of errors do you see in the NameNode logs?

What is the NameNode's current heap size, and approximately how many files are in HDFS?
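For reference, these details can typically be gathered with commands along these lines (the placeholder host name, the default 50070 NameNode HTTP port, and the NameNode log path are assumptions based on a standard HDP 2.x layout):

# Approximate number of directories, files and bytes in HDFS
# (output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME)
sudo -u hdfs hdfs dfs -count /

# Configured NameNode heap: the -Xmx setting on the running NameNode process
ps -ef | grep org.apache.hadoop.hdfs.server.namenode.NameNode | grep -oE -- '-Xmx[^ ]+'

# Live heap usage and total file count via the NameNode JMX endpoint
# (<active-nn-host> is a placeholder for the active NameNode host)
curl -s 'http://<active-nn-host>:50070/jmx?qry=Hadoop:service=NameNode,name=JvmMetrics' | grep -E 'MemHeapUsedM|MemHeapMaxM'
curl -s 'http://<active-nn-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep FilesTotal

# Recent ERROR/FATAL entries in the NameNode log
# (path assumed to match the other logs quoted in this thread)
grep -iE 'error|fatal' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 50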

Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

New Contributor

@Jay SenSharma

1) I have deleted the logs that are not related to Hadoop.

2) Screenshots attached: issue-1.png, issue-2.png. Relevant log excerpts:

/var/log/hadoop/hdfs/hadoop-hdfs-journalnode-chtcuxhd06.log

2017-06-08 14:51:48,351 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007705179 while determining its valid length. Position was 1007616

/var/log/hadoop/hdfs/hadoop-hdfs-datanode-chtcuxhd06.log

2017-05-10 17:13:43,375 ERROR datanode.DataNode (DataXceiver.java:run(278)) - chtcuxhd06:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation src: unix:/var/lib/hadoop-hdfs/dn_socket dst: <local>

3)

[root@chtcuxhd07 log]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        32G   31G  1.4G  96% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G   84K   16G   1% /dev/shm
tmpfs            16G  747M   15G   5% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda1       485M  153M  332M  32% /boot
tmpfs           3.2G   16K  3.2G   1% /run/user/42
tmpfs           3.2G     0  3.2G   0% /run/user/1001
tmpfs           3.2G     0  3.2G   0% /run/user/0
/dev/sdb1       1.2T  818G  383G  69% /data
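For context, since / is at 96%, something along these lines can show what is actually consuming the root filesystem before anything else gets deleted (the paths are just common candidates, not specific to this cluster):

# Largest top-level consumers on the root filesystem only (-x = stay on one filesystem)
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -n 15

# Drill into a usual suspect such as /var/log (adjust the path as needed)
du -xh --max-depth=2 /var/log 2>/dev/null | sort -rh | head -n 20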

Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

Super Mentor

@kotesh banoth

Looks like 'edits_inprogress_<CURRENT_TRANS_ID>' might be corrupted.

Can you try moving the "edits_inprogress" file to "/tmp" (or some other backup directory) and then restarting the JournalNode and NameNode services?
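A minimal sketch of that sequence, assuming the journal directory and in-progress segment shown in the log excerpt above (do this only on the affected JournalNode, and restart the services through Ambari if that is how the cluster is managed):

# Back up rather than delete: move the suspect in-progress edit segment aside
mkdir -p /tmp/jn_edits_backup
mv /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007705179 /tmp/jn_edits_backup/

# Then restart the JournalNode and NameNode services (via Ambari, or with
# hadoop-daemon.sh if the daemons are managed by hand).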


Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

New Contributor
@Jay SenSharma

ZKFailoverController is failing and showing the same error after I moved the "edits_inprogress" file to "/tmp" (a backup directory) and restarted the JournalNode and NameNode services.

2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1024000
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007906637 while determining its valid length. Position was 1024000
java.io.IOException: Can't scan a pre-transactional edit log.
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:189)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1024000
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007906637 while determining its valid length. Position was 1024000
java.io.IOException: Can't scan a pre-transactional edit log.

Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

Super Mentor

@kotesh banoth

Your error indicates edit log corruption, so the previously mentioned steps should fix it, provided that at least one of the JournalNodes has a correct copy of the edits. Please double-check the steps described here: https://community.hortonworks.com/questions/62335/journal-node-edit-log-issue.html

If this is happening on a single JournalNode, then you can try the following (a command-level sketch follows the list):

  1. As a precaution, stop HDFS. This will shut down all JournalNodes as well.
  2. On the node in question, move the JournalNode edits directory (/hadoop/hdfs/journal/Custer/current/) to an alternate location.
  3. Copy the same edits directory (/hadoop/hdfs/journal/Custer/current/) from a functioning JournalNode to this node.
  4. Start HDFS.
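A hedged sketch of steps 2 and 3, assuming the journal directory from the logs above, passwordless SSH between the nodes, and the usual hdfs:hadoop ownership (<healthy-jn-host> is a placeholder):

# Step 2: on the affected JournalNode, move the current edits directory aside
mv /hadoop/hdfs/journal/Custer/current /hadoop/hdfs/journal/Custer/current.bad

# Step 3: copy the same directory from a healthy JournalNode onto this node
scp -r <healthy-jn-host>:/hadoop/hdfs/journal/Custer/current /hadoop/hdfs/journal/Custer/

# Restore the expected ownership before starting HDFS again
chown -R hdfs:hadoop /hadoop/hdfs/journal/Custer/current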

Re: As memory is full on the cluster (unwanted files were deleted), the NameNode goes into safe mode and nodes go down some time after startup

New Contributor

By using the commands below, I was able to bring the cluster back to normal:

sudo -u hdfs hdfs haadmin -getServiceState nn1

echo Y | sudo -u hdfs hdfs haadmin -transitionToStandby --forcemanual nn2

echo Y | sudo -u hdfs hdfs haadmin -transitionToActive --forcemanual nn1
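For completeness, the resulting state can be verified with commands like these (nn1/nn2 are the NameNode service IDs used above; the safe-mode commands are only relevant if the NameNode still reports safe mode):

# Confirm which NameNode is active and which is standby
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2

# Check safe mode; it normally turns off by itself once enough blocks are reported
sudo -u hdfs hdfs dfsadmin -safemode get
# Force it off only if you are sure the block reports are complete:
# sudo -u hdfs hdfs dfsadmin -safemode leave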