Created 06-08-2017 11:36 AM
Please help me solve this issue, @Jay SenSharma.
Created 06-08-2017 11:39 AM
Which files have you deleted?
What kind of errors do you see in the NameNode logs?
What is the NameNode's current heap size, and approximately how many files are in HDFS?
Created 06-08-2017 11:59 AM
1) I deleted only logs that are not related to Hadoop.
2)
/var/log/hadoop/hdfs/hadoop-hdfs-journalnode-chtcuxhd06.log
2017-06-08 14:51:48,351 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007705179 while determining its valid length. Position was 1007616
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-chtcuxhd06.log
2017-05-10 17:13:43,375 ERROR datanode.DataNode (DataXceiver.java:run(278)) - chtcuxhd06:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation src: unix:/var/lib/hadoop-hdfs/dn_socket dst: <local>
3)
[root@chtcuxhd07 log]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        32G   31G  1.4G  96% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G   84K   16G   1% /dev/shm
tmpfs            16G  747M   15G   5% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda1       485M  153M  332M  32% /boot
tmpfs           3.2G   16K  3.2G   1% /run/user/42
tmpfs           3.2G     0  3.2G   0% /run/user/1001
tmpfs           3.2G     0  3.2G   0% /run/user/0
/dev/sdb1       1.2T  818G  383G  69% /data
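The df output above shows / at 96% used. As a rough sketch for spotting nearly full filesystems at a glance, the helper below reads df-style output on standard input and prints every mount at or above a given usage percentage. The function name flag_full_mounts and the threshold of 90 are illustrative assumptions, not part of any Hadoop tooling, and the parsing assumes mount points without spaces.

```shell
# Hypothetical helper: print "<mount> <use%>" for each filesystem whose
# usage is at or above the given threshold (a whole-number percentage).
# Expects df-style input where column 5 is Use% and column 6 is the mount point.
flag_full_mounts() {
    threshold="$1"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header row
            use = $5
            sub(/%/, "", use)         # "96%" -> "96"
            if (use + 0 >= t) print $6, $5
        }'
}

# Example invocation (assumption: run on the affected host):
#   df -P | flag_full_mounts 90
```

On the listing above, only the root filesystem would be reported at a threshold of 90.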
Created 06-08-2017 12:11 PM
It looks like 'edits_inprogress_<CURRENT_TRANS_ID>' might be corrupted.
Can you try moving the "edits_inprogress" file to "/tmp" (or some other backup directory) and then restarting the JournalNode and NameNode services?
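A minimal sketch of the move step, assuming the JournalNode has been stopped first. The function name backup_inprogress_edits and the backup directory are hypothetical; the journal path in the usage comment is the one from the log lines earlier in this thread. It moves the files rather than deleting them, so they can be put back if needed.

```shell
# Hypothetical helper: move every edits_inprogress_* file out of a
# JournalNode "current" directory into a backup directory.
# Returns non-zero if no such file was found or a move failed.
backup_inprogress_edits() {
    journal_dir="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    found=0
    for f in "$journal_dir"/edits_inprogress_*; do
        [ -e "$f" ] || continue       # glob matched nothing
        mv -- "$f" "$backup_dir"/ || return 1
        found=1
    done
    [ "$found" -eq 1 ]
}

# Example invocation (assumption: run as a user that can write to the journal directory):
#   backup_inprogress_edits /hadoop/hdfs/journal/Custer/current /tmp/jn_edits_backup
```

Finalized segments (files named edits_<start>-<end>) are left untouched; only the in-progress segment is moved.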
Created 06-08-2017 01:54 PM
The ZKFailoverController is still failing, and the same error appears, after I moved the "edits_inprogress" file to "/tmp" (or some other backup directory) and restarted the JournalNode and NameNode services.
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1024000
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007906637 while determining its valid length. Position was 1024000
java.io.IOException: Can't scan a pre-transactional edit log.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:189)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1024000
2017-06-08 19:06:36,319 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/Custer/current/edits_inprogress_0000000000007906637 while determining its valid length. Position was 1024000
java.io.IOException: Can't scan a pre-transactional edit log.
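Note that the segment named in this trace (edits_inprogress_0000000000007906637) is not the one from the earlier log excerpt (edits_inprogress_0000000000007705179). To see which in-progress segments a log complains about, a small sketch like the following can help. The function name list_inprogress_segments is hypothetical, and it relies on grep -o, which GNU and BSD grep provide.

```shell
# Hypothetical helper: read log text on stdin and print each distinct
# edits_inprogress_* path that appears in it, one per line.
list_inprogress_segments() {
    grep -o '/[^ ]*edits_inprogress_[0-9]*' | sort -u
}

# Example invocation, using the JournalNode log path mentioned earlier in the thread:
#   list_inprogress_segments < /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-chtcuxhd06.log
```

This only lists the file names that show up in the log; it does not check whether the files themselves are valid.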
Created 06-08-2017 02:20 PM
Your error indicates edit log corruption, so the previously mentioned steps should fix it as long as at least one of the JournalNodes still has the correct image. Can you please double-check the steps described here: https://community.hortonworks.com/questions/62335/journal-node-edit-log-issue.html
If this is happening on a single JournalNode then you can try the following:
Created 06-09-2017 05:11 AM
sudo -u hdfs hdfs haadmin -getServiceState nn1
echo Y | sudo -u hdfs hdfs haadmin -transitionToStandby --forcemanual nn2
echo Y | sudo -u hdfs hdfs haadmin -transitionToActive --forcemanual nn1
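After forcing the transitions it is worth confirming what state each NameNode actually ended up in. A minimal sketch, reusing the -getServiceState call shown above; the function name print_ha_states is hypothetical, and nn1 and nn2 are the service IDs from this thread (yours come from the dfs.ha.namenodes.* setting in hdfs-site.xml).

```shell
# Hypothetical helper: print the HA state of each NameNode service ID
# given as an argument, or "unknown" if the query fails.
print_ha_states() {
    for nn in "$@"; do
        state=$(sudo -u hdfs hdfs haadmin -getServiceState "$nn" 2>/dev/null) || state="unknown"
        printf '%s: %s\n' "$nn" "$state"
    done
}

# Example invocation:
#   print_ha_states nn1 nn2
```

If both NameNodes report standby, or one reports unknown, the failover did not complete as intended.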