Member since 03-14-2016 | 67 posts | 29 kudos received | 3 solutions
My Accepted Solutions
Views | Posted
---|---
1812 | 09-21-2018 10:02 AM
3264 | 09-11-2018 10:44 AM
3576 | 07-06-2016 01:14 PM
09-06-2018
07:28 AM
You can tail the NameNode and DataNode logs and redirect the output to a separate log file while the service restarts:
# tailf <namenode log> > /tmp/namenode-`hostname`.log
# tailf <datanode log> > /tmp/datanode-`hostname`.log
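If keeping `tailf` running across the restart is awkward, one alternative is to note the log file's size just before the restart and copy everything written after that offset once the service is back up. This is a minimal sketch: the helper names are hypothetical, the log paths are placeholders, and the `/tmp/<role>-<hostname>.log` naming simply mirrors the commands above.

```python
import os
import socket


def log_offset(log_path):
    """Return the current size of the log file; call this just before the restart."""
    return os.path.getsize(log_path)


def copy_since(log_path, offset, dest_path):
    """Copy everything written to log_path after `offset` into dest_path.

    If the file is now smaller than `offset` (for example it was rotated or
    truncated during the restart), copy it from the beginning instead.
    Returns the number of bytes copied.
    """
    size = os.path.getsize(log_path)
    start = offset if offset <= size else 0
    with open(log_path, "rb") as src, open(dest_path, "wb") as dst:
        src.seek(start)
        data = src.read()
        dst.write(data)
    return len(data)


def dest_for(role):
    """Build a destination path such as /tmp/namenode-<hostname>.log."""
    return "/tmp/{}-{}.log".format(role, socket.gethostname())
```

Usage: call `offset = log_offset("<namenode log>")`, restart HDFS, then `copy_since("<namenode log>", offset, dest_for("namenode"))`; repeat with the DataNode log on a DataNode host.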
09-06-2018
07:12 AM
Thanks for the confirmation. I need the NameNode and DataNode logs from after the HDFS service restart.
09-06-2018
07:07 AM
@Muthukumar Somasundaram The NameNode stays in safe mode until the specified percentage of blocks (dfs.namenode.safemode.threshold-pct, 0.999f by default) satisfy minimal replication and have been reported to the NameNode. In your case, the NameNode is still waiting for block reports from the DataNodes. Please ensure that all DataNodes are up and running, and check whether each DataNode is sending its block report. Also check how many blocks have been reported to the NameNode so far, e.g. "The reported blocks 71 needs additional 17 blocks to reach the threshold 1.0000 of total blocks 87."
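The numbers in that safe-mode message can be reproduced with a little arithmetic. This sketch assumes the NameNode truncates `total_blocks * threshold_pct` to an integer block count and reports `(threshold - reported) + 1` as the additional blocks needed, which is how the Hadoop 2.x safe-mode tip appears to be computed; the function name is hypothetical. Note that the message shows a threshold of 1.0000, which suggests that cluster runs with dfs.namenode.safemode.threshold-pct=1.0 rather than the 0.999 default.

```python
def safemode_blocks_needed(reported, total, threshold_pct):
    """Additional blocks the NameNode still waits for before leaving safe mode.

    Assumed formula (not taken from the Hadoop docs): the threshold is
    int(total * threshold_pct) blocks, and the tip reports
    (threshold - reported) + 1 while the threshold has not been reached.
    """
    threshold_blocks = int(total * threshold_pct)
    if reported >= threshold_blocks:
        return 0  # threshold reached; no further blocks needed
    return threshold_blocks - reported + 1
```

Under this assumption, `safemode_blocks_needed(71, 87, 1.0)` gives the 17 from the message, while the 0.999 default would give 16 for the same cluster.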
09-06-2018
06:48 AM
@Michael Bronson Looks good to me. Just do one more check: which configuration did the NameNode load into memory? Open http://<active nn host>:50070/conf and search for "dfs.datanode.data.dir". You need to share the logs with us; there is no point in going on assumptions. 🙂
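The /conf check can also be scripted. A minimal sketch, assuming the /conf page returns the default Hadoop configuration XML (`<configuration>` containing `<property>` elements with `<name>` and `<value>`); `nn_http` is a placeholder such as "<active nn host>:50070" and both helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def property_from_conf_xml(xml_text, name):
    """Return the <value> of property `name` from Hadoop configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def nn_loaded_property(nn_http, name, timeout=10):
    """Fetch http://<nn_http>/conf and look up one property in the loaded config."""
    url = "http://{}/conf".format(nn_http)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return property_from_conf_xml(resp.read(), name)
```

Because hdfs-site.xml uses the same `<configuration>`/`<property>` layout, `property_from_conf_xml(open("/etc/hadoop/conf/hdfs-site.xml").read(), "dfs.datanode.data.dir")` gives the local value to compare against what the NameNode loaded.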
09-05-2018
07:10 PM
As I said earlier, it's hard to tell you the exact cause without reviewing the NameNode and DataNode logs for disk registration. The UI shows a configured capacity of 154 GB, which means only two disks from each DataNode were registered. If you have no concerns about it, please share your logs from after the service restart. I am still waiting for your answer to my previous question: did you validate hdfs-site.xml on the local machine, outside Ambari?
# grep dfs.datanode.data.dir -A1 /etc/hadoop/conf/hdfs-site.xml
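Each entry in dfs.datanode.data.dir becomes one volume that the DataNode tries to register, so it helps to see the entries one per line together with the filesystem each sits on. A minimal sketch with hypothetical helper names: it takes the raw property value (for example from the grep above), strips an optional storage-type prefix such as "[DISK]" and an optional file:// scheme, and flags directories that resolve to the root volume instead of a dedicated disk.

```python
import os
from urllib.parse import urlparse


def data_dirs(value):
    """Split a dfs.datanode.data.dir value into plain local paths."""
    paths = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("["):  # storage-type prefix, e.g. [DISK] or [SSD]
            entry = entry[entry.index("]") + 1:]
        if entry.startswith("file:"):  # file:///grid/0/hdfs -> /grid/0/hdfs
            entry = urlparse(entry).path
        paths.append(entry)
    return paths


def mount_point_of(path):
    """Walk up from `path` to the mount point of the filesystem holding it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path) and path != os.path.dirname(path):
        path = os.path.dirname(path)
    return path


def report(value):
    """Print each data dir with its mount point, flagging dirs on the root volume."""
    for p in data_dirs(value):
        if not os.path.exists(p):
            print("{}: does not exist".format(p))
            continue
        mp = mount_point_of(p)
        flag = "  <-- on root volume" if mp == "/" else ""
        print("{}: mounted at {}{}".format(p, mp, flag))
```

With three disks intended for HDFS, `report(...)` should list three directories on three different mount points.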
09-05-2018
04:26 PM
OK. Share your NameNode log and one of the DataNode logs from after the service restart. Did you validate hdfs-site.xml on the local machine, outside Ambari?
# grep dfs.datanode.data.dir -A1 /etc/hadoop/conf/hdfs-site.xml
09-05-2018
03:16 PM
Did you restart HDFS after adding the disks? It's hard to tell you the exact cause without analyzing the NameNode and DataNode logs. If possible, attach the NameNode log and one of the DataNode logs, taken after the service restart, so we can validate the disk registration in HDFS. Also share the contents of /etc/hadoop/conf.
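To see which volumes a DataNode actually registered after the restart, one option is to filter its log for volume-registration lines. This sketch assumes the Hadoop 2.x FsDatasetImpl wording "Added volume - <dir>, StorageType: <type>"; that wording is an assumption here, so check the exact text in your own DataNode log and adjust the pattern if it differs. The helper names are hypothetical.

```python
import re

# Assumed log wording (Hadoop 2.x FsDatasetImpl); adjust if your version logs differently.
ADDED_VOLUME = re.compile(r"Added volume - (?P<dir>[^,\s]+),\s*StorageType:\s*(?P<type>\w+)")


def registered_volumes(lines):
    """Return (directory, storage type) pairs for each 'Added volume' log line."""
    found = []
    for line in lines:
        m = ADDED_VOLUME.search(line)
        if m:
            found.append((m.group("dir"), m.group("type")))
    return found


def registered_volumes_in_file(log_path):
    """Scan a DataNode log file; undecodable bytes are replaced rather than raising."""
    with open(log_path, errors="replace") as f:
        return registered_volumes(f)
```

If only two entries come back on a node that should have three disks, the third directory was never registered, and the lines around the restart in the same log usually say why.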
09-05-2018
02:52 PM
@Michael Bronson As I said earlier, your configured capacity is 154 GB, not 320 GB. You can see this in the NameNode UI: http://<active namenode host>:50070/dfshealth.html#tab-overview
Check "dfs.datanode.data.dir", which lists the disks configured for HDFS. It looks like you configured only two disks: Ambari -> HDFS -> Configs -> Settings -> DataNode directories.
You must restart HDFS if you have not done so since commissioning the disks.
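To cross-check the configured capacity against the disks on one DataNode, you can add up the sizes of the filesystems behind its data directories. This is a rough, Unix-only sketch with hypothetical helper names: it assumes each volume's configured capacity is roughly its filesystem size minus a per-volume reserve (dfs.datanode.du.reserved), so the NameNode figure will usually be somewhat lower. Pass the directory paths yourself; a repeated device ID means two directories share one disk.

```python
import os


def volume_bytes(path):
    """Total size in bytes of the filesystem that holds `path`."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize


def capacity_report(paths, reserved_per_volume=0):
    """Return (rows, total_bytes) for the given DataNode directories.

    Each row is (path, device_id, usable_bytes), where usable_bytes is the
    filesystem size minus `reserved_per_volume`, floored at zero.
    """
    rows = []
    total = 0
    for p in paths:
        usable = max(volume_bytes(p) - reserved_per_volume, 0)
        rows.append((p, os.stat(p).st_dev, usable))
        total += usable
    return rows, total


def gib(n):
    """Convert bytes to GiB for display."""
    return n / 1024 ** 3
```

Multiplying the per-node total by the number of live DataNodes gives a ballpark to compare with the configured capacity in the NameNode UI; a gap of roughly one disk per node points to a directory that was never registered.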
09-05-2018
02:11 PM
Ambari clearly shows that the total configured capacity is 152 GB, but you need to double-check it in the NameNode UI. Please share screenshots of these NameNode UI pages:
http://<active namenode host>:50070/dfshealth.html#tab-overview
http://<active namenode host>:50070/dfshealth.html#tab-datanode
http://<active namenode host>:50070/dfshealth.html#tab-datanode-volume-failures
Also attach the active NameNode log and one of the DataNode logs from after the service restart. We have to find which disks are being registered during startup. Can you also share /etc/hadoop/conf/hdfs-site.xml?
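Besides screenshots, the figures behind the NameNode UI can be read as JSON from the NameNode's /jmx servlet, which makes exact numbers easy to paste into a thread. A sketch, assuming the `Hadoop:service=NameNode,name=FSNamesystemState` bean exposes `CapacityTotal` (bytes), `NumLiveDataNodes` and `NumDeadDataNodes`; attribute names can differ between Hadoop versions, so compare with the raw JSON first. `nn_http` is a placeholder and the helper names are hypothetical.

```python
import json
import urllib.request

BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"
FIELDS = ("CapacityTotal", "NumLiveDataNodes", "NumDeadDataNodes")


def summarize_jmx(jmx_json):
    """Pick capacity and DataNode counts out of a /jmx JSON response, or None."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == BEAN:
            return {f: bean.get(f) for f in FIELDS}
    return None


def fetch_summary(nn_http, timeout=10):
    """Query http://<nn_http>/jmx?qry=<BEAN>, e.g. nn_http='<active namenode host>:50070'."""
    url = "http://{}/jmx?qry={}".format(nn_http, BEAN)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_jmx(resp.read().decode("utf-8"))
```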
09-05-2018
12:53 PM
6 Kudos
Missing block: a block is marked missing if none of its replicas are reported to the NameNode.
Corrupt block: a block is marked corrupt if all of its replicas are corrupted, or if none of them are reported to the NameNode.
Go through this checklist before you conclude that a block is corrupt or missing:
- Check whether all DataNodes in the cluster are running.
- Check whether there are any dead DataNodes.
- Check for disk failures on multiple DataNodes.
- Check whether disks are out of space on multiple DataNodes.
- Check whether block reports are being rejected by the NameNode (this shows up as a warning or error in the NameNode log).
- Check whether any config groups were changed.
- Check whether the block physically exists in the local filesystem or was removed by users unknowingly, e.g. "find <dfs.datanode.data.dir> -type f -iname <blkid>*". Repeat the same step on all DataNodes.
- Check whether too many blocks are hosted on a single DataNode.
- Check whether block reports fail for exceeding the max RPC size (ipc.maximum.data.length, default 64 MB). You will see this warning in the NameNode log: "Protocol message was too large. May be malicious".
- Check whether a mount point was unmounted because of a filesystem failure.
- Check whether blocks were written to the root volume because a disk was automatically unmounted. That data can become hidden if you later remount the filesystem on top of the existing DataNode directory.
Note: You will lose data if you run "hdfs fsck / -delete". Please make sure you have gone through the whole checklist first.
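The "does the block physically exist" check can be scripted and run on each DataNode. A minimal sketch with a hypothetical helper name, assuming you pass the directories from dfs.datanode.data.dir and a block ID as printed by hdfs fsck (blk_1073741825 below is a made-up example). Unlike "-iname <blkid>*", it only matches the block file itself and names that continue with "_" (such as blk_<id>_<genstamp>.meta), so blk_10737418250 is not reported when searching for blk_1073741825.

```python
import os


def find_block_files(data_dirs, block_id):
    """Return sorted paths of the block file and its .meta file under data_dirs.

    Matching is case-insensitive, like find -iname. Directories that do not
    exist are skipped silently, because os.walk ignores errors by default.
    """
    wanted = block_id.lower()
    hits = []
    for top in data_dirs:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                lowered = name.lower()
                if lowered == wanted or lowered.startswith(wanted + "_"):
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

An empty result on every DataNode, with all disks mounted, supports treating the block as genuinely missing; a hit under a directory on the root volume points to the auto-unmount case from the checklist.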