Member since: 03-14-2016
Posts: 67
Kudos Received: 29
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1697 | 09-21-2018 10:02 AM
 | 3149 | 09-11-2018 10:44 AM
 | 3428 | 07-06-2016 01:14 PM
09-06-2018
07:28 AM
You can tail the namenode and datanode logs, and you can also redirect the output to a dummy log file during the restart:
# tailf <namenode log> > /tmp/namenode-`hostname`.log
# tailf <datanode log> > /tmp/datanode-`hostname`.log
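A minimal sketch of the same idea, assuming the usual HDP log location /var/log/hadoop/hdfs (the exact file names depend on your host, hence the wildcards; adjust the paths for your cluster):
# follow the HDFS daemon logs during the restart and keep a copy under /tmp
tail -F /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log > /tmp/namenode-$(hostname).log &
tail -F /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log > /tmp/datanode-$(hostname).log &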
09-06-2018
07:12 AM
Thanks for the confirmation. I need the namenode and datanode logs taken after the HDFS service restart.
09-06-2018
07:07 AM
@Muthukumar Somasundaram The namenode stays in safemode until the configured percentage (dfs.namenode.safemode.threshold-pct=0.999f) of blocks satisfying minimal replication has been reported to it. In your case, the namenode is still waiting for block reports from the datanodes. Please ensure that all datanodes are up and running, and check whether each datanode is sending its block report. In addition, check how many blocks have been reported to the namenode so far, i.e. "The reported blocks 71 needs additional 17 blocks to reach the threshold 1.0000 of total blocks 87."
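One quick way to check this from the command line (standard HDFS client commands; run them on a node with the hdfs client, e.g. as the hdfs user):
# is the namenode still in safemode?
hdfs dfsadmin -safemode get
# live/dead datanodes and missing or under-replicated block counts
hdfs dfsadmin -report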
09-06-2018
06:48 AM
@Michael Bronson Looks good to me. Just do one more check: what configuration is actually loaded into the NN's memory? Open http://<active nn host>:50070/conf and search for "dfs.datanode.data.dir". Please also share the logs with us. There's no point in going on assumptions. 🙂
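If you prefer the command line, the same /conf endpoint can be queried with curl (the host below is a placeholder for the active namenode):
# dump the configuration the namenode actually loaded and pull out the data dirs
curl -s "http://<active nn host>:50070/conf" | grep -A1 "dfs.datanode.data.dir"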
09-05-2018
07:10 PM
As I said earlier, it's hard to tell you the exact cause without reviewing the namenode and datanode logs for disk registration. The UI shows a configured capacity of 154 GB, which means only two disks from each datanode were registered. If you have no concerns about it, please share your logs, taken after the service restart. I'm still waiting for your reply to my previous question: did you validate hdfs-site.xml on the local machine, without Ambari?
# grep dfs.datanode.data.dir -A1 /etc/hadoop/conf/hdfs-site.xml
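Another way to validate the effective value on the node itself, independent of Ambari, is the standard hdfs getconf utility:
# print the datanode data directories as resolved from the local configuration
hdfs getconf -confKey dfs.datanode.data.dir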
09-05-2018
04:26 PM
Ok. Share your namenode log and one of the datanode logs, taken after the service restart. Did you validate hdfs-site.xml on the local machine, without Ambari?
# grep dfs.datanode.data.dir -A1 /etc/hadoop/conf/hdfs-site.xml
09-05-2018
03:16 PM
Did you restart HDFS after adding the disks? It's hard to tell you the exact cause without analyzing the namenode and datanode logs. If possible, attach the namenode log and one of the datanode logs, taken after the service restart, so we can validate the disk registration in HDFS. Also attach /etc/hadoop/conf.
09-05-2018
02:52 PM
@Michael Bronson As I said earlier, your configured capacity is 154 GB, not 320 GB. This can be seen in the NN UI: http://<active namenode host>:50070/dfshealth.html#tab-overview You must check "dfs.datanode.data.dir", i.e. the number of disks configured for HDFS; it looks like you configured only two disks. See Ambari -> HDFS -> Configs -> Settings -> DataNode directories. You must restart HDFS if you have not done so after commissioning the disks.
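To verify this outside the UI, the namenode's view of total and per-datanode capacity can also be pulled from the command line (standard HDFS client command):
# overall and per-datanode configured capacity as reported by the namenode
hdfs dfsadmin -report | grep -i "Configured Capacity"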
09-05-2018
02:11 PM
Ambari clearly shows that the total configured capacity is 152 GB, but you need to double-check it from the Namenode UI. Share namenode screenshots of:
http://<active namenode host>:50070/dfshealth.html#tab-overview
http://<active namenode host>:50070/dfshealth.html#tab-datanode
http://<active namenode host>:50070/dfshealth.html#tab-datanode-volume-failures
Also attach the active namenode log and one of the datanode logs, taken after the service restart. We have to find out which disks get registered during startup. Can you also share /etc/hadoop/conf/hdfs-site.xml?
09-05-2018
12:53 PM
6 Kudos
Missing Block: a block is marked missing if none of the block replicas of that file has been reported to the Namenode.
Corrupt Block: a block is marked corrupt if all of the block replicas of that file are corrupted, or none of them is reported to the Namenode.
Go through this checklist before you confirm that a block is corrupted/missing (a few verification commands are sketched below):
- Check that all datanodes in the cluster are running.
- Check whether you see dead datanodes.
- Check for disk failures on multiple datanodes.
- Check for disks out of space on multiple datanodes.
- Check whether the block report is rejected by the namenode (it can be seen in the namenode log as a warning/error).
- Check whether you changed any config groups.
- Check whether the block physically exists on the local filesystem or was removed by users unknowingly, e.g. "find <dfs.datanode.data.dir> -type f -iname <blkid>*". Repeat the same step on all datanodes.
- Check whether too many blocks are hosted on a single datanode.
- Check whether the block report fails with "exceeding max RPC size" (default 64 MB). You will see the warning "Protocol message was too large. May be malicious" in the namenode log.
- Check whether a mount point was unmounted because of a filesystem failure.
- Check whether the block was written into the root volume because of a disk auto-unmount. Data might be hidden if you remount the filesystem on top of an existing datanode dir.
Note: You will lose data if you run "hdfs fsck / -delete". Please ensure you have gone through the whole checklist first.
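A few standard commands that help with the checklist above (the file path, data directory, and block id below are placeholders you would replace with your own values):
# list the files that currently have corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks
# show block ids and datanode locations for a suspect file
hdfs fsck /path/to/suspect/file -files -blocks -locations
# on each datanode, check whether the block still exists on disk
find /data01/hadoop/hdfs/data -type f -iname "blk_1073741825*"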