
HDFS error: could only be replicated to 0 nodes, instead of 1

Explorer

Hi All,

 

I have installed CDH 5 (5.1.2) on a 2-node cluster on AWS VPC with Ubuntu as the base OS. After finishing my work, I shut the servers down.

 

When I started them again, Cloudera Manager showed errors while starting HDFS and HBase. The error shown for HBase was "HDFS Under replicated blocks". After some Googling I found that the issue is with blocks; "Missing Blocks / Corrupted Files" was the error shown there.

 

Summary of the "hadoop fsck /" output:

 

Total size: 311766450 B
Total dirs: 656
Total files: 215
Total symlinks: 0
Total blocks (validated): 213 (avg. block size 1463692 B)
********************************
CORRUPT FILES: 105
MISSING BLOCKS: 105
MISSING SIZE: 118118945 B
CORRUPT BLOCKS: 105
********************************
Minimally replicated blocks: 108 (50.704224 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 43 (20.187794 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.0140845
Corrupt blocks: 105
Missing replicas: 43 (8.431373 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Mon Sep 22 08:06:08 UTC 2014 in 155 milliseconds

----------------------------------------------------------------------------------


The filesystem under path '/' is CORRUPT

 

I have followed the instructions in 

http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs

https://www.packtpub.com/books/content/managing-hadoop-cluster (hadoop fsck -delete)

 

After executing the command (hadoop fsck -delete), HBase started. But when trying to start HDFS, it shows the error "HDFS error: could only be replicated to 0 nodes, instead of 1".

 

Please help me fix this.

 

My concerns: Is it possible to shut down the cluster after usage? If it is possible, which configurations do we need to take care of during installation?

 

 

1 ACCEPTED SOLUTION

Explorer

Hi Team,

 

I found a solution.

 

When we select an instance type with an instance store for configuring CDH, the log files are automatically stored on the instance store. When we stop the instance, the data/logs on the instance store are deleted, which results in the "Missing Blocks" error.

 

To avoid this, we either need to remove the instance store while launching the instance, or manually change the data/log location to an EBS volume after completing the installation. I think it's better to remove the instance store while launching the instance.
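If you want to double-check where HDFS is actually writing, here is a minimal sketch (the /dfs/dn path below is an assumption based on Cloudera Manager defaults; use whatever your hdfs-site.xml lists):

# Find the configured DataNode data directories (the /dfs/dn path used below is an assumption)
grep -A2 "dfs.data" /etc/hadoop/conf/hdfs-site.xml
# Confirm the directory sits on an EBS-backed mount, not ephemeral storage
df -h /dfs/dn
# Entries named "ephemeral*" in the block-device mapping are instance-store volumes
curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/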

Thanks to all of you.

 

Cheers!!!!


5 REPLIES

"HDFS Under replicated blocks" implies that some blocks are not duplicated
enough to satisfy the default replication factor of 3. If possible consider
setting up clusters with at least 3 nodes.
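If staying on two nodes, one option is to lower the replication target on
existing files to match; a minimal sketch, assuming the usual CDH client
config path:

# Inspect the configured default replication factor (config path is an assumption)
grep -A1 "dfs.replication" /etc/hadoop/conf/hdfs-site.xml
# Lower replication on existing files to match a 2-node cluster; -w waits for completion
hadoop fs -setrep -R -w 2 /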

"Missing Blocks" implies the datanodes which had block before shutdown now
don't have it when they booted up. This could happen with the Instance
Store. What kind of storage did you use on the nodes? This is explained
here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html

When you run "hadoop fsck -delete" you are telling the namenode to delete
files whose blocks cannot be located. This is fine for temporary files.
Before running it, however, you should run "hdfs fsck
-list-corruptfileblocks" and identify the reason why the blocks are
missing. If the blocks are recoverable, you won't have to delete the files
themselves.
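A sketch of that diagnose-before-delete sequence, using only the stock fsck
options:

# List the files whose blocks are missing or corrupt
hdfs fsck / -list-corruptfileblocks
# Show per-file block IDs and which datanodes still hold replicas
hdfs fsck / -files -blocks -locations
# Only once the blocks are confirmed unrecoverable, drop the affected files
hadoop fsck / -delete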

"could only be replicated to 0 nodes, instead of 1" could mean the
datanodes are not healthy. Check the datanode logs under
/var/log/hadoop-hdfs on both nodes to see what the problem might be.
If it's not clear, paste the relevant parts to pastebin and give us the URL.
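A quick way to check datanode health from the shell (the log file name
pattern below is an assumption; it varies by host and packaging):

# Ask the namenode how many datanodes it considers live vs. dead
hdfs dfsadmin -report
# Scan the tail of each datanode log for startup errors, on both nodes
tail -n 100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log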



Regards,
Gautam Gopalakrishnan

Explorer

Hey Gautam,

 

Thanks for the quick response. Please find my responses below.

 

"Missing Blocks" implies the datanodes which had block before shutdown now don't have it when they booted up. This could happen with the Instance Store. What kind of storage did you use on the nodes? This is explained here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html

 

     I have configured it using EBS instead of the instance store. We need to shut down the instances after usage (since the application is not yet exposed live). My working scenario is:

  1. Configure a cluster with 2 nodes.
  2. Shut the nodes down after completing my jobs on them.
  3. Start them again whenever I want to make changes in the cluster.

In my case, the issue is that once I start the cluster (instances) after 2-3 days of being shut down, it shows the missing-block errors. Does shutting down and starting the servers according to our usage cause any issue in CDH 5.x.x?

 

When you run "hadoop fsck -delete" you are telling the namenode to delete files whose blocks cannot be located. This is fine for temporary files. Before running it however you should run "hdfs fsck -list-corruptfileblocks", identify the reason why the blocks are missing. If the blocks are recoverable, you won't have to delete the files themselves.

 

     OK, but HBase won't come up without resolving this missing-block issue. Is there any other method to fix the missing blocks?

 

"could only be replicated to 0 nodes, instead of 1" could mean the datanodes are not healthy. Check the datanode logs under /var/log/hadoop-hdfs on both nodes to see what the problem might be. If it's not clear, paste the relevant parts to pastebin and ​​give us the URL​

 

      This happens after running the "hadoop fsck -delete" command.

 

Explorer

Hi Gautam,

 

Thanks for your quick response, please find my answers below.

 

"HDFS Under replicated blocks" implies that some blocks are not duplicated enough to satisfy the default replication factor of 3. If possible consider setting up clusters with at least 3 nodes.

 

        As of now, our requirement does not need 3 nodes.

 

"Missing Blocks" implies the datanodes which had block before shutdown now don't have it when they booted up. This could happen with the Instance Store. What kind of storage did you use on the nodes? This is explained here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html

 

      We have configured the entire environment on EBS volumes. Our working scenario is:

  1. Cluster with 2 nodes.
  2. After making changes we need to shut down the instances (since the application is in the development stage).
  3. When we need to do development work, we start the cluster and perform the changes.

      The missing-block error appears when we start the cluster after it has been shut down for 2-3 days.

 

When you run "hadoop fsck -delete" you are telling the namenode to delete files whose blocks cannot be located. This is fine for temporary files. Before running it however you should run "hdfs fsck -list-corruptfileblocks", identify the reason why the blocks are missing. If the blocks are recoverable, you won't have to delete the files themselves.

 

       HBase won't start without executing the "hadoop fsck -delete" command, and the "hdfs fsck -list-corruptfileblocks" output shows around 105 missing blocks. The paths to the missing blocks show the date stamp of the time of shutdown. Does that mean we are not allowed to shut down and start the cluster according to our requirements?

 

"could only be replicated to 0 nodes, instead of 1" could mean the datanodes are not healthy. Check the datanode logs under /var/log/hadoop-hdfs on both nodes to see what the problem might be. If it's not clear, paste the relevant parts to pastebin and ​​give us the URL​

 

       This error happens after running the "hadoop fsck -delete" command. After this command executes, HBase starts up and HDFS shows the error "could only be replicated to 0 nodes, instead of 1".

 

Our ultimate goal is:

  1. Create a cluster with 2 nodes.
  2. Shut down the cluster after completing our tasks.
  3. Start the cluster whenever we need to make changes or for demo purposes.

Please let us know whether the above scenario is possible in CDH 5.x.x.

 

Thanks in advance

Akash. 

Explorer

Any suggestions to fix this issue? 🙂


