Explorer
Posts: 14
Registered: 03-14-2019

DataNode fails to start after adding disk space

Hi All,

After the disk space on one of the drives of a client node was increased, we are unable to start the DataNode on it. It is not even connecting to the NameNode. We increased the space on /data/sdc1.

We started the node and then ran a rebalance, but after a while we can see that it has stopped again.

We need your help. Below is the log:

Scheduling blk_1076562511_2824179 file /data/sdc1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562511 for deletion

Expert Contributor
Posts: 104
Registered: 02-23-2018

Re: DataNode fails to start after adding disk space

Hi @MantuDeka,

Can you show us the DataNode service log?

Regards,

Manu.

Explorer
Posts: 14
Registered: 03-14-2019

Re: DataNode fails to start after adding disk space

Below is the log. We increased the drive space on /data/sdc1 to 100 GB, and since then the node no longer starts.

Mar 26, 7:35:40.174 AM
INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService
Scheduling blk_1076562510_2824178 file /data/sdb1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562510 for deletion
Mar 26, 7:35:40.175 AM
INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService
Scheduling blk_1076562511_2824179 file /data/sdc1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562511 for deletion
Mar 26, 7:35:40.174 AM
INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService
Deleted BP-926926485-10.25.176.190-1423244145752 blk_1076562512_2824180 file /data/sdb1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562512
Mar 26, 7:35:40.175 AM
INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService
Deleted BP-926926485-10.25.176.190-1423244145752 blk_1076562510_2824178 file /data/sdb1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562510
Mar 26, 7:35:40.176 AM
INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService
Deleted BP-926926485-10.25.176.190-1423244145752 blk_1076562511_2824179 file /data/sdc1/dfs/dn/current/BP-926926485-10.25.176.190-1423244145752/current/finalized/subdir43/subdir10/blk_1076562511
Expert Contributor
Posts: 104
Registered: 02-23-2018

Re: DataNode fails to start after adding disk space

Hi @MantuDeka,

Try this command:

    hdfs dfsadmin -refreshNodes

Or try restarting the NameNode.
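After running -refreshNodes or restarting the NameNode, one way to check whether the DataNode has registered again is to look at the live and dead counts in "hdfs dfsadmin -report". This is only a sketch: count_datanodes is a hypothetical helper, and it assumes the Hadoop 2.x report layout with section headers such as "Live datanodes (3):" and "Dead datanodes (1):".

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize how many DataNodes the NameNode reports as
# live and dead. Reads "hdfs dfsadmin -report" output on stdin and assumes the
# Hadoop 2.x section headers "Live datanodes (N):" and "Dead datanodes (N):".
count_datanodes() {
  awk '
    /^Live datanodes \(/ { n = $3; gsub(/[^0-9]/, "", n); live = n }
    /^Dead datanodes \(/ { n = $3; gsub(/[^0-9]/, "", n); dead = n }
    END { printf "live=%d dead=%d\n", live, dead }
  '
}

# Usage on a cluster host, as the HDFS superuser:
#   hdfs dfsadmin -refreshNodes
#   hdfs dfsadmin -report | count_datanodes
# If the live count does not go back up, the DataNode has not re-registered
# with the NameNode.
```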

 

 

Regards,

Manu.

Explorer
Posts: 14
Registered: 03-14-2019

Re: DataNode fails to start after adding disk space

Hi Manu,

Thanks for your quick response. I restarted the NameNode and then started the DataNode, but it is still the same. I even ran a rebalance, with no luck. It looks like the DataNode has crashed, since we did not decommission it before upgrading the drive space.

Expert Contributor
Posts: 104
Registered: 02-23-2018

Re: DataNode fails to start after adding disk space

Hi @MantuDeka,

Are you using CM (Cloudera Manager) to start the services? If so, what ERROR is shown when you click Start on the service?

Can you review the NameNode log? Thanks.
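The log posted earlier contains only INFO entries, which look like routine block deletions, so the actual failure reason is more likely to be in the WARN or ERROR lines of the DataNode and NameNode logs. A minimal sketch for filtering them: log_problems is a hypothetical helper, it assumes the usual log4j line layout (date, time, then the level as a separate word), and the Cloudera Manager log path in the usage comment is an assumption to verify on your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only WARN, ERROR and FATAL lines from Hadoop
# log4j logs, e.g. "2019-03-26 07:35:40,174 ERROR org.apache...: message".
# Reads the files given as arguments, or stdin when there are none.
# Stack-trace continuation lines are not printed.
log_problems() {
  grep -E '^[0-9-]+ [0-9:,]+ (WARN|ERROR|FATAL) ' "$@"
}

# Usage on the DataNode host (assumed CM log location; check the role's log
# directory in Cloudera Manager for the real path):
#   log_problems /var/log/hadoop-hdfs/*DATANODE*.log.out | tail -n 50
# And on the NameNode host:
#   log_problems /var/log/hadoop-hdfs/*NAMENODE*.log.out | tail -n 50
```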

 

 

Regards,

Manu

Explorer
Posts: 14
Registered: 03-14-2019

Re: DataNode fails to start after adding disk space

It looks like the DataNode has crashed. Can I take a backup of the data/sdc1/dfs/dn directory, then clean it on that particular node and try to start it again? Or should I decommission and recommission it, or reconfigure it?

We have a replication factor of 2. Will deleting that directory on that particular node cause any data loss?

Expert Contributor
Posts: 104
Registered: 02-23-2018

Re: DataNode fails to start after adding disk space

Hi @MantuDeka,

If you remove your DataNode data directory, you will lose the data stored in it.

With a replication factor of 2, you can remove this node and configure a new one (with a different data directory path, for example). Be aware, though, that re-replication in this case will be very slow, so be patient.
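Before deleting the data directory or removing the node, it may be worth confirming that HDFS itself reports no missing blocks: with a replication factor of 2, every block that was on the failed DataNode should still have one live replica elsewhere, and hdfs fsck shows whether that actually holds. In this sketch fsck_is_healthy is a hypothetical helper, and it assumes the Hadoop 2.x fsck summary, which ends with a line such as "The filesystem under path '/' is HEALTHY" (or CORRUPT when blocks are missing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only when "hdfs fsck" output on stdin
# reports a HEALTHY filesystem, i.e. no missing or corrupt blocks. Blocks that
# are merely under-replicated (one replica left) still count as HEALTHY.
fsck_is_healthy() {
  grep -q 'is HEALTHY$'
}

# Usage on a cluster host, as the HDFS superuser, after the NameNode has
# marked the failed DataNode dead (so its replicas are no longer counted):
#   hdfs fsck / > /tmp/fsck.txt
#   grep -E 'Under-replicated blocks|Missing replicas|Corrupt blocks' /tmp/fsck.txt
#   fsck_is_healthy < /tmp/fsck.txt && echo "every block still has a live replica"
#   hdfs fsck / -list-corruptfileblocks   # lists files with missing/corrupt blocks
```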

 

 

Regards,

Manu.

 
