Contributor
Posts: 29
Registered: ‎02-11-2014

Duplicated blocks get orphaned

This is a follow-up post on a previous topic about disk space not being freed up when deleting files from HDFS: http://community.cloudera.com/t5/Cloudera-Manager-Installation/Deleting-files-does-not-clear-disk-sp...

After finding out more about how this behaviour arises, I felt it warranted a new thread in the HDFS section instead.

 

It seems that files added to HDFS before upgrading from CDH 5 beta2 to the current version have their blocks duplicated. Running fsck /[path-to-arbitrary-old-file] -files -blocks -locations shows the file as replicated by a factor of 3. When searching for one of the file's blocks in the local file system on one of the data nodes that hold it, there are two hits for each block replica: one under dfs/dn/current/BP-.../current/finalized/ and another under dfs/dn/.../previous/finalized.
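
As a rough illustration of the search described above, here is a minimal Python sketch that walks a data node directory and groups the hits for one block file by whether they sit under a current/finalized or a previous/finalized directory. The function name, the root path and the block file name are all placeholders I made up for the example; it only lists files and does not touch anything.

```python
import os

def find_block_copies(dn_root, block_name):
    """Group files named block_name under dn_root by the directory that
    directly contains 'finalized' ('current' or 'previous').
    dn_root and block_name are hypothetical inputs for illustration."""
    hits = {"current": [], "previous": []}
    for dirpath, _dirnames, filenames in os.walk(dn_root):
        if block_name not in filenames:
            continue
        # Work on the path relative to dn_root so that parent directories
        # outside the data node directory cannot confuse the check.
        parts = os.path.relpath(dirpath, dn_root).split(os.sep)
        if "finalized" not in parts:
            continue
        idx = parts.index("finalized")
        parent = parts[idx - 1] if idx > 0 else ""
        if parent in hits:
            hits[parent].append(os.path.join(dirpath, block_name))
    return hits
```

Calling find_block_copies on the data node's dfs/dn directory with one block file name should return one path in each group for a file that shows the behaviour above, and an empty "previous" group for a file added after the upgrade.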

 

Now, when deleting the file from HDFS, only the block copies under the current path are actually deleted. The other ones are left on the data nodes, seemingly orphaned from the name node metadata. The HDFS log file only mentions deleting the current one and contains no error messages.

 

Files that were added after the HDFS upgrade do not behave the same way, so I suspect this has something to do with the finalization of the metadata upgrade not working (or not having been performed) properly.

 

As my previous post mentions, we will soon run out of disk space on the cluster and would like to resolve this as soon as possible. Would it be safe to just manually remove all files under the previous paths from the local file systems of all data nodes? Or could the name node still be holding some reference to them that would become corrupt? Are there other ways to go about solving this?

 

\Knut

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Duplicated blocks get orphaned

This is the behavior of an upgrade that has not been finalized. Can you confirm from your NN UI that the upgraded image has been finalized?

The files are hardlinks and the DN deletes them once the finalize command is run.
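
To make the hard link point concrete: two hard-linked paths share one inode, so the data takes up disk space only once, and that space is released only when the last link is removed. The sketch below is a minimal Python check, not anything HDFS ships; the function name and the two paths passed in are placeholders for a block file under current/finalized and its counterpart under previous/finalized.

```python
import os

def hardlink_info(path_a, path_b):
    """Return (same_inode, link_count) for two paths.
    same_inode is True when both names point at the same file on disk;
    link_count is the number of names that inode currently has."""
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    same_inode = (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
    return same_inode, st_a.st_nlink
```

If this returns (True, 2) for a block, the previous copy costs no extra space while the current one exists. Once the current copy is deleted, the previous path becomes the only link and keeps the space allocated.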
Contributor
Posts: 29
Registered: ‎02-11-2014

Re: Duplicated blocks get orphaned

Thanks for the reply!

 

The finalize command was already run after the upgrade. I also tried rerunning it after identifying the cause of this issue. I think the problem here is that the NN for some reason did not clean up the finalized folder after itself. We have now deleted all of the orphaned blocks manually, and things have been working well since then.
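
For anyone wanting to size up the problem before removing anything, the following is a read-only Python sketch, written under the assumption that an orphaned block is a file under a previous directory whose link count has dropped to 1 (that is, its current counterpart is gone). The function name and the root path are placeholders, and the link-count criterion is my own assumption rather than something HDFS documents. It only reports; it deletes nothing.

```python
import os

def previous_only_usage(dn_root):
    """Report files under any 'previous' directory below dn_root that have
    a link count of 1, plus their total size in bytes.
    Assumption: link count 1 means no 'current' hard link remains."""
    total = 0
    paths = []
    for dirpath, _dirnames, filenames in os.walk(dn_root):
        parts = os.path.relpath(dirpath, dn_root).split(os.sep)
        if "previous" not in parts:
            continue
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if st.st_nlink == 1:
                total += st.st_size
                paths.append(full)
    return total, sorted(paths)
```

Comparing the reported total with the disk space that is unaccounted for gives a quick sanity check on whether the previous directories explain the gap.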

 

\Knut

New Contributor
Posts: 2
Registered: ‎08-03-2016

Re: Duplicated blocks get orphaned

We are also facing the same issue. Can you please tell us how to manually delete the files?
