Hadoop usage keeps growing on a very old version of Hadoop.

First, please don't laugh or tell me to upgrade: I inherited this cluster, and we are unable to upgrade it before it is decommissioned in the next couple of years.

Hadoop Version: 0.20.2, r911707

We just moved from one datacenter to another by creating a new Hadoop cluster (from old servers) and moving the data. The data is transient and only stays in Hadoop for 3 months. The old cluster hovered around 500TB for a long time. The new cluster has now grown to 940TB.

I have double-checked the data removal script and it is working. The data is no longer available and the logs "say" the data gets deleted, but we are steadily growing by about 2TB/day, which is about how much new data we get per day. Looking at the logs on a datanode, the OS file that the log says is deleted does in fact get deleted. The OS filesystems are almost full, whereas on the old cluster they were about 50% used. Hopefully that covers the background.

The only thing I can find is that when I run "hadoop dfsadmin -metasave <filename>", the output file contains a line stating there are 275M blocks waiting for deletion.

"Blocks 275521378 waiting deletion from 0 datanodes."

This number is steadily growing. If the command is run repeatedly, it sometimes shows that some datanodes have those blocks, but normally it says 0 datanodes.
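For anyone who wants to track this counter over time, here is a minimal sketch that pulls the two numbers out of the metasave output. It assumes the summary line has exactly the wording quoted above; the function name `pending_deletions` is hypothetical, and reading the metasave file from disk is left out. Note that the counter itself may not be reliable on this Hadoop version, so treat it as a rough trend indicator at best.

```python
import re

# Matches the summary line quoted above, e.g.
# "Blocks 275521378 waiting deletion from 0 datanodes."
PENDING_RE = re.compile(r"Blocks (\d+) waiting deletion from (\d+) datanodes")


def pending_deletions(metasave_text):
    """Return (blocks, datanodes) from metasave output, or None if the line is absent."""
    match = PENDING_RE.search(metasave_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Blocks 275521378 waiting deletion from 0 datanodes."
print(pending_deletions(sample))  # (275521378, 0)
```

Running this against successive metasave dumps and logging the result with a timestamp would show whether the count really grows at a steady rate.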

 

I've run out of options and would appreciate some help.

Oh, and deleting data does not use /trash. Normally, the script that cleans data uses -skipTrash, but during testing I noticed /trash was not being used.

1 ACCEPTED SOLUTION

We found out that the count of blocks waiting for deletion was inaccurate. Later versions had a bug fix for it.

In the end we found some of our processing had changed with the move and we were keeping more data.
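A rough way to sanity-check a retention change like this is to compare observed usage with what the ingest rate and retention window predict. The sketch below is only back-of-the-envelope arithmetic. The 2TB/day ingest rate and the 500TB and 940TB usage figures come from the question. The 90-day window (approximating 3 months), the replication factor of 3 (the HDFS default, not confirmed for this cluster), the assumption that 2TB/day is measured before replication, and both function names are assumptions for illustration.

```python
def steady_state_tb(ingest_tb_per_day, retention_days, replication=3):
    """Expected on-disk usage once deletions keep pace with ingest."""
    return ingest_tb_per_day * retention_days * replication


def implied_retention_days(used_tb, ingest_tb_per_day, replication=3):
    """How many days of data the observed usage corresponds to."""
    return used_tb / (ingest_tb_per_day * replication)


# Assumed: 90-day retention, replication factor 3, 2TB/day before replication.
print(steady_state_tb(2, 90))                  # 540
print(round(implied_retention_days(500, 2)))   # 83
print(round(implied_retention_days(940, 2)))   # 157
```

Under those assumptions the old cluster's 500TB is roughly consistent with a 3-month window, while 940TB would correspond to about 5 months of data, which points toward more data being retained rather than deletions failing.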

I don't see a way to mark this as complete, done, or whatever but consider it closed. 
