Member since
05-03-2016
24
Posts
33
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2501 | 05-20-2016 09:11 AM |
 | 9890 | 05-18-2016 04:00 PM |
 | 4756 | 05-03-2016 04:43 PM |
01-24-2023
05:34 AM
@bvishal, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. It will also give you the opportunity to provide details specific to your environment, which will help others give you a more accurate answer. You can link this thread as a reference in your new post.
05-26-2016
04:44 AM
You can get a sandbox from http://hortonworks.com/downloads/#sandbox. The sandbox itself needs at least 8 GB of RAM, so make sure you run it on a machine with 12-16 GB. If you don't have a machine with that much RAM, Azure/AWS is your option. If you have any further questions, please open a new thread for each one, so this doesn't turn into one long thread of questions and answers.
05-06-2016
02:51 PM
There are a number of things that can cause HDFS imbalance; this post explains some of those causes in more detail. The balancer should be run regularly on a production system (you can kick it off from the command line, so you can schedule it with cron, for example). The balancer can take a while to complete if there are a lot of blocks to move. Note that when HDFS moves a block, the old replica is marked for deletion but isn't deleted immediately; HDFS cleans up these unused blocks over time.
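To make the "schedule it with cron" suggestion concrete, here is a sketch. The `-threshold` flag is a standard `hdfs balancer` option; the schedule and log path are illustrative choices, not something from the post:

```shell
# Run the balancer once, moving blocks until every DataNode's utilization
# is within 10% of the cluster average (requires a Hadoop client on PATH):
hdfs balancer -threshold 10

# To run it regularly, a crontab entry like this kicks it off every
# Sunday at 02:00 and appends output to a log:
# 0 2 * * 0 hdfs balancer -threshold 10 >> /var/log/hdfs-balancer.log 2>&1
```

A lower threshold gives a more even cluster but moves more blocks, so the run takes longer; 10% is a common starting point.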
05-04-2016
01:25 PM
Thanks @Pardeep. This looks like it will help.
01-03-2019
01:25 PM
1 Kudo
Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them.

We had a system with two NameNodes and high availability enabled. For some reason, under the data folders of the DataNodes (e.g. /data0x/hadoop/hdfs/data/current) there were two block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855): one folder containing the IP of NameNode 1 and another containing the IP of NameNode 2. All the data was under the block pool of NameNode 1, but inside the VERSION files of the NameNodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of NameNode 2, so the NameNode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having two block pool folders, but we did.

To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the NameNode disks (on both NN machines) and on all the JournalNode disks (on all JN machines) to point to NameNode 1. We then restarted HDFS and verified that all blocks were reported and there were no more missing blocks.
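For anyone hitting the same symptom, a quick way to spot this kind of mismatch is to compare the blockpoolID in the NameNode's VERSION file against the BP-* directory names on the DataNode disks. A minimal sketch, using mock files under /tmp so it can be run anywhere (on a real cluster the VERSION file lives under the NameNode data dir, e.g. /data0x/hadoop/hdfs/namenode/current/VERSION; the second block pool ID below is invented for the demo):

```shell
#!/bin/sh
# Mock a NameNode VERSION file (blockpoolID taken from the post;
# namespaceID/clusterID are placeholder values for the demo).
mkdir -p /tmp/nn-demo/current /tmp/dn-demo/current
cat > /tmp/nn-demo/current/VERSION <<'EOF'
namespaceID=1234567890
blockpoolID=BP-1722964902-1.10.237.104-1541520732855
clusterID=CID-example
EOF

# Mock two block pool directories on a DataNode disk: one matching the
# NameNode, one stale (as in the situation described above).
mkdir -p /tmp/dn-demo/current/BP-1722964902-1.10.237.104-1541520732855
mkdir -p /tmp/dn-demo/current/BP-9999999999-1.10.237.105-1541520000000

# The block pool the NameNode expects:
expected=$(sed -n 's/^blockpoolID=//p' /tmp/nn-demo/current/VERSION)
echo "NameNode expects: $expected"

# Compare against the block pools actually present on the DataNode:
for bp in /tmp/dn-demo/current/BP-*; do
  name=$(basename "$bp")
  if [ "$name" = "$expected" ]; then
    echo "$name: MATCH"
  else
    echo "$name: MISMATCH (stale block pool?)"
  fi
done
```

Running the same comparison across all NameNode, JournalNode, and DataNode disks shows immediately which VERSION files disagree and need to be fixed before restarting HDFS.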