Support Questions


Increase size of HDFS.

Explorer

I am running Hortonworks Sandbox HDP 2.6.5 on VirtualBox. I have increased the size of my virtual hard disk (.vdi) to 500 GB. However, when I log in to Ambari and view the size of my disk, it shows only 106 GB. What should I do to increase the HDFS capacity from 106 GB to 500 GB?

1 ACCEPTED SOLUTION

Explorer

Since the solution is scattered across many posts, I'm posting a short summary of what I did.

I am running HDP 2.6.5 image on VirtualBox.

  1. Increased my virtual hard disk through Virtual Media Manager
  2. In the guest OS (see the sketch after this list),
    1. Partitioned the unused space
    2. Formatted the new partition as an ext4 file system
    3. Mounted the file system
    4. Updated /etc/fstab so the mount persists across reboots (I couldn't do this, as I did not find that file)
  3. In Ambari, under the DataNode directories config, added the newly mounted file system as a comma-separated value
  4. Restarted HDFS (my cluster did not have any files, so I did not run the balancer command below)
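
For completeness, here is roughly what steps 2.1 to 2.4 looked like as shell commands. This is only a sketch: the device names (/dev/sda, /dev/sda4) and the mount point /mnt/hdfs_extra are examples, so substitute the names from your own system.

# Sketch only: device names and mount point are examples, not the sandbox defaults
fdisk /dev/sda                       # create a new partition in the unallocated space
partprobe /dev/sda                   # re-read the partition table without a reboot
mkfs.ext4 /dev/sda4                  # format the new partition as ext4
mkdir -p /mnt/hdfs_extra             # create a mount point
mount /dev/sda4 /mnt/hdfs_extra      # mount the file system
echo '/dev/sda4 /mnt/hdfs_extra ext4 defaults 0 0' >> /etc/fstab   # make the mount persistent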

Thanks to @Shelton for his guidance.

 

sudo -u hdfs hdfs balancer

11 REPLIES

Master Mentor

@focal_fossa 

AFAIK these sandboxes use dynamically allocated storage. You can test that by generating and loading TPC-DS data.

General usage is 

 

tpcds-setup.sh scale_factor [directory]

 

For example, the command below will generate 200 GB of TPC-DS data in /user/data (in HDFS):

 

./tpcds-setup.sh 200  /user/data

 

 

This should prove that the disk allocation is dynamic; see the links below.

 

To build, see https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh
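
Roughly, the end-to-end steps look like this (only a sketch; it assumes the hive-testbench repository on the hive14 branch and that gcc and Maven are already installed on the sandbox):

git clone https://github.com/hortonworks/hive-testbench.git
cd hive-testbench
git checkout hive14                # the branch referenced above; skip if it is already the default
./tpcds-build.sh                   # compile the TPC-DS data generator (needs gcc and Maven)
./tpcds-setup.sh 200 /user/data    # generate 200 GB of TPC-DS data under /user/data in HDFS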

 

Hope that helps

 

 

Explorer

I'll look into it. I'll have to install gcc and then later Maven to run those shell scripts. Thanks for your input.

Explorer

I don't think it is dynamically allocated, or at least it doesn't seem to be working. I've run out of space trying to load a ~70 GB file. How can I increase the capacity?

[Screenshots attached: Capture.PNG, Capture1.PNG]

Master Mentor

@focal_fossa 

Can you share the method you used to extend your VM disk? What is the VM disk file extension, vmdk or vdi? Note that VirtualBox does not allow resizing of vmdk images.

 

Does your disk show dynamically allocated storage, as shown below?

 

[Screenshot attached: 002.JPG]
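
The same information is available from the host command line. A sketch, assuming the disk file is named sandbox.vdi (use your actual file name or UUID):

VBoxManage list hdds                            # registered disks with their format and variant
VBoxManage showmediuminfo disk sandbox.vdi      # capacity, size on disk and "Format variant" for one disk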

Please revert 

Explorer

It is a VDI. I have used Virtual Media Manager to increase the size of my disk. How can I get HDFS to expand and make use of the unallocated space?

 

I'm assuming this is how one would do it (a quick check on the disk layout is sketched after these two options):

1. Create a new partition in the Guest OS and assign a mount point to it.

2. Add that path to the DataNode directories

(or)

Extend the current partition to fill the unused disk space so that the DataNode automatically increases the HDFS size?
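
Either way, the first thing to confirm is probably that the guest OS actually sees the enlarged disk. A minimal sketch, assuming the VDI appears in the guest as /dev/sda (adjust the device name to your system):

lsblk                     # block devices, their partitions and mount points
fdisk -l /dev/sda         # total disk size seen by the guest and the existing partitions
df -h                     # mounted file systems and their free space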

 

[Screenshots attached: wapo.PNG, wapo 1.PNG]

Explorer

@Shelton 

So, I've been able to:

  • create a new partition,
  • format it as an ext4 filesystem, and
  • mount it.

How do I add this new partition to my DataNode? Is it as simple as putting the drive path in the Ambari DataNode config?

Master Mentor

@focal_fossa 

To increase the HDFS capacity, add capacity by giving dfs.datanode.data.dir more mount points or directories. The new disk needs to be formatted and mounted before you add the mount point in Ambari.


In HDP using Ambari, you should add the new mount point to the list of directories in the dfs.datanode.data.dir property. Depending on the version of Ambari, the property appears under the DataNode settings or in the Advanced hdfs-site.xml section. The more disks you provide through the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
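
As an illustration (the mount point /mnt/hdfs_extra, the directory layout, and the hdfs:hadoop ownership below are assumptions; adapt them to your sandbox), preparing the new directory before adding it in Ambari might look like this:

hdfs getconf -confKey dfs.datanode.data.dir     # current DataNode directories
mkdir -p /mnt/hdfs_extra/hadoop/hdfs/data       # new directory on the freshly mounted disk
chown -R hdfs:hadoop /mnt/hdfs_extra/hadoop     # the DataNode user must be able to write here
# In Ambari the DataNode directories value then becomes a comma-separated list, e.g.
#   /hadoop/hdfs/data,/mnt/hdfs_extra/hadoop/hdfs/data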

You will need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from overutilized to underutilized nodes.
Running the balancer without parameters:

sudo -u hdfs hdfs balancer

Run without parameters, the balancer uses a default threshold of 10%, meaning that it will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%.

You can use a different threshold:

sudo -u hdfs hdfs balancer -threshold 5

This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage.
This process can take a long time, depending on the amount of data in your cluster.
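
After restarting HDFS with the new directory in place, the extra capacity can be confirmed with standard commands (shown here only as a quick check):

sudo -u hdfs hdfs dfsadmin -report    # configured capacity and remaining space per DataNode
hdfs dfs -df -h /                     # overall HDFS size, used and available space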

 

Hope that helps

 

Explorer

Thank you for your inputs. I have been able to expand the size of my HDFS finally. 

Community Manager

Hi @focal_fossa,  I'm happy to see you resolved your issue. Can you please mark the appropriate reply as the solution? It will make it easier for others to find the answer in the future.

 



Regards,

Vidya Sargur,
Community Manager

