Created on 07-20-2020 01:05 PM - edited 07-20-2020 08:27 PM
I am running the Hortonworks Sandbox HDP 2.6.5 on VirtualBox. I have increased the size of my virtual hard disk (.vdi) to 500 GB. However, when I log in to Ambari and view the size of my disk, it shows only 106 GB. What should I do to increase the HDFS capacity from 106 GB to 500 GB?
Created on 07-23-2020 06:18 AM - edited 07-23-2020 06:19 AM
Since the solution is scattered across many posts, I'm posting a short summary of what I did.
I am running the HDP 2.6.5 image on VirtualBox.
Thanks to @Shelton for his guidance.
sudo -u hdfs hdfs balancer
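For context, the steps before running the balancer were roughly as follows; the device name, mount path, and filesystem type below are illustrative placeholders rather than exact values from my setup:
sudo fdisk /dev/sda                      # create a new partition (e.g. /dev/sda3) in the newly added free space
sudo mkfs.ext4 /dev/sda3                 # format it (placeholder device and filesystem)
sudo mkdir -p /mnt/hdfs-extra
sudo mount /dev/sda3 /mnt/hdfs-extra     # add an /etc/fstab entry so the mount survives reboots
Then add /mnt/hdfs-extra/data to dfs.datanode.data.dir in Ambari (HDFS > Configs), restart HDFS, and run the balancer above.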
Created 07-20-2020 02:39 PM
AFAIK these sandboxes use dynamically allocated storage. You can test that by generating and loading TPC-DS data.
General usage is
tpcds-setup.sh scale_factor [directory]
For example, the following will generate 200 GB of TPC-DS data in /user/data (HDFS):
./tpcds-setup.sh 200 /user/data
This should prove that the disk allocation is dynamic. To build the testbench, see https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh
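The usual flow with that testbench, assuming the repository layout those links point to, is to clone the repo, run the build script (which needs gcc and Maven installed), and then run the setup script:
git clone -b hive14 https://github.com/hortonworks/hive-testbench.git
cd hive-testbench
./tpcds-build.sh                 # compiles the TPC-DS data generator
./tpcds-setup.sh 200 /user/data  # generates 200 GB of data into /user/data on HDFS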
Hope that helps
Created 07-20-2020 04:04 PM
I'll look into it. I'll have to install gcc and then later Maven to run those shell scripts. Thanks for your input.
Created 07-20-2020 08:15 PM
I don't think it is dynamically allocated, or at least it doesn't seem to be working. I've run out of space trying to load a ~70 GB file. How can I increase the capacity?
Created 07-21-2020 01:56 AM
Can you share the method you used to extend your VM disk? What is the VM disk file extension, vmdk or vdi? Note that VirtualBox does not allow resizing of vmdk images.
Does your disk show "Dynamically allocated storage" in the VirtualBox Virtual Media Manager?
Please revert
Created on 07-21-2020 06:42 AM - edited 07-21-2020 06:51 AM
It is a VDI. I have used the Virtual Media Manager to increase the size of my disk. How can I get HDFS to expand and make use of the unallocated space?
I'm assuming this is how one would do it:
1. Create a new partition in the Guest OS and assign a mount point to it.
2. Add that path to the DataNode directories
(or)
Extend the current partition to fill the unused disk space so that the DataNode automatically increases the HDFS size?
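For the second option, extending the existing partition inside the guest would look roughly like this; the device name and filesystem type are assumptions, so adjust them to your sandbox's layout:
sudo growpart /dev/sda 1     # grow partition 1 into the free space (growpart comes from the cloud-utils-growpart package)
sudo resize2fs /dev/sda1     # grow an ext4 filesystem; for XFS, run xfs_growfs on the mount point instead
# if the sandbox root is on LVM, pvresize/lvextend would also be needed between these two steps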
Created 07-22-2020 09:06 AM
So, I've been able to create a new partition.
How do I add this new partition to my DataNode? Is it as simple as putting the drive path in the Ambari DataNode config?
Created 07-22-2020 09:37 AM
To increase the HDFS capacity, give dfs.datanode.data.dir more mount points or directories. The new disk needs to be formatted and mounted before the mount point is added in Ambari.
In HDP with Ambari, you should add the new mount point to the list of directories in the dfs.datanode.data.dir property. Depending on the Ambari version it may be under the Advanced section; the property lives in hdfs-site.xml. The more new disks you provide through the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
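For illustration, the property would end up looking something like this in hdfs-site.xml; the second path is a hypothetical new mount point and the existing path on your sandbox may differ:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/mnt/hdfs-extra/data</value>
</property>
After saving the change in Ambari, restart the affected HDFS components so the DataNode picks up the new directory.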
You will then need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from over-utilized to under-utilized nodes.
Running the balancer without parameters:
sudo -u hdfs hdfs balancer
This runs the balancer with the default threshold of 10%, meaning it will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%.
You can use a different threshold:
sudo -u hdfs hdfs balancer -threshold 5
This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage.
This process can take a long time, depending on the amount of data in your cluster.
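Once the balancer finishes, you can confirm the configured and remaining capacity with the standard DataNode report:
sudo -u hdfs hdfs dfsadmin -report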
Hope that helps
Created 07-22-2020 07:48 PM
Thank you for your inputs. I have finally been able to expand the size of my HDFS.
Created 07-23-2020 12:43 AM
Hi @focal_fossa, I'm happy to see you resolved your issue. Can you please mark the appropriate reply as the solution? It will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,