Support Questions


How to increase the capacity of HDFS?

Expert Contributor

I am posting this question after searching the internet for a good explanation. Currently the total physical hard disk space across the 4 nodes is 720 GB. The dashboard shows that only 119 GB is configured for DFS. I want to increase this space to at least 300 GB. I didn't find anything straightforward on the Ambari dashboard to do this. The only information I found on the internet is to modify the core-site.xml file to have a hadoop.tmp.dir property that points to another directory. I do not want to do it blindly, without understanding what it means to expand HDFS capacity and how to do it through the Ambari dashboard.

1 ACCEPTED SOLUTION

Master Mentor

You add capacity by giving dfs.datanode.data.dir more mount points or directories. In Ambari, that section of configs is, I believe, on the right side depending on the version of Ambari, or in the Advanced section; the property lives in hdfs-site.xml. The more new disks you provide through the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
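For example, the value is just a comma-separated list of local directories, one per disk. The mount points below are only hypothetical; use paths that actually exist on every DataNode:

dfs.datanode.data.dir = /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data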


17 REPLIES


Expert Contributor

@Artem Ervits Can you please elaborate what you mean by "right spending of version of ambari"? I checked the "Advanced hdfs-site" section, but I don't see any "dfs.datanode.data.dir".

Master Mentor

Sorry, auto-correct on my tablet. @Pradeep kumar I updated the answer with the correct spelling.

Expert Contributor

@Artem Ervits Thanks, but I still could not find this property under the "Advanced hdfs-site" section. I was reading the link provided by Neeraj Sabharwal in his answer below, which also talks about specifying /hadoop as the folder in the property 'dfs.datanode.data.dir'. But, like I said, I could not find this property.
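As a side note, the value HDFS is actually using can also be read from the command line on one of the cluster nodes (this is a generic HDFS command, independent of the Ambari UI):

hdfs getconf -confKey dfs.datanode.data.dir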

Expert Contributor

@Artem Ervits I found "Data Node Directories" under the "Data Node" section of the "Settings" tab. The "Data Node Directories" property has the folder name /hadoop/hdfs/data. However, when I do df -h, I do not see this folder in the mount information. Following is the output of df -h on the master server:

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_item70288-lv_root   50G   41G  6.2G  87% /
tmpfs                             3.8G     0  3.8G   0% /dev/shm
/dev/sda1                         477M   67M  385M  15% /boot
/dev/mapper/vg_item70288-lv_home  172G   21G  143G  13% /home

Master Mentor

That's the problem: you need to replace that path with a correct path that actually exists; otherwise data is written to the root filesystem and it runs out of space quickly.
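A quick way to confirm which filesystem currently backs the DataNode directory (assuming the directory exists on the node) is:

df -h /hadoop/hdfs/data

If it reports / as the mount point, the blocks are landing on the 50 GB root volume from your df -h output.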

Expert Contributor

@Artem Ervits: I am having several issues now. 1) Ambari doesn't allow me to remove the folder name "/hadoop/hdfs/data", so I cannot completely replace it with a new folder. 2) If I give /hadoop/hdfs/data,/home then it shows me the error Can't start with "home(s)". I am pretty sure something is wrong.

Master Mentor

Create a mount point /hadoop/ pointing to your large disk.
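As a sketch (assuming the large volume is the one mounted at /home in your df -h output, and that a bind mount is acceptable; adjust the paths to your environment):

# create a directory on the large volume and bind-mount it at /hadoop
mkdir -p /home/hadoop /hadoop
mount --bind /home/hadoop /hadoop
# persist the bind mount across reboots
echo '/home/hadoop  /hadoop  none  bind  0 0' >> /etc/fstab

Then restart the DataNodes from Ambari so they pick up /hadoop/hdfs/data on the new mount.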

Expert Contributor

@Artem Ervits: Okay. I have finally got what I wanted and I have increased the DFS capacity. Thanks for your help. I learned a lot through this exercise :). I am accepting your answer and also providing steps that I followed in another answer post, so that it will be helpful to other users.