
How to increase the capacity of HDFS?

Solved

Re: How to increase the capacity of HDFS?

Mentor

Excellent, glad it worked

Re: How to increase the capacity of HDFS?

Expert Contributor

@Neeraj Sabharwal I deleted my previous comment as it didn't make any sense. What I currently don't understand is that the "DataNode directories" field shows /hadoop/hdfs/data, and I am not able to change it. If I edit the field to remove this folder name, the "Save" button gets disabled. It does not accept the /home folder as a valid folder. The /home mount has the most space, yet I cannot specify it in the "DataNode directories" field. Any ideas? Thanks.


Re: How to increase the capacity of HDFS?

Expert Contributor

I am posting this so that it will be helpful for users who want to understand how the DFS capacity can be increased. The details are in the steps below.

1) The "HDFS Disk Usage" box on the dashboard shows the current DFS usage; however, the total DFS capacity is not shown there.

2) To view the total capacity, use the NameNode web UI, e.g. http://172.26.180.6:50070/. This will show you the total DFS capacity.
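If you prefer the command line, the same figures can also be read from the dfsadmin report (assuming the HDFS client is configured on the node and the command is run as the hdfs user):

  sudo -u hdfs hdfs dfsadmin -report
  # Prints "Configured Capacity", "DFS Used" and "DFS Remaining" for the
  # cluster as a whole and for each individual DataNode.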

3) It is helpful to look at the file system information by executing "df -h", which tells you the size of each file system. In my case the root file system had very little space allocated to it (50 GB) compared to the file system mounted on /home (750 GB).
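For illustration only (the device names and the used/available numbers below are made up to match the sizes mentioned above; your output will differ):

  $ df -h
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda1        50G   45G  5.0G  90% /
  /dev/sda2       750G   20G  730G   3% /home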

4) The straightforward way to increase the DFS capacity is to add an additional folder to the "DataNode directories" field under HDFS -> Configs -> Settings, as a comma-separated value. This new folder should exist on a file system that has more disk capacity.
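For reference, the Ambari "DataNode directories" field corresponds (as far as I understand) to the dfs.datanode.data.dir property, so a value with one extra folder (using the /hdfsdata path from step 6 below as the example) would look like the comment here, and the effective setting can be checked from the command line after saving and restarting:

  # Comma-separated value in "DataNode directories" / dfs.datanode.data.dir:
  #   /hadoop/hdfs/data,/hdfsdata
  # Verify what the node actually picked up:
  hdfs getconf -confKey dfs.datanode.data.dir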

5) Ambari for some reason does not accept /home as the folder for storing file blocks. By default the field shows "/hadoop/hdfs/data", and you cannot delete it completely to replace it with a new folder path.

6) The best way is to create a new mount point that points to a folder under /home. So create a mount point, e.g. /hdfsdata, and point it to a folder under /home, e.g. /home/hdfsdata. Following are the steps to create the new mount point (a consolidated command sketch follows the list):

  1. Create a folder in the root, e.g. /hdfsdata, and a folder under /home, e.g. /home/hdfsdata.
  2. Give the 'hdfs' user ownership of the folder: chown -R hdfs:hadoop /home/hdfsdata
  3. Set the file/folder permissions on this folder: chmod -R 777 /home/hdfsdata
  4. Bind-mount the new folder: mount --bind /home/hdfsdata/ /hdfsdata/
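
Put together, a minimal sketch of these commands, run as root and assuming the same example paths (the fstab note is an extra suggestion, not part of the original steps):

  mkdir -p /hdfsdata /home/hdfsdata          # folders under / and under /home
  chown -R hdfs:hadoop /home/hdfsdata        # let the hdfs user own the data folder
  chmod -R 777 /home/hdfsdata                # permissions as in step 3 (777 is very permissive)
  mount --bind /home/hdfsdata/ /hdfsdata/    # expose the /home space at /hdfsdata
  # Optional: to keep the bind mount across reboots, an /etc/fstab entry such as
  #   /home/hdfsdata  /hdfsdata  none  bind  0 0
  # could be added.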

After the above steps, add the new mount point (/hdfsdata in this example) to the "DataNode directories" field, restart the HDFS service, and you have your capacity increased.

Re: How to increase the capacity of HDFS?

New Contributor

I have done all the steps you have given above, and I am now facing an issue while restarting the HDFS service. The log is attached below.

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 167, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 62, in start
    datanode(action="start")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py", line 72, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 267, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hadoop1ind1.india.out
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000bc800000, 864026624, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 864026624 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /var/log/hadoop/hdfs/hs_err_pid51884.log

Can you please take a look and tell me what exactly went wrong?

Re: How to increase the capacity of HDFS?

New Contributor

The previous comment was for the NameNode restart; this one shows the DataNode restart after allocating more memory to Java.

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 167, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 62, in start
    datanode(action="start")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py", line 72, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 267, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hadoop1ind1.india.out

@Pradeep kumar: Can you please have a look at both logs and help me out with extending my current HDFS storage?

Re: How to increase the capacity of HDFS?

New Contributor

I had the same alert. The capacity of the DataNode (DN) was somehow limited to a very small amount of space. After reading the threads here, I was about to create a partition and mount it on a new DN directory. However, since I had no issue using /hadoop/hdfs/data, which is under the / (root) directory, I tried to find another way around it and found that the "Reserved space for HDFS" value under the Advanced tab was huge, taking up almost the whole unused space of the root file system. After reducing "Reserved space for HDFS", every alert was resolved.
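For anyone looking for the underlying setting: to my understanding, Ambari's "Reserved space for HDFS" maps to the dfs.datanode.du.reserved property (bytes per volume reserved for non-DFS use), and the current value can be checked from the command line:

  # Reserved bytes per volume that the DataNode will not use for block storage
  hdfs getconf -confKey dfs.datanode.du.reserved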

Re: How to increase the capacity of HDFS?

Expert Contributor

That is a good point, David Hwang. Thanks for sharing :)
