How to increase DataNode capacity?
- Labels: Apache Hadoop
Created ‎04-29-2016 06:11 AM
I am using Ambari and it shows that my DataNode capacity is only 991.83 MB, with 283 blocks. Even if this is the default, why is it as low as 991 MB? I hear that having too many blocks isn't a good idea. I don't really have space constraints on the machine I am on, and we are not planning to distribute DataNodes across multiple hosts.
My question is:
1. Is there a maximum limit to the size of a DataNode? If yes, what is it?
2. What is the easiest and most robust way to run multiple DataNodes on the same machine without breaking what is up and running in the existing cluster?
3. I understand that we need to add more directories for new DataNodes and specify the path in Ambari, but what next?
4. What is the optimum block size in Ambari? (Or is there a DataNode-to-block-size ratio for the optimal number?)
5. How do I configure the block size through Ambari?
6. How do I increase the size of an existing DataNode in Ambari?
Created ‎04-29-2016 06:35 AM
Hi @simran kaur, to answer your questions:
- There is no limit to the "size", or capacity, of the DN. It is only bound by the number of hard disk slots and the capacity of your individual disks. If you have 12 slots with 6 TB per disk, that's 72 TB per node.
- A DataNode is a process managing HDFS files on a machine. You run only one DN per machine.
- You specify your DN directories, typically the mount points of your disks, in dfs.datanode.data.dir. That's all; HDFS will take care of organizing data there.
- You configure the block size via the dfs.blocksize property in HDFS. The default is 134217728 bytes, or 128 MB.
- The default of 128 MB is considered an optimal size for general-purpose clusters. If you keep many large files it can be increased, for example to 256 MB.
- And finally, your DN capacity of only 991 MB indicates that something is wrong, or that you are running a Sandbox on a machine with little capacity. The capacity on my Sandbox is 45 GB.
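The two properties above live in hdfs-site.xml, which Ambari manages for you (so edit them through the Ambari UI rather than by hand). A sketch of how they might look, with example mount-point paths that will differ on your cluster:

```xml
<!-- hdfs-site.xml (managed by Ambari; paths below are illustrative) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- comma-separated list, ideally one entry per physical disk mount -->
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <!-- 134217728 bytes = 128 MB, the default -->
  <value>134217728</value>
</property>
```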
Created ‎04-29-2016 06:54 AM
Thank you for your response 🙂 That helped. No, I am not running a Sandbox; I have installed HDP on a CentOS machine. Could you please tell me the possible reasons for the DN capacity to be so low?
Created ‎04-29-2016 07:46 AM
Can you check your dfs.datanode.data.dir setting and confirm that the directories listed there correspond to your disk mount points? The setting applies to all DataNodes in the cluster, so all of them must have the same disk mount configuration.
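A quick way to do that check from the command line, assuming the Hadoop client is on your PATH (guarded so it degrades gracefully if it isn't):

```shell
# Show the data directories HDFS is actually configured to use
if command -v hdfs >/dev/null 2>&1; then
    hdfs getconf -confKey dfs.datanode.data.dir
else
    echo "hdfs CLI not on PATH; check dfs.datanode.data.dir in Ambari instead"
fi

# Compare against the actual disk mounts and their sizes
df -h
```

If the configured directories sit on the root filesystem instead of your big data disks, that would explain a tiny reported capacity.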
Created ‎10-03-2016 06:32 PM
Do the following to increase the DFS size.

Create additional directories or mount points in the HDFS data path. By default, an Ambari-deployed cluster uses /hadoop/hdfs/data as the data directory, so, with root privileges, create a directory:

1) mkdir /hadoop/hdfs/data1
2) chown -R hdfs:hadoop /hadoop/hdfs/data1
3) chmod -R 777 /hadoop/hdfs/data1

Now edit the HDFS configuration: in Ambari, click on HDFS, then Configs, and in the settings add the new directory, separated by a comma, under the dfs.datanode.data.dir property, e.g.:

/hadoop/hdfs/data,/hadoop/hdfs/data1

Save the changes and restart the affected components. That will increase the disk space. To increase it further, repeat the same, or resize the logical volume backing /hadoop/hdfs/data (e.g., with lvresize, if it sits on LVM).
