Created on 05-15-2016 06:05 AM - edited 09-16-2022 03:19 AM
I have set up a 2-node cluster using Cloudera Manager 5.4.1 in VMware Workstation. It includes components such as HBase, Impala, Hive, Sqoop2, Oozie, ZooKeeper, NameNode, SecondaryNameNode and YARN.
I have simulated 3 disk drives per node: sda for the OS, and sdb and sdc for Hadoop storage.
I allocated sdb1 (16 GB) and sdc1 (16 GB) to Hadoop storage on each node, so I assumed that the total HDFS capacity across both nodes would be 64 GB. But when I checked with the dfsadmin command and the NameNode UI, the Configured Capacity was less than the disk size I had allocated for HDFS.
The output of the dfsadmin command and of df -h is shown below. Kindly help me understand why the Configured Capacity is less than my original disk size.
[hduser@node1 ~]$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/vg_node1-LogVol00   40G   15G   23G  39% /
tmpfs                          3.9G   76K  3.9G   1% /dev/shm
/dev/sda1                      388M   39M  329M  11% /boot
/dev/sdb1                       16G  283M   15G   2% /disks/disk1/hdfsstorage/dfs
/dev/sdc1                       16G  428M   15G   3% /disks/disk2/hdfsstorage/dfs
/dev/sdb2                      8.1G  147M  7.9G   2% /disks/disk1/nonhdfsstorage
/dev/sdc2                      8.1G  147M  7.9G   2% /disks/disk2/nonhdfsstorage
cm_processes                   3.9G  5.8M  3.9G   1% /var/run/cloudera-scm-agent/process
[hduser@node1 ~]$
[hduser@node1 zookeeper]$ sudo -u hdfs hdfs dfsadmin -report
[sudo] password for hduser:
Configured Capacity: 47518140008 (44.25 GB)
Present Capacity: 47518140008 (44.25 GB)
DFS Remaining: 46728742571 (43.52 GB)
DFS Used: 789397437 (752.83 MB)
DFS Used%: 1.66%
Under replicated blocks: 385
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):

Name: 192.168.52.111:50010 (node1.example.com)
Hostname: node1.example.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 23759070004 (22.13 GB)
DFS Used: 394702781 (376.42 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 23364367223 (21.76 GB)
DFS Used%: 1.66%
DFS Remaining%: 98.34%
Configured Cache Capacity: 121634816 (116 MB)
Cache Used: 0 (0 B)
Cache Remaining: 121634816 (116 MB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Sun May 15 18:15:33 IST 2016

Name: 192.168.52.112:50010 (node2.example.com)
Hostname: node2.example.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 23759070004 (22.13 GB)
DFS Used: 394694656 (376.41 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 23364375348 (21.76 GB)
DFS Used%: 1.66%
DFS Remaining%: 98.34%
Configured Cache Capacity: 523239424 (499 MB)
Cache Used: 0 (0 B)
Cache Remaining: 523239424 (499 MB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Sun May 15 18:15:32 IST 2016
Created 05-15-2016 10:10 PM
As per your df -h output, sdb2 and sdc2 are 8 GB each, not 16 GB:
/dev/sdb2  8.1G  147M  7.9G  2% /disks/disk1/nonhdfsstorage
/dev/sdc2  8.1G  147M  7.9G  2% /disks/disk2/nonhdfsstorage
Created 05-16-2016 02:45 AM
Hi Vina,
As you can see from the output, sdb2 and sdc2 are allocated for non-HDFS storage (e.g. intermediate data). sdb1 and sdc1 are the partitions mounted for HDFS storage, and they are 16 GB each, as the "df -h" output shows.
[hduser@node1 ~]$ df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb1    16G  283M   15G   2% /disks/disk1/hdfsstorage/dfs
/dev/sdc1    16G  428M   15G   3% /disks/disk2/hdfsstorage/dfs
Can you please help?
Created 05-16-2016 03:48 AM
Created on 05-23-2016 07:45 AM - edited 05-23-2016 07:47 AM
Yes, the link was helpful.
The property "dfs.datanode.du.reserved" was configured as 4.25 GB, and I now understand that this amount is reserved for each data directory on a node. Since I have two data directory partitions, the combined reserved space is 8.5 GB per node, which brings the configured capacity on each node to about 23.5 GB (32 GB - 8.5 GB).
I arrived at the following formula:
Configured Capacity = Total Disk Space allocated for Data Directories (dfs.data.dir) - (Number of Data Directories x Reserved Space for Non DFS Use per directory (dfs.datanode.du.reserved))
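The reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name configured_capacity is made up for this example, it assumes that "4.25 GB" means 4.25 GiB and that the reservation is subtracted once per data directory as described in this thread. The second part works backwards from the 23759070004 bytes reported by dfsadmin, which suggests each partition is somewhat smaller than the rounded 16G shown by df -h; that would account for the gap between 23.5 GB and 22.13 GB.

```python
GIB = 1024 ** 3  # bytes per GiB


def configured_capacity(volume_sizes_bytes, reserved_per_volume_bytes):
    """Estimate one DataNode's configured capacity in bytes.

    Hypothetical helper: subtracts the reserved space from every data
    directory (volume) and sums what is left, never going below zero.
    """
    return sum(
        max(size - reserved_per_volume_bytes, 0) for size in volume_sizes_bytes
    )


# Assumption: dfs.datanode.du.reserved of 4.25 GB is 4.25 GiB, in bytes.
reserved = int(4.25 * GIB)

# Nominal estimate: two data directories of 16 GiB each on one node.
nominal = configured_capacity([16 * GIB, 16 * GIB], reserved)
print(f"Nominal per-node capacity: {nominal / GIB:.2f} GiB")

# Working backwards from the per-node value in the dfsadmin report.
reported_node_capacity = 23759070004
implied_volume_size = reported_node_capacity / 2 + reserved
print(f"Implied size of each partition: {implied_volume_size / GIB:.2f} GiB")
```

Running df with -B1 instead of -h prints exact byte counts, which can be compared against the implied partition size.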