Created on 11-05-2015 10:08 AM - edited 09-16-2022 02:48 AM
Hi All
I have just set up a 4-node cluster with 1 OS drive and 1 data drive mounted at /disk1; CDH is installed and working fine.
Recently I added four extra storage drives to each of the four nodes (1 name node, 3 data nodes) and updated /etc/fstab with the drive identifiers and mount points.
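For reference, the fstab entries I added look roughly like this (device name, mount point, and filesystem type below are just illustrative; the real values come from my disks):

    /dev/sdb1    /disk2    ext4    defaults,noatime    0 0

After adding the entries I ran mount -a and confirmed the new mount points show up in df -h.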
While copying data to HDFS I noticed that all the data went to /disk1 only on each data node; HDFS is not recognizing/using the recently added drives.
Currently CDH shows lots of health issues pointing to HDFS storage.
Please help me configure HDFS so that it can use all drives on the name node and data nodes.
thanks
Created 11-05-2015 01:33 PM
Confirm that you have updated your "DataNode Data Directory" and "NameNode Data Directories" settings to use the new mount points.
In the Cloudera Manager web UI, go to HDFS > Configuration and search for "DataNode Data Directory" and "NameNode Data Directories".
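For example, assuming the new disks are mounted at /disk2 through /disk5 on each data node (your mount points may differ), the DataNode Data Directory list would end up looking something like the value below; in Cloudera Manager these are added as separate entries, and the underlying dfs.datanode.data.dir property is a comma-separated list:

    /disk1/dfs/dn,/disk2/dfs/dn,/disk3/dfs/dn,/disk4/dfs/dn,/disk5/dfs/dn

The DataNodes need to be restarted (with the client configuration redeployed) before the new directories are used.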
Created 11-05-2015 09:48 PM
Yes, I updated the DataNode data directories; I needed to stop the cluster, redeploy the configuration, and start it again. I checked all the recently added disks; they have /dfs/ folders and other Hadoop-related folders.
So it seems like HDFS is good now.
I also needed to add these disk locations to Impala and YARN, and followed the same steps to restart the cluster.
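For what it's worth, this is how I checked that the new volumes were actually picked up (the mount point list below is just my /disk1-/disk5 layout):

    df -h /disk1 /disk2 /disk3 /disk4 /disk5     # mount points and usage on each node
    sudo -u hdfs hdfs dfsadmin -report           # "Configured Capacity" should now include the new disks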
Prior to adding disks 2-6 on all nodes, disk1 was 100% utilized by the HDFS, YARN, and Impala directories. Will the data be rebalanced and distributed evenly across all nodes/disks?
I can see that disk1 utilization dropped from 100% to 98% after the cluster restart, but disk1 on all data nodes is still at 98% while the other drives are at ~1%.
I worry that when I copy more data to HDFS, /disk1 on the Impala daemon nodes will hit 100% again and cause health issues leading to service failures.
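From what I have read, the HDFS balancer only evens out usage between data nodes, not between the disks inside a single data node, so an already-full /disk1 may stay full until old blocks are deleted or new writes land on the other volumes. If it comes to that, the balancer can be run from the Balancer role in Cloudera Manager, or roughly like this from the command line:

    sudo -u hdfs hdfs balancer -threshold 10     # move blocks between data nodes until each is within 10% of the cluster average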
Created 11-06-2015 08:18 AM
That's exactly what happened. Now that all the disks are in HDFS, we started to move some data to HDFS, and what I am seeing now is that disk1 on all nodes has reached 100%.
CDH is again showing the same health issues.
Currently we are still in the middle of transferring data to HDFS. Should I wait until the transfer is over and let CDH rebalance the drives?
Should I disable these disk usage checks from the Configuration tab?
Would that affect the performance of data processing across the nodes?
Should we reinstall everything / start fresh with more storage drives to begin with?
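One thing I came across (not sure whether it is the recommended fix for my CDH version) is the DataNode volume choosing policy: setting it to the available-space policy is supposed to make new block writes prefer the disks with more free space instead of going round-robin across all volumes. If I have the property name right, it would go into the HDFS safety valve for hdfs-site.xml as:

    dfs.datanode.fsdataset.volume.choosing.policy = org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy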
thanks