Storage Upgrade

Explorer

Hi all,

 

I have just set up a 4-node cluster, each node with one OS drive and one data drive mounted at /disk1; CDH is installed and working fine.

Recently I added four extra storage drives to each of the four nodes (1 NameNode, 3 DataNodes). I have updated /etc/fstab with the drive identifiers and mount points.

While copying data to HDFS, I noticed that all the data went to /disk1 only on each DataNode; HDFS is not recognizing or using the recently added drives.

Currently CDH shows lots of health issues pointing to HDFS storage.

 

Please help me configure HDFS so that it can use all the drives on the NameNode and DataNodes.

Thanks

1 ACCEPTED SOLUTION

Master Collaborator

Confirm whether you have updated your "DataNode Data Directories" and "NameNode Data Directories" to use the new mount points.

In the Cloudera Manager Web UI, go to HDFS > Configuration and search for "DataNode Data Directories" and "NameNode Data Directories".
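
As a rough illustration of what the DataNode setting ends up holding: the underlying HDFS property (dfs.datanode.data.dir) takes a comma-separated list of directories, one per disk. The sketch below builds such a list. The mount points /disk2 to /disk5 and the dfs/dn subdirectory are assumptions for the example, not values taken from this cluster, and build_data_dirs is a hypothetical helper.

```python
from pathlib import PurePosixPath


def build_data_dirs(mount_points, subdir="dfs/dn"):
    """Join one data directory per mount point into a comma-separated list."""
    dirs = []
    for mount in mount_points:
        path = str(PurePosixPath(mount) / subdir)
        if path not in dirs:  # skip accidental duplicates, keep the order
            dirs.append(path)
    return ",".join(dirs)


if __name__ == "__main__":
    # Hypothetical mount points; substitute the ones listed in /etc/fstab.
    mounts = ["/disk1", "/disk2", "/disk3", "/disk4", "/disk5"]
    print(build_data_dirs(mounts))
```

The resulting string is only a starting point to compare against what Cloudera Manager shows; the actual directories should be entered and deployed through the Configuration page as described above.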

3 REPLIES

Explorer

Yes, I updated the DataNode directories. I needed to stop the cluster, redeploy the configuration, and start it again. I checked all the recently added disks; they now have /dfs/ folders and other Hadoop-related folders.

So it seems HDFS is good now.

I also needed to add these disk locations to Impala and YARN, and followed the same steps to restart the cluster.

 

Prior to adding disks 2-6 on all nodes, disk1 was 100% utilized by the HDFS, YARN, and Impala directories. Will they rebalance all the data and distribute it evenly across all nodes and disks?

I can see that disk1 utilization dropped from 100% to 98% after the cluster reboot, but disk1 on all DataNodes is still at 98% while the other drives are at about 1%.

I wonder whether, when I copy more data to HDFS, /disk1 on the Impala daemon hosts will hit 100% again and cause health issues leading to service failures.
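
To keep an eye on this while data is being copied, a small script along these lines could report per-disk usage on a node. It is only a sketch: the mount points /disk1 to /disk5 and the 90% threshold are assumptions, and the percentages may differ slightly from what df or Cloudera Manager report because of filesystem-reserved space.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total, rounded to one decimal."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return round(100.0 * used_bytes / total_bytes, 1)


def over_threshold(usage_by_mount, threshold=90.0):
    """Return the mounts whose usage percentage is at or above the threshold."""
    return sorted(m for m, pct in usage_by_mount.items() if pct >= threshold)


def scan(mount_points):
    """Measure each mount with shutil.disk_usage; skip paths that cannot be read."""
    usage = {}
    for mount in mount_points:
        try:
            du = shutil.disk_usage(mount)
            usage[mount] = percent_used(du.total, du.used)
        except (OSError, ValueError):
            continue  # missing path or zero-sized filesystem
    return usage


if __name__ == "__main__":
    # Hypothetical mount points; adjust to the ones in /etc/fstab.
    mounts = ["/disk1", "/disk2", "/disk3", "/disk4", "/disk5"]
    usage = scan(mounts)
    for mount, pct in sorted(usage.items()):
        print(f"{mount}: {pct}% used")
    print("At or above threshold:", over_threshold(usage))
```

Running it on each DataNode before and after a copy would show whether new data is landing on the added disks or still only on /disk1.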

 

Explorer

That's exactly what happened. Now that all the disks are in HDFS, we started moving some data to HDFS, and what I am seeing is that disk1 on all nodes has reached 100%.

CDH is again showing the same health issues.

We are currently still in the middle of transferring data to HDFS. Should I wait until the transfer is over and let CDH rebalance the drives?

Should I disable these disk usage checks from the Configuration tab?

Would it affect the performance of data processing across nodes?

Should we reinstall everything and make a fresh start, with more storage drives to begin with?

Thanks