Member since: 11-05-2015
Posts: 10
Kudos Received: 0
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4383 | 11-10-2015 09:35 AM |
04-18-2016
09:58 AM
Some more questions based on this thread. Once the storage configuration is defined and the SSDs/disks are identified by HDFS, are all drives (SSDs + disks) used as a single virtual storage pool? If yes, does that mean that while running jobs/queries some data blocks would be fetched from disks while others come from SSDs? Or are there two different virtual storage tiers, hot and cold? If so, while copying/generating data into HDFS, will there be 3 copies of the data spread across disks + SSDs, or 3 copies on disks and 3 copies on SSDs, for a total of 6 copies? And how do I force data to be used from SSDs only, or from disks only, when submitting jobs/queries with the various tools (Hive, Impala, Spark, etc.)?
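For the last question, pinning data to one storage type is normally done with HDFS storage policies. A minimal sketch, assuming stock Hadoop 2.6+ commands, data directories already tagged with [SSD]/[DISK] in dfs.datanode.data.dir, and a hypothetical path /data/hot:

```bash
# List the policies HDFS supports (HOT, COLD, WARM, ALL_SSD, ONE_SSD, ...).
hdfs storagepolicies -listPolicies

# Pin a directory to SSD only; use the HOT policy to keep data on DISK only.
hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ALL_SSD

# Confirm which policy is in effect for the path.
hdfs storagepolicies -getStoragePolicy -path /data/hot
```

Because the policy is attached to the HDFS path, anything Hive, Impala, or Spark writes under that path follows it.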
11-10-2015
09:35 AM
The following steps worked for me. In my case the node (#4) had disk failures and was not coming up as a DataNode; it was stopped. I tried starting it through Services > HDFS > node4 > Start this DataNode, but it failed. Looking at the logs I found warnings about more than one drive having data inconsistency; there were FATAL messages too, but those led me to a dead end while searching for a solution.

I went to all of those mount points (disks) and removed the dfs directory on each one with rm -rf. One of the drives was unable to mount because HDFS had already created those folders at that local location during previous attempts to start the DataNode; I removed those folders as well and was then able to mount it. Once mounted, the new disk was clean and free of any Hadoop folders. I also found a file in /tmp/hsperfdata_hdfs whose name was just a number (some kind of log); I renamed it to old_**** in case I needed it later, so HDFS would see that the file was gone. I then went back to Cloudera Manager > HDFS > node 4 and tried restarting the DataNode; it worked. All the disks have dfs directories now, the tmp location no longer has that old_*** file, and there is a new file there (with some other number).
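A rough shell sketch of the sequence described above, with a hypothetical mount point (/disk2), device name (/dev/sdb1), and perf-data file name (12345); the real paths come from your own /etc/fstab and logs:

```bash
# With the DataNode role stopped in Cloudera Manager, clear the inconsistent
# HDFS data directory on each affected mounted disk (and any leftover folders
# that block re-mounting a replacement drive).
sudo rm -rf /disk2/dfs

# Mount the replacement drive once the path is clean.
sudo mount /dev/sdb1 /disk2

# Set aside the stale JVM perf-data file for the hdfs user instead of deleting it.
sudo mv /tmp/hsperfdata_hdfs/12345 /tmp/hsperfdata_hdfs/old_12345

# Then restart the DataNode role from Cloudera Manager and watch its log.
```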
11-09-2015
08:19 PM
Hi, my four-node (1 NameNode, 3 DataNode) cluster just recovered from a hard drive failure; I had to re-format three of the DataNode drives (all on node 4) and re-mount them to their designated locations. Cloudera Manager shows the following health issues, and the Impala and HBase services are stopped. The hard drive failure was on node 4; node 1 is the master. HDFS shows under-replicated blocks (1 block is missing in total and around 99% are under-replicated), and I wonder how I can solve these issues. In my opinion HDFS should re-replicate the under-replicated blocks itself; if so, how long should that take, and is there a way to expedite the process (make it faster)? If not, how can I manually re-replicate such blocks? Thanks
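For reference, a minimal sketch with stock HDFS commands for checking on and nudging re-replication; the path /user/data is only an example:

```bash
# Summarize missing, corrupt, and under-replicated blocks.
sudo -u hdfs hdfs fsck / | grep -iE 'under.?replicated|missing|corrupt'

# The NameNode re-replicates on its own over time; re-applying the replication
# factor to a path (-w waits for completion) can nudge it along.
sudo -u hdfs hdfs dfs -setrep -w 3 /user/data

# Identify the file that owns the one missing block; if its data is truly gone,
# that file has to be restored from a source copy or removed.
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
```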
11-06-2015
08:18 AM
That's exactly what happened. Now that all disks are in HDFS, we started to move some data into HDFS, and what I am seeing now is that disk1 on all nodes has reached 100%. CDH is again showing the same health issues, and we are still in the middle of transferring data to HDFS. Should I wait till the transfer is over and let CDH re-balance the drives? Should I disable these disk usage checks from the configuration tab, and would that affect the performance of data processing across the nodes? Or should we re-install everything for a fresh start, with more storage drives to begin with? Thanks
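On the re-balancing question, a small sketch of the stock balancer; the threshold is just an example value, and note that the balancer evens usage out between DataNodes, not between the disks inside a single node, so the newly added disks mainly fill up through new writes:

```bash
# Run the HDFS balancer as the hdfs user; 10 means each DataNode's utilization
# should end up within 10 percentage points of the cluster average.
sudo -u hdfs hdfs balancer -threshold 10
```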
11-05-2015
09:48 PM
Yes, I updated the DataNode directories; I needed to stop the cluster, redeploy the configuration, and start it again. I checked all the recently added disks; they have /dfs/ folders and the other Hadoop-related folders, so it seems like HDFS is good now. I also needed to add these disk locations to Impala and YARN, and followed the same steps to restart the cluster. Prior to adding disks 2-6 on all nodes, disk1 was utilized 100% by the HDFS, YARN, and Impala directories. Will they rebalance all the data and distribute it evenly across all nodes/disks? I can see that disk1 utilization dropped from 100% to 98% after the cluster restart, but disk1 on all DataNodes is still at 98% while the other drives are at ~1%. I worry that when I copy more data to HDFS, the Impala daemons / disk1 will hit 100% again and cause health issues leading to service failures.
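A quick sketch for verifying what HDFS itself sees after the restart; these are stock commands and properties, not something taken from this thread:

```bash
# Per-DataNode configured capacity and DFS used, to confirm every disk was picked up.
sudo -u hdfs hdfs dfsadmin -report

# Existing blocks are not shuffled between the disks of one node automatically.
# New writes can be steered toward the emptier volumes by setting, in the HDFS
# configuration, dfs.datanode.fsdataset.volume.choosing.policy to
# org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
# (an optional tuning knob, not something done in this thread).
```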
11-05-2015
10:08 AM
Hi all, I have just set up a 4-node cluster with 1 OS drive and 1 data drive at /disk1; CDH is installed and working fine. Recently I added four extra storage drives to each of the four nodes (1 NameNode, 3 DataNodes). I have updated /etc/fstab with the drive identifiers and mount points. While copying data to HDFS I noticed that all the data went to /disk1 only on each DataNode; HDFS is not recognizing/using the recently added drives. Currently CDH shows lots of health issues pointing to HDFS storage. Please help me configure HDFS so that it can use all the drives on the NameNode and DataNodes. Thanks
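What usually closes this gap is that HDFS only writes to the directories listed in dfs.datanode.data.dir, so the new mount points have to be added there as well as to /etc/fstab. A minimal sketch, assuming the extra drives were mounted at /disk2 through /disk5 (hypothetical names; only /disk1 appears in the post):

```bash
# 1. Confirm the new mounts are actually active on each node.
df -h /disk1 /disk2 /disk3 /disk4 /disk5

# 2. In Cloudera Manager (HDFS > Configuration), extend the DataNode data
#    directory list (dfs.datanode.data.dir) to cover every mount, e.g.:
#    /disk1/dfs/dn,/disk2/dfs/dn,/disk3/dfs/dn,/disk4/dfs/dn,/disk5/dfs/dn
# 3. Redeploy the configuration and restart HDFS so the DataNodes pick it up.
```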