Created on 10-26-2019 12:33 PM - last edited on 10-26-2019 01:20 PM by ask_bill_brooks
We installed a new Ambari cluster with the following details (we moved to Red Hat 7.5 instead of 7.2):
Redhat – 7.5
HDP version – 2.6.4
Ambari – 2.6.2
After we completed the installation, we noticed very strange behavior (please note that this is a new cluster).
On the HDFS status summary, we see the following message about under-replicated blocks:
The number of under-replicated blocks is 12 (while it should be 0 on a new installation).
Any suggestion as to why this happens?
I just want to add that this behavior does not appear on Red Hat 7.2.
Created 10-26-2019 03:22 PM
Under replicated blocks
There are a couple of potential sources for the problem that triggers this alert. HDP versions earlier than 3.x all use the standard default replication factor of 3, for a reason you know well: the ability to rebuild the data in case of failure, as opposed to the new erasure coding policies in Hadoop 3.0.
Secondly, the cluster will re-replicate the blocks itself if you give it time 🙂
Having said that, the first question is: how many DataNodes were set up in this new cluster, and did you enable rack awareness?
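As a side note (a quick check of my own, not part of the original reply, and assuming you run it as the hdfs superuser), dfsadmin can show both the number of live DataNodes and the rack each one is assigned to:
$ hdfs dfsadmin -report | grep 'Live datanodes'   # number of live DataNodes
$ hdfs dfsadmin -printTopology                    # rack assignment of each DataNode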
This alert usually means that some files are “asking” for a specific number of target replicas that are either not present or cannot be created. So the question is: how do I know which files are asking for a number of replicas that is not available?
The first option is to use hdfs fsck:
$ hdfs fsck / -storagepolicies
****************** output ******************
Connecting to namenode via http://xxx.com:50070/fsck?ugi=hdfs&storagepolicies=1&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.94 for path / at Sat Oct 26 23:03:24 CEST 2019
/user/zeppelin/notebook/2EC24FF9U/note.json:
Under replicated BP-2067995211-192.168.0.101-1537740712051:blk_1073751507_10767.
Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
******
Change the replication
$ hdfs dfs -setrep -w 1 /user/zeppelin/notebook/2EC24FF9U/note.json
Replication 1 set: /user/zeppelin/notebook/2EC24FF9U/note.json
Waiting for /user/zeppelin/notebook/2EC24FF9U/note.json ... done
You also need to check dfs.replication in hdfs-site.xml; the default is 3. Note that if you upload files through Ambari, the file also gets a replication factor of 3.
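To double-check the effective value on a client node (a quick sketch of my own; hdfs getconf reads a single key from the loaded client configuration):
$ hdfs getconf -confKey dfs.replication
3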
HTH
Created 10-26-2019 11:27 PM
Dear Shelton
These are the results that we get from:
hdfs fsck / -storagepolicies
FSCK started by hdfs (auth:SIMPLE) from /192.9.200.217 for path / at Sun Oct 27 05:49:31 UTC 2019
..................
/hdp/apps/2.6.4.0-91/hive/hive.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741831
/hdp/apps/2.6.4.0-91/hive/hive.tar.gz: MISSING 1 blocks of total size 106475099 B..
/hdp/apps/2.6.4.0-91/mapreduce/hadoop-streaming.jar: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741834
/hdp/apps/2.6.4.0-91/mapreduce/hadoop-streaming.jar: MISSING 1 blocks of total size 105758 B..
/hdp/apps/2.6.4.0-91/mapreduce/mapreduce.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741825
/hdp/apps/2.6.4.0-91/mapreduce/mapreduce.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741826
/hdp/apps/2.6.4.0-91/mapreduce/mapreduce.tar.gz: MISSING 2 blocks of total size 212360343 B..
/hdp/apps/2.6.4.0-91/pig/pig.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741829
/hdp/apps/2.6.4.0-91/pig/pig.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741830
/hdp/apps/2.6.4.0-91/pig/pig.tar.gz: MISSING 2 blocks of total size 135018554 B..
/hdp/apps/2.6.4.0-91/slider/slider.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741828
/hdp/apps/2.6.4.0-91/slider/slider.tar.gz: MISSING 1 blocks of total size 47696340 B..
/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741832
/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741833
/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz: MISSING 2 blocks of total size 189992674 B..
/hdp/apps/2.6.4.0-91/tez/tez.tar.gz: CORRUPT blockpool BP-2095386762-192.9.201.8-1571956239762 block blk_1073741827
/hdp/apps/2.6.4.0-91/tez/tez.tar.gz: MISSING 1 blocks of total size 53236968 B......
/user/ambari-qa/.staging/job_1571958926657_0001/job.jar: Under replicated BP-2095386762-192.9.201.8-1571956239762:blk_1073741864_1131. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/.staging/job_1571958926657_0001/job.split: Under replicated BP-2095386762-192.9.201.8-1571956239762:blk_1073741865_1132. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
...Status: CORRUPT
Yes, we checked the replication factor - yes, it's 3.
Based on those results, can we just delete the corrupted blocks?
Created 10-27-2019 01:36 AM
Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating the missing copies to fulfill the replication factor), but in your case the cluster-wide replication factor is 3 while the target is 10. The output suggests you have 5 DataNodes while 10 replicas are requested, which leads to the under-replication alert.
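As an aside (my assumption, not something stated in your output): the job.jar and job.split files under /user/ambari-qa/.staging are written by the MapReduce job client, which uses mapreduce.client.submit.file.replication (default 10) rather than dfs.replication, which is why their target is 10 on a 5-node cluster. If that key has been overridden, you would find it in mapred-site.xml, for example:
$ grep -A1 'mapreduce.client.submit.file.replication' /etc/hadoop/conf/mapred-site.xml   # path assumes a standard HDP config layout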
According to the output, you have 2 distinct problems, each with its own solution:
(a) Under-replicated blocks: Target Replicas is 10 but found 5 live replica(s) [last 2 lines]
(b) Corrupt blocks
Solution 1: under-replicated blocks
You could force the 2 blocks to align with the cluster-wide replication factor by adjusting them with -setrep:
$ hdfs dfs -setrep -w 3 [File_name]
Validate with the command below; you should now see 3 after the file permissions, right before the user:group:
$ hdfs dfs -ls [File_name]
-rw-r--r-- 3 analyst hdfs 1068028 2019-10-27 12:30 /flighdata/airports.dat
Then wait for the re-replication to happen, or run the snippets below sequentially to fix all under-replicated files at once:
$ hdfs fsck / | grep 'Under replicated'
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
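You can watch the count drop while the NameNode re-replicates (a simple check of my own, not part of the snippets above):
$ hdfs fsck / | grep -c 'Under replicated'   # number of files still under-replicated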
Solution 2: corrupt files
$ hdfs fsck / | egrep -v '^\.+$' | grep -i corrupt
...............Example output............................
/user/analyst/test9: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1055741378
/user/analyst/data1: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1056741378
/user/analyst/data2: MISSING 3 blocks of total size 338192920 B.Status: CORRUPT
CORRUPT FILES: 9
CORRUPT BLOCKS: 18
Corrupt blocks: 18
The filesystem under path '/' is CORRUPT
Locate the corrupted blocks:
$ hdfs fsck / | egrep -v '^\.+$' | grep -i "corrupt blockpool"| awk '{print $1}' |sort |uniq |sed -e 's/://g' >corrupted.flst
Get the locations of the files listed in the output file corrupted.flst:
$ hdfs fsck /user/analyst/xxxx -locations -blocks -files
Remove the corrupted files:
$ hdfs dfs -rm /path/to/corrupt_filename
Or skip the trash to permanently delete them:
$ hdfs dfs -rm -skipTrash /path/to/corrupt_filename
You should give the cluster some time to re-replicate in the case of under-replicated files.
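One way to confirm recovery afterwards (a quick verification sketch of my own, assuming the standard fsck summary output) is to re-run fsck and check the summary lines:
$ hdfs fsck / | grep -E 'Under-replicated blocks|Corrupt blocks|The filesystem under path'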
Created 10-27-2019 02:28 AM
About the corrupted files - why not just use the following?
hdfs fsck / -delete
Created 10-27-2019 03:40 AM
Sure, you can use hdfs fsck / -delete, but remember that the deleted files will be put in the trash!
Created 10-27-2019 04:02 AM
May I return to my first question?
As long as we were using Red Hat 7.2, everything was OK; after each installation from scratch we never saw this.
But when we jumped to Red Hat 7.5, every cluster that we created had corrupted files. Any hint why?