Created 09-04-2018 05:33 PM
hi all,
we installed a new Hadoop cluster (Ambari + HDP version 2.6.4)
after installation, we noticed that we have a problem with spark-submit
and finally we found that the spark2-hdp-yarn-archive.tar.gz file is corrupted
full path - /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz ( on HDFS )
my question is - what could be the reason that this file is corrupted,
given that this cluster is a new, fresh installation?
Created 09-05-2018 01:52 AM
Since the file path you shared is on HDFS: /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
To identify "corrupt" or "missing" blocks, you can run the following command to check whether the file is healthy:
# su - hdfs -c "hdfs fsck /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz"
Connecting to namenode via http://hdfcluster2.example.com:50070/fsck?ugi=hdfs&path=%2Fhdp%2Fapps%2F2.6.4.0-91%2Fspark2%2Fspark2...
FSCK started by hdfs (auth:SIMPLE) from /172.22.197.159 for path /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz at Wed Sep 05 01:51:25 UTC 2018
.Status: HEALTHY
 Total size:                    189997800 B
 Total dirs:                    0
 Total files:                   1
 Total symlinks:                0
 Total blocks (validated):      2 (avg. block size 94998900 B)
 Minimally replicated blocks:   2 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
FSCK ended at Wed Sep 05 01:51:25 UTC 2018 in 35 milliseconds

The filesystem under path '/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz' is HEALTHY
HDFS will attempt to recover the situation automatically. By default there are three replicas of every block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, it will create a new replica of that block from a known-good replica and mark the damaged one for deletion.
The chance of all three replicas of the same block becoming damaged is so remote that it would suggest a significant failure somewhere else in the cluster. If this situation does occur and all three replicas are damaged, then 'hdfs fsck' will report that block as "corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas.
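The checks described above can be sketched as a small script. This is only a sketch: it assumes an hdfs client on the PATH and a running cluster, and the guard makes it safe to dry-run on a machine without one.

```shell
#!/bin/sh
# Sketch: locate corrupt or damaged blocks from the command line.
# Assumes an hdfs client and a running cluster; otherwise the checks are skipped.
if command -v hdfs >/dev/null 2>&1; then
  # List only files whose blocks are corrupt (i.e. all replicas are bad):
  hdfs fsck / -list-corruptfileblocks
  # Show per-block replica locations for the one file in question:
  hdfs fsck /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz \
    -files -blocks -locations
else
  echo "hdfs client not found; skipping cluster checks"
fi
```

`hdfs fsck` itself only reports; the NameNode re-replicates damaged blocks in the background.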
There are also some articles you can refer to in order to fix "Under-replicated Blocks", for example:
https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html
How to fix missing/corrupted/under or over-replicated blocks?
https://community.hortonworks.com/content/supportkb/49106/how-to-fix-missingcorruptedunder-or-over-r...
Created 09-04-2018 06:24 PM
What kind of corruption is it? Is the file incomplete, or smaller than it should be?
Created 09-04-2018 07:33 PM
I can't tell you exactly, but after I re-created the tar archive, that solved my problem
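For anyone hitting the same issue, a rough sketch of re-creating and re-uploading the archive. The /usr/hdp jar path is an assumption based on a typical HDP 2.6.4 layout - adjust it to your install; the scratch-directory fallback just lets the steps be dry-run on any machine.

```shell
set -e
# Assumed jar location for HDP 2.6.4 (hypothetical path; adjust to your build):
JARS_DIR="${JARS_DIR:-/usr/hdp/2.6.4.0-91/spark2/jars}"
# Fall back to a scratch dir with a dummy jar so the steps can be dry-run anywhere.
if [ ! -d "$JARS_DIR" ]; then
  JARS_DIR="$(mktemp -d)"
  echo "dummy" > "$JARS_DIR/example.jar"
fi
# Rebuild the archive from the jar directory (jars at the top level of the tar):
tar -C "$JARS_DIR" -czf /tmp/spark2-hdp-yarn-archive.tar.gz .
# Verify the new archive before uploading:
gzip -t /tmp/spark2-hdp-yarn-archive.tar.gz && echo "archive OK"
# Upload back to HDFS, replacing the corrupt copy (cluster-only step):
if command -v hdfs >/dev/null 2>&1; then
  su - hdfs -c "hdfs dfs -put -f /tmp/spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.6.4.0-91/spark2/"
fi
```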
Created 09-05-2018 05:29 AM
@Jay , very nice solution
until now I was doing the following, in order to verify the file:
gzip -t /var/tmp/spark2-hdp-yarn-archive.tar.gz
gunzip -c /var/tmp/spark2-hdp-yarn-archive.tar.gz | tar t > /dev/null
tar tzvf spark2-hdp-yarn-archive.tar.gz > /dev/null
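Those checks can be tried end-to-end without a cluster. A small self-contained demo (dummy file names, not the real archive) showing that a truncated copy fails the same checks a healthy archive passes:

```shell
set -e
work="$(mktemp -d)"
echo "hello" > "$work/file.txt"
tar -C "$work" -czf "$work/good.tar.gz" file.txt

# A valid archive passes both integrity checks:
gzip -t "$work/good.tar.gz" && echo "gzip OK"
tar -tzf "$work/good.tar.gz" > /dev/null && echo "tar listing OK"

# Truncate a copy to simulate corruption; the same checks now fail:
head -c 20 "$work/good.tar.gz" > "$work/bad.tar.gz"
if ! gzip -t "$work/bad.tar.gz" 2>/dev/null; then
  echo "corruption detected"
fi
```

Note that `gzip -t` only validates the compression layer; piping through `tar t` additionally validates the tar structure inside it.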
Created 09-05-2018 05:33 AM
@Jay , although this is a different case, I posted a thread yesterday - https://community.hortonworks.com/questions/217423/spark-application-communicating-with-driver-in-he... - can you help me with this?
Created 09-05-2018 08:06 AM
@Jay , please let me know if I understand it correctly, as follows:
let's say that one of the replicas of spark2-hdp-yarn-archive.tar.gz is corrupted
when I run this CLI: su - hdfs -c "hdfs fsck /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz"
does it actually mean that fsck will replace the bad replica with a good one, and the status will finally be HEALTHY?