Created 01-21-2016 12:35 PM
Hi,
during the installation of a cluster I ran into some hardware issues, so I now have an (almost) running cluster, but with corrupt file blocks.
The HDFS service is up and running in HA mode, but fsck complains about corrupt blocks:
FSCK started by hdfs (auth:SIMPLE) from /10.41.27.10 for path / at Thu Jan 21 13:22:00 CET 2016
/hdp/apps/2.2.4.2-2/hive/hive.tar.gz: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741862
/hdp/apps/2.2.4.2-2/hive/hive.tar.gz: MISSING 1 blocks of total size 83000677 B
/hdp/apps/2.2.4.2-2/mapreduce/hadoop-streaming.jar: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741863
/hdp/apps/2.2.4.2-2/mapreduce/hadoop-streaming.jar: MISSING 1 blocks of total size 104996 B
/hdp/apps/2.2.4.2-2/mapreduce/mapreduce.tar.gz: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741827
/hdp/apps/2.2.4.2-2/mapreduce/mapreduce.tar.gz: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741829
/hdp/apps/2.2.4.2-2/mapreduce/mapreduce.tar.gz: MISSING 2 blocks of total size 192697367 B
/hdp/apps/2.2.4.2-2/pig/pig.tar.gz: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741861
/hdp/apps/2.2.4.2-2/pig/pig.tar.gz: MISSING 1 blocks of total size 97548644 B
/hdp/apps/2.2.4.2-2/tez/tez.tar.gz: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073741826
/hdp/apps/2.2.4.2-2/tez/tez.tar.gz: MISSING 1 blocks of total size 40658186 B
/mr-history/done/2016/01/08/000000/job_1452263100546_0003-1452263260432-ambari%2Dqa-PigLatin%3ApigSmoke.sh-1452263277399-1-0-SUCCEEDED-default-1452263269870.jhist: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073742129
...
/user/ambari-qa/passwd: MISSING 1 blocks of total size 2637 B
/user/ambari-qa/pigsmoke.out/part-v000-o000-r-00000: CORRUPT blockpool BP-1565025838-10.41.27.10-1452263064113 block blk_1073742141
/user/ambari-qa/pigsmoke.out/part-v000-o000-r-00000: MISSING 1 blocks of total size 358 B
Status: CORRUPT
 Total size: 414892275 B
 Total dirs: 7291
 Total files: 38
 Total symlinks: 0
 Total blocks (validated): 35 (avg. block size 11854065 B)
  ********************************
  CORRUPT FILES: 23
  MISSING BLOCKS: 24
  MISSING SIZE: 414887859 B
  CORRUPT BLOCKS: 24
  ********************************
 Minimally replicated blocks: 11 (31.428572 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 2
 Average block replication: 0.62857145
 Corrupt blocks: 24
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 4
 Number of racks: 1
FSCK ended at Thu Jan 21 13:22:00 CET 2016 in 157 milliseconds

The filesystem under path '/' is CORRUPT
What I want to do now is re-format HDFS to start with a blank filesystem, since this is a new installation and no data has been uploaded to HDFS yet.
How can I properly re-format HDFS to get rid of the corrupt blocks?
I am hesitant to simply delete the files it complains about: if I delete e.g. /hdp/apps/2.2.4.2-2/hive/hive.tar.gz, will it be re-deployed when the services restart, or how will those .tar.gz and .jar files be provided afterwards?
Created 01-21-2016 12:48 PM
@Gerd Koenig For a blank filesystem:
hadoop namenode -format (don't use this in production or in any environment that is in use)
Now, regarding the corrupt blocks, see this: http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs
The challenge here is HA. I suggest opening a support case if you have access to support.
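As a sketch of the fsck-based cleanup from the linked answer, the corrupt files can be inspected and, if you decide to discard them, removed with hdfs fsck. This assumes you run the commands as the hdfs superuser on a node with a configured HDFS client, against a live cluster:

```shell
# List only the files that currently have corrupt or missing blocks
sudo -u hdfs hdfs fsck / -list-corruptfileblocks

# Show which blocks (and which datanode locations, if any) belong to one file
sudo -u hdfs hdfs fsck /hdp/apps/2.2.4.2-2/hive/hive.tar.gz -files -blocks -locations

# Irreversibly delete every corrupt file that fsck reports.
# Only acceptable here because the cluster holds no user data yet.
sudo -u hdfs hdfs fsck / -delete
```

Note that `-delete` removes the affected files entirely; the /hdp/apps archives then have to be restored from the local /usr/hdp installation, as described further down in this thread.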
Created 01-21-2016 02:59 PM
@Gerd Koenig Open a support ticket to handle this. I would do the same if I were in your shoes.
Created 01-21-2016 03:17 PM
Thanks @Neeraj.
Just to give you feedback on another 'solution': in the meantime I got back two more datanodes (which had been failing at installation time). After adding those hosts and restarting HDFS, the corrupt-block errors disappeared without any further file deletion or HDFS re-formatting.
Regards, Gerd
Created 01-21-2016 02:36 PM
Additionally, the files you're concerned about are shipped with our distribution; you can find them in the /usr/hdp directory on your local filesystem.
Created 01-21-2016 02:45 PM
@Gerd Koenig If you re-format HDFS you will be left without the whole /hdp folder and will have to recreate it. If you are sure everything else is now all right, you are better off removing the corrupted files and recreating them; they are all available under /usr/hdp/&lt;hdp-version&gt; on the local filesystem and you can copy them back to HDFS. Details can be found in the doc given by @Neeraj Sabharwal. For example, hive and pig files are given here, tez files here and so on. The files under /user/ambari-qa you can just delete; they are the result of some service checks and there is no need to recreate them.
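As a sketch of recreating one of the deleted archives from the local installation (the version 2.2.4.2-2 is taken from the paths in the fsck output above; the exact local path under /usr/hdp and the expected permissions may differ slightly on your cluster, so treat them as assumptions and check your HDP docs):

```shell
# Remove the corrupt copy from HDFS, then re-upload the tarball
# that ships with the HDP packages under /usr/hdp
sudo -u hdfs hdfs dfs -rm /hdp/apps/2.2.4.2-2/hive/hive.tar.gz
sudo -u hdfs hdfs dfs -put /usr/hdp/2.2.4.2-2/hive/hive.tar.gz /hdp/apps/2.2.4.2-2/hive/

# Restore read-only permissions on the uploaded archive
sudo -u hdfs hdfs dfs -chmod 444 /hdp/apps/2.2.4.2-2/hive/hive.tar.gz
```

The same pattern applies to the pig, tez, mapreduce and hadoop-streaming archives listed in the fsck output.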
Created 01-21-2016 02:55 PM
DO NOT REFORMAT for missing blocks. If it's not a test cluster, you need to identify how you ended up with missing blocks; one possible reason is that you changed the data directories and removed some. Once you have identified the root cause and are fine with it, just take the missing files from the local filesystem and upload them into HDFS. And you can simply delete the files in /user/ambari-qa that you listed.
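Removing the service-check leftovers mentioned above could look like this (a sketch, assuming the two paths from the fsck output are the only ones affected and that you run it against a live cluster):

```shell
# Service-check artifacts under /user/ambari-qa can simply be removed;
# they are regenerated the next time an Ambari service check runs
sudo -u hdfs hdfs dfs -rm -r -skipTrash \
    /user/ambari-qa/passwd \
    /user/ambari-qa/pigsmoke.out
```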