Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 878 | 06-04-2025 11:36 PM |
| | 1450 | 03-23-2025 05:23 AM |
| | 728 | 03-17-2025 10:18 AM |
| | 2612 | 03-05-2025 01:34 PM |
| | 1733 | 03-03-2025 01:09 PM |
04-30-2018
08:54 PM
@Michael Bronson It's important to first determine how important the file is: can it simply be removed and copied back into place, or does it contain data that needs to be regenerated? If it's easy enough to replace the file, that's the route I would take. HDFS will attempt to recover the situation automatically. By default there are three replicas of every block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, it will create a new replica of that block from a known-good replica and mark the damaged one for deletion. The known-good state is determined by checksums which are recorded alongside each block by the DataNodes. This will list the files with corrupt HDFS blocks:
hdfs fsck / -list-corruptfileblocks
This will delete the files with corrupt blocks:
hdfs fsck / -delete
Once you find a file that is corrupt, this will show where its blocks live:
hdfs fsck /path/to/corrupt/file -locations -blocks -files
Use that output to determine where the blocks might live. If the file is larger than your block size, it will have multiple blocks. You can use the reported block IDs to search the DataNode and NameNode logs for the machine or machines on which those blocks lived, then look for filesystem errors on those machines: missing mount points, a DataNode that isn't running, a filesystem that was reformatted or reprovisioned. If you can find a problem that way and bring the block back online, the file will be healthy again. Lather, rinse and repeat until all files are healthy or you exhaust all alternatives looking for the blocks. Once you determine what happened and you cannot recover any more blocks, use the command below to get your HDFS filesystem back to healthy, so you can start tracking new errors as they occur:
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
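If you have to repeat this triage for several files, the whole sequence boils down to the commands already covered above. A rough sketch, with the paths as placeholders to replace with real paths taken from the fsck output:
# List the files that currently have corrupt blocks
hdfs fsck / -list-corruptfileblocks
# For each affected file, find out which DataNodes held its blocks
hdfs fsck /path/to/corrupt/file -locations -blocks -files
# Only after exhausting recovery options, remove the unrecoverable file
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
# Confirm the filesystem is healthy again
hdfs fsck /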
... View more
04-30-2018
10:39 AM
@Michael Bronson Here is how I force a filesystem check every 3 months, using the command below.
$ sudo tune2fs -i 3m /dev/sda1
Now verify that the newly added filesystem check conditions are set properly.
$ sudo tune2fs -l /dev/sda1
The relevant part of the output should look like this:
Last mount time: n/a
Last write time: Sat Mar 10 22:29:24 2018
Mount count: 20
Maximum mount count: 30
Last checked: Fri Mar 2 20:55:08 2018
Check interval: 7776000 (3 months)
Next check after: Sat Jun 2 21:55:08 2018
Hope that answers your question
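If you have several data disks, you can apply and verify the same interval in one pass. A rough sketch, where the device names are just examples to adjust for your own layout:
for dev in /dev/sda1 /dev/sdb1; do   # example devices, adjust to your disks
  sudo tune2fs -i 3m "$dev"
  sudo tune2fs -l "$dev" | grep -E 'Last checked|Check interval|Next check after'
done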
... View more
04-30-2018
08:37 AM
@Simran Kaur Let's first dive into the core explanation of the NameNode (NN) and Secondary NameNode (SNN) by explaining their roles.
NameNode: it holds the metadata for the DataNodes, the directory tree structure, and the fsimage and edit logs for your entire cluster.
Secondary NameNode: it periodically collects the fsimage and edit logs from the NN, merges them into a new fsimage file, and pushes that back to the NN to keep the size of the NN metadata down. So if the NN fails, the SNN stops receiving updates and your entire cluster goes down. With the help of the SNN you can start another node as the NN, but the SNN does not do the NN's work; it only collects the fsimage and edit logs from the NameNode.
Having said that, a PRODUCTION cluster should run with NameNode HA, where the Active and Standby NameNodes run on different hosts and racks, with network redundancy. This ensures:
Automated Failover: HDP pro-actively detects NameNode host and process failures and will automatically switch to the standby NameNode to maintain availability for the HDFS service.
Hot Standby: both the Active and Standby NameNodes have up-to-date HDFS metadata, ensuring seamless failover even for large clusters, which means no downtime for your HDP cluster!
Full Stack Resiliency: the entire HDP stack can handle a NameNode failure scenario without losing data or job progress. This is vital to ensure that long-running jobs which must complete on schedule are not adversely affected by a NameNode failure.
Here is the Hortonworks documentation and a YouTube video. Please let me know if that helped.
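One quick addition: once HA is enabled you can confirm which NameNode is active from the command line. A minimal sketch, assuming nn1 and nn2 are the NameNode IDs defined by dfs.ha.namenodes in your hdfs-site.xml (yours may be named differently):
# List the NameNode hosts known to the client configuration
hdfs getconf -namenodes
# Check the HA state of each NameNode ID
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2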
... View more
04-29-2018
09:41 PM
@Michael Bronson Any updates, so we can close the thread?
... View more
04-29-2018
08:41 AM
@Michael Bronson Here we go.
Force fsck for the root partition
The simplest way to force an fsck filesystem check on a root partition, e.g. /dev/sda1, is to create an empty file called forcefsck in the partition's root directory.
# touch /forcefsck
This empty file will temporarily override any other settings and force fsck to check the filesystem on the next system reboot. Once the filesystem is checked, the forcefsck file is removed, so the next time you reboot the filesystem will NOT be checked again.
For a more permanent solution that forces a filesystem check on every reboot, we need to manipulate the filesystem's "Maximum mount count" parameter. The following command will ensure that the filesystem /dev/sdb1 is checked every time your Linux system reboots. Please note that for this to happen, fsck's PASS value in /etc/fstab must be set to a non-zero value.
# tune2fs -c 1 /dev/sdb1
Alternatively, we can set fsck to run after every 10 reboots:
# tune2fs -c 10 /dev/sdb1
Force fsck for all other non-root partitions
As opposed to the root partition, creating an empty forcefsck file will NOT trigger a partition check on reboot. The only way to force fsck on all other non-root partitions is to manipulate the filesystem's "Maximum mount count" parameter and the PASS value within the /etc/fstab configuration file. To force a filesystem check on a non-root partition, change fsck's PASS value in /etc/fstab to 2. For example:
UUID=c6e22f63-e63c-40ed-bf9b-bb4a10f2db66 /grid01 ext4 errors=remount-ro 0 2
and change the maximum mount count parameter to a positive integer, depending on how many times you wish to allow the filesystem to be mounted without being checked. To force fsck on every reboot:
# tune2fs -c 1 /dev/sdb1
Alternatively, we can set fsck to check the filesystem after every 5 reboots:
# tune2fs -c 5 /dev/sdb1
To disable the mount-count check, run:
# tune2fs -c 0 /dev/sdb1
OR
# tune2fs -c -1 /dev/sdb1
which will set the filesystem's "Maximum mount count" parameter to -1.
Hope that gives you a walkthrough
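One extra check worth doing after changing these settings, using the same example device and mount point as above:
sudo tune2fs -l /dev/sdb1 | grep -i 'mount count'   # shows Mount count and Maximum mount count
grep /grid01 /etc/fstab                             # the sixth field is the fsck PASS value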
... View more
04-29-2018
07:01 AM
@Michael Bronson If this is a production server, it's not a good idea to disable fsck's automatically scheduled checks on boot. fsck automatically runs on boot after M mounts or N days, whichever comes first, and you can tune this schedule using tune2fs. I would suggest leaving the automatic check enabled, using tune2fs to adjust the check schedule if appropriate, and forcing fsck to run when it is more convenient. When fsck runs, it resets the mount count to 0 and updates the "Last checked" field, effectively rescheduling the next automatic check. If you don't want to run fsck manually but you know the next scheduled reboot will be a convenient time, you can force fsck on the next boot by creating an empty 'forcefsck' file in the root of your root filesystem, i.e.
touch /forcefsck
Filesystems that have 0 or nothing specified in the sixth column of your /etc/fstab will not be checked. Good fsck resource. Hope that helps
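If you want to see at a glance which filesystems would be skipped, here is a small sketch that simply reports the pass value (sixth column) for each non-comment entry in /etc/fstab:
awk '!/^#/ && NF >= 2 { pass = (NF >= 6) ? $6 : 0; print $2, "fsck pass =", pass }' /etc/fstab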
... View more
04-29-2018
06:09 AM
@Sriram Hadoop Nice to know it answered your question. Could you accept the answer by clicking the Accept button below? That would be a great help to community users looking for a quick solution to these kinds of errors.
... View more
04-28-2018
08:06 PM
@Michael O The problem is clearly a permission issue ("Access Denied"). The AWS login user is usually ec2-user or ubuntu. To add a temporary password to the root user:
1. Connect to your EC2 instance running Linux by using SSH.
2. Assume root user permissions by running the following command:
$ sudo su
3. Create a password for the root user by running the following command:
# passwd root
4. When prompted, enter your temporary root password, and then enter it again to confirm it. Note: you must run this command as the root user.
After you complete the task, delete the root password by running the following command:
# passwd -d root
Hope that helps
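Putting the steps together, a rough sketch of the session might look like this; the key path and hostname are placeholders, and the login user depends on your AMI (ec2-user for Amazon Linux, ubuntu for Ubuntu):
ssh -i /path/to/key.pem ec2-user@your-ec2-public-dns   # placeholder key and host
sudo su
passwd root        # set the temporary root password
# ... do the work that required root ...
passwd -d root     # remove the root password again when done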
... View more
04-28-2018
04:38 PM
@KRITIKA RAI Indeed I see a lot of warnings, but the job completed successfully; below is an extract of your log.
18/04/25 16:02:18 INFO mapreduce.Job: map 0% reduce 0%
18/04/25 16:02:28 INFO mapreduce.Job: map 25% reduce 0%
18/04/25 16:02:30 INFO mapreduce.Job: map 50% reduce 0%
18/04/25 16:02:31 INFO mapreduce.Job: map 100% reduce 0%
18/04/25 16:02:32 INFO mapreduce.Job: Job job_1524672099436_0001 completed successfully
Can you run a SELECT * FROM categories_hive and revert, please?
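For reference, something like this from the shell should work; the LIMIT is only there to keep the output short, and if you normally connect through Beeline, use your own JDBC URL instead:
hive -e "SELECT * FROM categories_hive LIMIT 10;"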
... View more
04-27-2018
12:48 PM
@Sriram Hadoop Once you have changed the block size at the cluster level, whatever files you put or copy to HDFS will have the new default block size of 256 MB. Unfortunately, apart from DistCp you only have the usual -put and -get HDFS commands. My default blocksize is 128 MB, see the attached screenshot 128MB.JPG.
Created a file test_128MB.txt:
$ vi test_128MB.txt
Uploaded the 128 MB file to HDFS:
$ hdfs dfs -put test_128MB.txt /user/sheltong
See the attached screenshot 128MB.JPG and notice the block size. I then copied the same file back to the local filesystem:
$ hdfs dfs -get /user/sheltong/test_128MB.txt /tmp/test_128MB_2.txt
Then, using the -D option to define a new blocksize of 256 MB:
$ hdfs dfs -D dfs.blocksize=268435456 -put test_128MB_2.txt /user/sheltong
See screenshot 256MB.JPG. Technically it's possible if you have a few files, but you should remember that test_128MB.txt and test_128MB_2.txt are the same 128 MB file, so changing the block size of an existing file will try to fit a 128 MB block into a 256 MB block, wasting the other 128 MB; hence the reason the new default ONLY applies to new files.
Hope that gives you a better understanding
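To double-check which block size a file actually got after the upload, you can ask HDFS directly. A quick check against the example paths used above (%o prints the block size in bytes):
hdfs dfs -stat "%o %n" /user/sheltong/test_128MB.txt
hdfs dfs -stat "%o %n" /user/sheltong/test_128MB_2.txt
# Or list the individual blocks with fsck
hdfs fsck /user/sheltong/test_128MB_2.txt -files -blocks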
... View more