Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1953 | 06-15-2020 05:23 AM |
| | 15901 | 01-30-2020 08:04 PM |
| | 2097 | 07-07-2019 09:06 PM |
| | 8195 | 01-27-2018 10:17 PM |
| | 4652 | 12-31-2017 10:12 PM |
04-25-2018
06:28 PM
Hi all, we have an Ambari cluster (HDP version 2.6.0.1). One of the DataNode machines (worker12) has a disk, /dev/sdf, with file-system errors, which we noticed from the output of e2fsck -n /dev/sdf (pasted below).
1. Based on that output, is it safe to run e2fsck -y /dev/sdf in order to repair the /dev/sdf file system?
2. Are any other steps necessary after running e2fsck -y /dev/sdf? A sketch of the sequence we are considering follows the e2fsck output below.
The mount point holds HDFS and YARN data:
ls /grid/sdf/hadoop/
hdfs/ yarn/
e2fsck -n /dev/sdf
e2fsck 1.42.9 (28-Dec-2013)
Warning! /dev/sdf is in use.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdf contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no
Inode 176619732 was part of the orphaned inode list. IGNORED.
Inode 176619733 was part of the orphaned inode list. IGNORED.
Inode 176619745 was part of the orphaned inode list. IGNORED.
Inode 176619747 was part of the orphaned inode list. IGNORED.
Inode 176619751 was part of the orphaned inode list. IGNORED.
Inode 176619752 was part of the orphaned inode list. IGNORED.
Inode 176619753 was part of the orphaned inode list. IGNORED.
Inode 176619756 was part of the orphaned inode list. IGNORED.
Inode 176619759 was part of the orphaned inode list. IGNORED.
Inode 176619760 was part of the orphaned inode list. IGNORED.
Inode 176619762 was part of the orphaned inode list. IGNORED.
Inode 176619763 was part of the orphaned inode list. IGNORED.
Inode 176619766 was part of the orphaned inode list. IGNORED.
Inode 176619767 was part of the orphaned inode list. IGNORED.
Inode 176619773 was part of the orphaned inode list. IGNORED.
Inode 176619774 was part of the orphaned inode list. IGNORED.
Inode 176619775 was part of the orphaned inode list. IGNORED.
Deleted inode 176619779 has zero dtime. Fix? no
Inode 176619781 was part of the orphaned inode list. IGNORED.
Inode 176619786 was part of the orphaned inode list. IGNORED.
Inode 176619788 was part of the orphaned inode list. IGNORED.
Inode 176619799 was part of the orphaned inode list. IGNORED.
Inode 176619800 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Entry '00' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619732. Clear? no
Entry '16' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619733. Clear? no
Entry '17' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619745. Clear? no
Entry '21' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619747. Clear? no
Entry '2e' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619762. Clear? no
Entry '1f' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619763. Clear? no
Entry '19' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619775. Clear? no
Entry '35' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619779. Clear? no
Entry '09' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-8248ef4a-78f5-4f43-967d-0007096d0c0b (176554376) has deleted/unused inode 176619788. Clear? no
Entry '34' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-c7b71625-3667-48e4-8843-8ddf3c6cc98c (176554456) has deleted/unused inode 176619752. Clear? no
Entry '04' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-c7b71625-3667-48e4-8843-8ddf3c6cc98c (176554456) has deleted/unused inode 176619756. Clear? no
Entry '0f' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-c7b71625-3667-48e4-8843-8ddf3c6cc98c (176554456) has deleted/unused inode 176619799. Clear? no
Entry '3b' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619751. Clear? no
Entry '3c' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619753. Clear? no
Entry '1f' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619759. Clear? no
Entry '15' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619760. Clear? no
Entry '14' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619766. Clear? no
Entry '01' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619767. Clear? no
Entry '27' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619773. Clear? no
Entry '35' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619774. Clear? no
Entry '0c' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619781. Clear? no
Entry '09' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619786. Clear? no
Entry '31' in /hadoop/yarn/local/usercache/hive/appcache/application_1523380874382_1834/blockmgr-5a61cab7-acb9-497a-9d7b-e6d6b29235ed (176554463) has deleted/unused inode 176619800. Clear? no
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 176554376 ref count is 63, should be 54. Fix? no
Inode 176554456 ref count is 63, should be 60. Fix? no
Inode 176554463 ref count is 64, should be 53. Fix? no
Pass 5: Checking group summary information
Block bitmap differences: -(1412960478--1412960479) -1412960491 -1412960493 -(1412960497--1412960499) -1412960502 -(1412960505--1412960506) -(1412960508--1412960509) -(1412960512--1412960513) -(1412960519--1412960521) -1412960525 -1412960527 -1412960532 -1412960534 -(1412960545--1412960546)
Fix? no
Free blocks count wrong (1918728678, counted=1919005864).
Fix? no
Inode bitmap differences: -(176619732--176619733) -176619745 -176619747 -(176619751--176619753) -176619756 -(176619759--176619760) -(176619762--176619763) -(176619766--176619767) -(176619773--176619775) -176619779 -176619781 -176619786 -176619788 -(176619799--176619800)
Fix? no
Directories count wrong for group #43120 (245, counted=222).
Fix? no
Free inodes count wrong (243908566, counted=243908282).
Fix? no
/dev/sdf: ********** WARNING: Filesystem still has errors **********
/dev/sdf: 282666/244191232 files (0.3% non-contiguous), 34777968/1953506646 blocks
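For reference, a minimal sketch of the repair sequence we are considering, assuming the DataNode and NodeManager on worker12 are stopped from Ambari first so nothing holds /grid/sdf open (the mount point and device names are this cluster's, not a general recipe):
umount /grid/sdf        # or: umount -l /grid/sdf  if the device is reported busy
e2fsck -y /dev/sdf      # full repair pass, answering yes to every prompt
e2fsck -f -n /dev/sdf   # forced read-only re-check to confirm the filesystem is now clean
mount /grid/sdf         # remount via the existing fstab entry
Afterwards we would restart the DataNode and NodeManager on worker12 from Ambari and check hdfs fsck / for missing or corrupt blocks.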
04-24-2018
08:02 AM
Dear Geoffrey, as you know, before performing fsck /dev/sdc we must umount /grid/sdc (or umount -l /grid/sdc), and only then can we run fsck /dev/sdc. So can you confirm the following steps: 1. umount /grid/sdc, or umount -l /grid/sdc in case the device is busy; 2. fsck /dev/sdc? The sequence is sketched below.
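For clarity, a minimal sketch of the sequence being asked about, assuming the DataNode on this host is stopped first so no process keeps files open under /grid/sdc:
umount /grid/sdc || umount -l /grid/sdc   # fall back to a lazy unmount only if the normal one reports the device is busy
mount | grep '/grid/sdc'                  # confirm the mount point is really gone before touching the device
fsck /dev/sdc                             # interactive check/repair of the now-unmounted device
mount /grid/sdc                           # remount once the check completes cleanly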
04-24-2018
04:50 AM
Once you find a file that is corrupt, you can inspect its blocks and locations with:
hdfs fsck /path/to/corrupt/file -locations -blocks -files
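If the report shows that no healthy replica is left and the file can be regenerated from its source, a possible (destructive) follow-up, shown only as a sketch, is to remove the file so HDFS stops reporting it as corrupt:
hdfs dfs -rm /path/to/corrupt/file   # only if the data can be re-created; this does not fix the underlying disk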
04-24-2018
04:48 AM
What do you think about the following steps to fix the corrupted files (taken from https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files)? To determine which files are having problems, the command below ignores lines with nothing but dots and lines talking about replication:
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica
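As a cross-check (assuming this fsck option is available on the HDP 2.6 Hadoop version), the NameNode can also list the corrupt files directly:
hdfs fsck / -list-corruptfileblocks   # prints the blocks HDFS itself marks as corrupt, with their file paths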
04-24-2018
04:34 AM
Dear Geoffrey, /grid/sdc holds an HDFS data filesystem; isn't running fsck on that disk risky? See also http://fibrevillage.com/storage/658-how-to-use-hdfs-fsck-command-to-identify-corrupted-files
04-23-2018
07:39 PM
Dear Geoffrey, we rebooted twice a few weeks ago, but that did not help (a reboot effectively remounts the disks). Regarding setting "dfs.datanode.failed.volumes.tolerated" to 1: we want it set to 0, because we do not want to lose even one disk.
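For reference, the value the DataNode actually sees can be verified on the host itself (a quick sanity check, assuming the HDFS client configuration is deployed there):
hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated   # prints the effective value from hdfs-site.xml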
04-23-2018
05:09 PM
We have an Ambari cluster, HDP version 2.6.0.1, and an issue on worker02. According to the log hadoop-hdfs-datanode-worker02.sys65.com.log, the DataNode fails with:
2018-04-21 09:02:53,405 WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /grid/sdc/hadoop/hdfs/data
Note: from the Ambari GUI we can see that the DataNode on worker02 is down. The startup log around the "Directory is not writable: /grid/sdc/hadoop/hdfs/data" error shows the following:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = worker02.sys65.com/23.87.23.126
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.6.0.3-8
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r c6befa0f1e911140cc815e0bab744a6517abddae; compiled by 'jenkins' on 2017-04-01T21:32Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
2018-04-21 09:02:52,854 INFO datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2018-04-21 09:02:53,321 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdb/hadoop/hdfs/data/
2018-04-21 09:02:53,330 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdc/hadoop/hdfs/data/
2018-04-21 09:02:53,330 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdd/hadoop/hdfs/data/
2018-04-21 09:02:53,331 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sde/hadoop/hdfs/data/
2018-04-21 09:02:53,331 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdf/hadoop/hdfs/data/
2018-04-21 09:02:53,405 WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /grid/sdc/hadoop/hdfs/data
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:124)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:99)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:128)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:44)
at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:127)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-04-21 09:02:53,410 ERROR datanode.DataNode (DataNode.java:secureMain(2691)) - Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:216)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2583)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2492)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2684)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2708)
2018-04-21 09:02:53,411 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-04-21 09:02:53,414 INFO datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at worker02.sys65.com/23.87.23.126
************************************************************/
We checked the following: 1. All files and folders under /grid/sdc/hadoop/hdfs/ are owned by hdfs:hadoop, which is OK. 2. The disk sdc is mounted read/write (rw,noatime,data=ordered), which is also OK. We suspect the hard disk has gone bad; in that case, how can we verify it? A sketch of the checks we are considering is below. Please also advise what other options there are to resolve this issue.
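A minimal sketch of checks that could confirm a failing disk, assuming smartmontools is installed on worker02, /dev/sdc is the device behind /grid/sdc, and .write_test is just a throwaway file name:
dmesg | grep -iE 'sdc|i/o error|ext4'                             # kernel-level I/O or filesystem errors for the device
smartctl -H /dev/sdc                                              # overall SMART health verdict
smartctl -a /dev/sdc | grep -iE 'reallocated|pending|uncorrect'   # SMART attributes that typically rise on failing media
sudo -u hdfs touch /grid/sdc/hadoop/hdfs/data/.write_test && sudo -u hdfs rm /grid/sdc/hadoop/hdfs/data/.write_test   # reproduce the DataNode's "not writable" check by hand
If SMART or dmesg show hardware errors, the usual path is to replace the disk; otherwise an offline e2fsck of /dev/sdc (with the DataNode stopped and /grid/sdc unmounted) may recover the filesystem.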
04-18-2018
02:00 PM
Hi, the problem is that this API does not work on HDP 2.6.4 with Ambari 2.6.1. We have installed many clusters, but when we try this API on the mentioned versions it does not set the repo, for some unclear reason.
04-16-2018
02:44 PM
@Jay, do you have a suggestion for how to install the master + worker + Kafka on a single node?