
datanode + Directory is not writable


We have an Ambari cluster, HDP version 2.6.0.1.

We have issues on worker02. According to its log - hadoop-hdfs-datanode-worker02.sys65.com.log:

2018-04-21 09:02:53,405 WARN  checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /grid/sdc/hadoop/hdfs/data

Note - from the Ambari GUI we can see that the DataNode on worker02 is down.

Around the "Directory is not writable: /grid/sdc/hadoop/hdfs/data" error, the log shows the following:


STARTUP_MSG: Starting DataNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = worker02.sys65.com/23.87.23.126
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.6.0.3-8
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r c6befa0f1e911140cc815e0bab744a6517abddae; compiled by 'jenkins' on 2017-04-01T21:32Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2018-04-21 09:02:52,854 INFO  datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2018-04-21 09:02:53,321 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdb/hadoop/hdfs/data/
2018-04-21 09:02:53,330 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdc/hadoop/hdfs/data/
2018-04-21 09:02:53,330 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdd/hadoop/hdfs/data/
2018-04-21 09:02:53,331 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sde/hadoop/hdfs/data/
2018-04-21 09:02:53,331 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for [DISK]file:/grid/sdf/hadoop/hdfs/data/
2018-04-21 09:02:53,405 WARN  checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /grid/sdc/hadoop/hdfs/data
	at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:124)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:99)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:128)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:44)
	at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:127)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2018-04-21 09:02:53,410 ERROR datanode.DataNode (DataNode.java:secureMain(2691)) - Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0
	at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:216)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2583)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2492)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2539)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2684)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2708)
2018-04-21 09:02:53,411 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-04-21 09:02:53,414 INFO  datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at worker02.sys65.com/23.87.23.126
************************************************************/
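
For context, the last ERROR above is what actually stops the process: with dfs.datanode.failed.volumes.tolerated left at its default of 0, a single failed volume is fatal for the DataNode. A quick way to confirm the effective value on worker02 (a sketch, assuming the HDFS client configuration on the node is the one Ambari manages) is:

# number of failed volumes the DataNode will tolerate before refusing to start (default 0)
hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated

Raising that value in Ambari (HDFS > Configs) would let the DataNode come up on the remaining four volumes, but it only works around the bad disk rather than fixing it.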

We checked the following (the commands are sketched right after the list):

1. All files and folders under /grid/sdc/hadoop/hdfs/ are owned by hdfs:hadoop, and that is OK.

2. Disk sdc is mounted read-write (rw,noatime,data=ordered), and that is OK.
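
For reference, those two checks can be reproduced with something like the following (a sketch, using the paths from the log; .write_test is just a throwaway file name):

# ownership and permissions of the flagged data directory (expect hdfs:hadoop)
ls -ld /grid/sdc/hadoop/hdfs/data

# mount options of the sdc filesystem (expect rw)
mount | grep /grid/sdc

# roughly the same write test the DataNode performs, run as the hdfs user
sudo -u hdfs touch /grid/sdc/hadoop/hdfs/data/.write_test
sudo -u hdfs rm /grid/sdc/hadoop/hdfs/data/.write_test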

We suspect that the hard disk has gone bad; in that case, how do we check that?

Please advise what other options there are to resolve this issue.

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

The disk is already unusable; go ahead and run fsck with the -y option to repair it 🙂 see above.

Either way, you will have to replace that dirty disk anyway!

View solution in original post

14 REPLIES

Master Mentor

@Michael Bronson

Any updates?


Hi, Geoffrey

We are just waiting for your approval of the following steps (sketched as one sequence below):

1. umount /grid/sdc, or umount -l /grid/sdc in case the device is busy

2. fsck -y /dev/sdc

3. mount /grid/sdc
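
For clarity, the whole sequence in one place (a sketch; it assumes the DataNode on worker02 stays stopped and that /grid/sdc is backed by /dev/sdc, as in the logs above):

fuser -vm /grid/sdc    # check whether anything still has the mount point open
umount /grid/sdc       # or: umount -l /grid/sdc if the device is busy
fsck -y /dev/sdc       # repair; for an ext filesystem this dispatches to e2fsck
mount /grid/sdc        # remount via the existing /etc/fstab entry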

Michael-Bronson

Master Mentor

@Michael Bronson

Avahi is a system which facilitates service discovery on a local network via the mDNS/DNS-SD protocol suite. It enables you to plug your laptop or computer into a network and instantly see other people you can chat with, find printers to print to, or find files being shared. Compatible technology is found in Apple Mac OS X (branded Bonjour, and sometimes Zeroconf).

The two big benefits of Avahi are name resolution & finding printers, but on a server, in a managed environment, it's of little value.
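
If you decide it is not needed on the workers, checking and disabling it could look like this (a sketch, assuming a systemd-based OS; on older init systems the service/chkconfig equivalents apply):

systemctl status avahi-daemon           # see whether it is running at all
systemctl disable --now avahi-daemon    # stop it and keep it from starting at boot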

Unmounting and mounting filesystems is a common task, especially in Hadoop clusters; your SysOps team should have validated it, but it all looks correct to me.

Do a dry run with the command below to see what will be affected; that will give you a better picture.

# e2fsck -n /dev/sdc 

The data will be reconstructed, since you have the default replication factor; you can later rebalance the HDFS data.
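
Once the volume is repaired or replaced, something like the following can confirm replication has recovered and then rebalance the cluster (a sketch, run as the hdfs user):

sudo -u hdfs hdfs fsck /                  # check for missing or under-replicated blocks
sudo -u hdfs hdfs dfsadmin -report        # per-DataNode capacity and usage
sudo -u hdfs hdfs balancer -threshold 10  # spread data back out across the DataNodes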


Yes, we already did that on one of the disks; please see - https://community.hortonworks.com/questions/189016/datanode-machine-worker-one-of-the-disks-have-fil...

Michael-Bronson

Master Mentor

@Michael Bronson

The disk is already unusable; go ahead and run fsck with the -y option to repair it 🙂 see above.

Either way, you will have to replace that dirty disk anyway!
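
Before replacing it, one way to confirm at the OS level that the disk itself is failing (a sketch, run as root on worker02, assuming the smartmontools package is installed and /dev/sdc is the suspect device):

dmesg | grep -iE 'sdc|i/o error'   # kernel messages: I/O errors, resets, read-only remounts
smartctl -H /dev/sdc               # overall SMART health verdict (PASSED/FAILED)
smartctl -a /dev/sdc               # full SMART attributes, e.g. reallocated sector count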