HDFS not working properly after changing the EBS volume of the data directory

Contributor

I have an Ambari-managed 10-node HDP cluster (2.5.0) deployed on Amazon EC2 instances running CentOS 7. I had mounted an EBS volume at the /data mount point and configured it as the NameNode and DataNode directory, and everything was working fine. For some reason I had to change the EBS volume, so I followed the steps below.

1- Stop all services from Ambari.

2- Mount the new EBS volume at the /data mount point.

3- Restart all Amazon EC2 instances.

4- Start the services using Ambari.
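Before step 4, it can help to confirm that /data is really backed by the new volume and to note how large that volume is. A minimal Python sketch, not part of the original procedure: the helper name describe_mount is hypothetical, and it relies only on the standard library (os.path.ismount and shutil.disk_usage).

```python
import os
import shutil


def describe_mount(path: str) -> dict:
    """Return mount status and filesystem size (in bytes) for `path`.

    Hypothetical helper for checking a data directory such as /data
    after swapping the underlying EBS volume.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return {
        "path": path,
        "is_mount": os.path.ismount(path),
        "total_bytes": usage.total,
        "free_bytes": usage.free,
    }
```

Running describe_mount("/data") on each node shows whether /data is a separate mount (if is_mount is False, it is just a directory on the root disk) and the raw size of the new volume.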

After the 4th step, HDFS is not working properly, and as a result the HBase service is also failing.

I don't get any errors when the DataNodes or the NameNode start, and Ambari shows their status as green.

When I run hdfs dfsadmin -report, I get the following output:

[hdfs@ip-172-31-29-141 ~]$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 131072 (128 KB)
DFS Remaining: 0 (0 B)
DFS Used: 131072 (128 KB)
DFS Used%: 100.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0


-------------------------------------------------
Live datanodes (4):


Name: 172.31.31.118:50010 (ip-172-31-31-118.ec2.internal)
Hostname: ip-172-31-31-118.ec2.internal
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 32768 (32 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Feb 20 16:26:29 UTC 2018




Name: 172.31.31.114:50010 (ip-172-31-31-114.ec2.internal)
Hostname: ip-172-31-31-114.ec2.internal
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 32768 (32 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Feb 20 16:26:29 UTC 2018




Name: 172.31.18.247:50010 (ip-172-31-18-247.ec2.internal)
Hostname: ip-172-31-18-247.ec2.internal
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 32768 (32 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Feb 20 16:26:29 UTC 2018




Name: 172.31.28.137:50010 (ip-172-31-28-137.ec2.internal)
Hostname: ip-172-31-28-137.ec2.internal
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 32768 (32 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Feb 20 16:26:29 UTC 2018

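In this output every DataNode reports Configured Capacity: 0 and DFS Remaining: 0, even though all four are live. A small Python sketch that scans the text of hdfs dfsadmin -report and lists DataNodes in that state; it assumes the "Name:" / "Configured Capacity:" line layout shown above, and the function name is hypothetical.

```python
import re
from typing import List

# Each DataNode section starts with "Name: <ip>:<port> (<host>)".
NAME_RE = re.compile(r"^Name:\s+(\S+)")
# Per-node capacity line; does not match "Configured Cache Capacity".
CAPACITY_RE = re.compile(r"^Configured Capacity:\s+(\d+)")


def zero_capacity_datanodes(report: str) -> List[str]:
    """Return DataNode names whose Configured Capacity is 0 bytes."""
    flagged = []
    current = None  # name of the DataNode section being read
    for raw in report.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        cap = CAPACITY_RE.match(line)
        # The cluster-wide summary line comes before any "Name:" and is skipped.
        if cap and current is not None:
            if int(cap.group(1)) == 0:
                flagged.append(current)
            current = None  # only the first capacity line per node counts
    return flagged
```

Feeding the report above to this function (for example via sys.stdin.read() in a small script) would list all four DataNodes.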

The issue is that my HBase service is not starting. The error I get in the HBase log file is as follows:

8-02-20 11:56:44,465 WARN  [Thread-70] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1).  There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1649)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)


I also get a similar error when I try to put a file into HDFS from the command line. This is the error I get for the command 'hdfs dfs -put ./x2.txt /':

e.hadoop.ipc.RemoteException(java.io.IOException): File /x2.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1649)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)


        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
        at org.apache.hadoop.ipc.Client.call(Client.java:1496)
        at org.apache.hadoop.ipc.Client.call(Client.java:1396)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:457)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
        at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
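The message says four DataNodes are running and none were excluded, yet the NameNode chose zero targets for the new block. The default HDFS block placement policy skips a DataNode that lacks enough remaining space for a new block. The sketch below is a deliberately simplified Python model of that single check, assuming a node qualifies only if its remaining space covers min_blocks blocks of the 128 MB Hadoop 2.x default block size; the threshold, class, and function names are assumptions, and the real policy also weighs storage type, node load, and rack placement.

```python
from dataclasses import dataclass
from typing import List

# Hadoop 2.x default for dfs.blocksize: 128 MB.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class DataNode:
    name: str
    remaining_bytes: int  # corresponds to "DFS Remaining" in dfsadmin -report


def choose_targets(nodes: List[DataNode],
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   min_blocks: int = 1) -> List[str]:
    """Simplified model: keep only live nodes with room for min_blocks blocks.

    min_blocks is an assumed stand-in for HDFS's internal space threshold.
    """
    required = block_size * min_blocks
    return [n.name for n in nodes if n.remaining_bytes >= required]
```

With four live DataNodes that each report DFS Remaining: 0, the model returns an empty list, which mirrors "could only be replicated to 0 nodes".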


What could be causing this?

Accepted Solution

Re: HDFS not working properly after changing the EBS volume of the data directory

Contributor

I figured out the root cause, and fixing it solved my issue.

The root cause was the 5th point in this link. It seems that because the new EBS volume had less available space, I had to decrease 'Reserved space for HDFS' in the advanced configuration of the HDFS service in Ambari. This setting is the dfs.datanode.du.reserved property, and its value was higher than the space available on the new volume. Once I lowered it, everything went back to normal :)
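The arithmetic behind this fix can be sketched in Python. dfs.datanode.du.reserved is documented as reserved space in bytes per volume, and the DataNode subtracts it from each volume's size; the clamp at zero below is an assumption about how an oversized reservation is handled, consistent with the Configured Capacity: 0 that every DataNode reported. The helper names and the volume sizes are hypothetical.

```python
GIB = 1024 ** 3


def effective_capacity(volume_bytes: int, reserved_bytes: int) -> int:
    """Capacity one DataNode volume offers to HDFS after the reservation.

    volume_bytes: raw size of the filesystem backing the data directory.
    reserved_bytes: value of dfs.datanode.du.reserved (bytes, per volume).
    A reservation larger than the volume leaves nothing for HDFS (clamped to 0).
    """
    return max(0, volume_bytes - reserved_bytes)


def reservation_fits(volume_bytes: int, reserved_bytes: int) -> bool:
    """True if the reservation still leaves some space for HDFS blocks."""
    return effective_capacity(volume_bytes, reserved_bytes) > 0
```

For example, a hypothetical 50 GiB reservation carried over to a 30 GiB volume gives effective_capacity(30 * GIB, 50 * GIB) == 0, while lowering the reservation to 5 GiB leaves 25 GiB for HDFS.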
