
Restarting hdfs service on Data nodes


New Contributor

We have a 12-node cluster that has been running fine for a while. Recently we have started getting the following error while running an MR job.

 

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cdh_velo/.staging/job_201508271043_255093/job.splitmetainfo could only be replicated to 0 nodes instead of minReplication (=1).  There are 12 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:667)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
	at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy21.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1544)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)

 

I searched for this error online and found a suggestion to add the following configuration to /etc/hadoop/conf/hdfs-site.xml:

 

dfs.client.block.write.replace-datanode-on-failure.policy=ALWAYS
dfs.client.block.write.replace-datanode-on-failure.best-effort=true
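
(For reference, a minimal sketch of how those two keys would be written as <property> entries in hdfs-site.xml; the names and values are taken from the suggestion above.)

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>ALWAYS</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>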

 

However, after adding this configuration, I am unable to restart the HDFS service because there is no HDFS service script in /etc/init.d.
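
(A quick sketch of how one might check what is actually managing the daemons on a node; cloudera-scm-agent is the standard Cloudera Manager agent service name and is assumed to apply to this install.)

ls /etc/init.d/ | grep -i hadoop        # CDH package installs ship hadoop-hdfs-* init scripts
sudo service cloudera-scm-agent status  # on a CM-managed node, the HDFS roles run under the CM agent instead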

I don't have access to Cloudera Manager because the person who installed the cluster has left and no handover was given to me. Also, we don't have an active licence for Cloudera Manager (CM).

 

Is there any way to restart the hdfs service without using CM?

 

I checked whether there is any disk space issue, but it looks like all the data nodes have sufficient space. Here is the report:

 

hdfs dfsadmin -report
Configured Capacity: 62111967629312 (56.49 TB)
Present Capacity: 60120702897809 (54.68 TB)
DFS Remaining: 22599624797321 (20.55 TB)
DFS Used: 37521078100488 (34.13 TB)
DFS Used%: 62.41%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
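
(A sketch of one way to check the per-datanode remaining space from the same report, in case a single full node is hidden by the cluster totals; the per-node section is assumed to use the same "DFS Remaining" label as the summary above.)

hdfs dfsadmin -report | grep -E 'Name:|DFS Remaining'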

1 REPLY

Re: Restarting hdfs service on Data nodes

Champion

Did you resolve this issue? I might take a look into it.

Please let me know.

 

1. Check if the datanodes are up and running from the command line.

2. How does your Hadoop temp space look? Is it running out of space? (See the sketch below for both checks.)
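
(A minimal sketch of both checks, assuming shell access to the datanodes and the client configuration under /etc/hadoop/conf.)

# 1. Is the DataNode process running on each node?
ps -ef | grep -i '[d]atanode'           # should show a running DataNode JVM
hdfs dfsadmin -report                   # cross-check live vs. dead datanodes from the NameNode's view

# 2. Is the local temp/data space running out?
hdfs getconf -confKey hadoop.tmp.dir    # resolve the configured temp directory
df -h                                   # free space on the mounts backing that directory
df -i                                   # free inodes on the same mounts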