Support Questions

Find answers, ask questions, and share your expertise

HDFS balancer finished, but did not exit

Expert Contributor

I ran an HDFS rebalance from Ambari 2.4.1 on HDP 2.5.

The stdout log, /var/lib/ambari-agent/data/output-4313.txt, showed the run as finished:

[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.130:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.136:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.132:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.134:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.126:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.138:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.137:50010
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 over-utilized: []
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 underutilized: []
[balancer] The cluster is balanced. Exiting...
[balancer] Nov 23, 2016 9:00:18 PM           0                  0 B                 0 B                0 B
[balancer] Nov 23, 2016 9:00:18 PM  Balancing took 1.977 seconds
[balancer] Process is finished

But when I attempted a new rebalance one day later, it said:

16/11/23 21:41:18 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.append over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_-557837852_1 on 202.1.2.132 because this file lease is currently owned by DFSClient_NONMAPREDUCE_876034774_1 on 202.1.2.132
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3019)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy11.append(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:343)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy12.append(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1818)
	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1887)
	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1857)
	at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:368)
	at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:364)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:364)
	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:345)
	at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1183)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.checkAndMarkRunning(NameNodeConnector.java:242)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.<init>(NameNodeConnector.java:143)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(NameNodeConnector.java:76)
	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:670)
	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:793)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:922)
java.io.IOException: Another Balancer is running..  Exiting ...
Nov 23, 2016 9:41:18 PM  Balancing took 2.224 seconds

I checked the running processes:

[hdfs@insightcluster132 hadoop-hdfs]$ jps
147930 Jps
5637 DataNode
169637 Balancer
5851 NameNode
[hdfs@insightcluster132 hadoop-hdfs]$ ps -ef |grep 169637
hdfs     148939  94043  0 20:09 pts/4    00:00:00 grep --color=auto 169637
hdfs     169637      1  0 Nov23 ?        00:06:00 /usr/local/java/jdk1.8.0_101/bin/java -Dproc_balancer -Xmx8192m -Dhdp.version=2.5.0.0-1245 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.0.0-1245 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xmx8192m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.balancer.Balancer -threshold 2

and the balancer lock file in HDFS:

[hdfs@insightcluster132 hadoop-hdfs]$ hdfs dfs -ls /system
Found 1 items
-rw-r--r--   3 hdfs hdfs         28 2016-11-23 21:01 /system/balancer.id

How should I handle this? Can I get out of this situation by simply killing process 169637 and deleting the file /system/balancer.id?

Thanks

1 ACCEPTED SOLUTION

@Huahua Wei

Yes, you can safely kill the leftover Balancer PID and delete the /system/balancer.id lock file from HDFS.
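For reference, a minimal cleanup sketch, using the PID 169637 and the -threshold 2 setting shown in your output above (run it as the hdfs user and adjust to your environment; the note about the lock file contents is an assumption about the Balancer's behavior, not something taken from your logs):

[hdfs@insightcluster132 hadoop-hdfs]$ hdfs dfs -cat /system/balancer.id    # the lock file should contain the hostname of the node that started the Balancer
[hdfs@insightcluster132 hadoop-hdfs]$ kill 169637                          # stop the leftover Balancer process
[hdfs@insightcluster132 hadoop-hdfs]$ jps | grep Balancer                  # confirm it is gone; use kill -9 only if it refuses to exit
[hdfs@insightcluster132 hadoop-hdfs]$ hdfs dfs -rm /system/balancer.id     # remove the stale lock file
[hdfs@insightcluster132 hadoop-hdfs]$ hdfs balancer -threshold 2           # start a fresh run, which can now take the lock

The /system/balancer.id file is only a mutual-exclusion lock (see NameNodeConnector.checkAndMarkRunning in your stack trace); once the old process is gone and the file is removed, a new Balancer can acquire it normally.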


2 REPLIES 2


Expert Contributor

Thanks!!!!