Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

HDFS balancer finished, but not exit

Expert Contributor

I ran an HDFS balance from Ambari 2.4.1 on HDP 2.5.

stdout (/var/lib/ambari-agent/data/output-4313.txt) showed that it finished:

[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.130:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.136:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.132:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.134:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.126:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.138:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.137:50010
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 over-utilized: []
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 underutilized: []
[balancer] The cluster is balanced. Exiting...
[balancer] Nov 23, 2016 9:00:18 PM           0                  0 B                 0 B                0 B
[balancer] Nov 23, 2016 9:00:18 PM  Balancing took 1.977 seconds
[balancer] Process is finished

But when I attempted a new balance one day later, it said:

16/11/23 21:41:18 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.append over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_-557837852_1 on 202.1.2.132 because this file lease is currently owned by DFSClient_NONMAPREDUCE_876034774_1 on 202.1.2.132
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3019)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy11.append(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:343)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy12.append(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1818)
	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1887)
	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1857)
	at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:368)
	at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:364)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:364)
	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:345)
	at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1183)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.checkAndMarkRunning(NameNodeConnector.java:242)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.<init>(NameNodeConnector.java:143)
	at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(NameNodeConnector.java:76)
	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:670)
	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:793)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:922)
java.io.IOException: Another Balancer is running..  Exiting ...
Nov 23, 2016 9:41:18 PM  Balancing took 2.224 seconds

I checked:

[hdfs@insightcluster132 hadoop-hdfs]$ jps
147930 Jps
5637 DataNode
169637 Balancer
5851 NameNode
[hdfs@insightcluster132 hadoop-hdfs]$ ps -ef |grep 169637
hdfs     148939  94043  0 20:09 pts/4    00:00:00 grep --color=auto 169637
hdfs     169637      1  0 Nov23 ?        00:06:00 /usr/local/java/jdk1.8.0_101/bin/java -Dproc_balancer -Xmx8192m -Dhdp.version=2.5.0.0-1245 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.0.0-1245 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xmx8192m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.balancer.Balancer -threshold 2

and:

[hdfs@insightcluster132 hadoop-hdfs]$ hdfs dfs -ls /system
Found 1 items
-rw-r--r--   3 hdfs hdfs         28 2016-11-23 21:01 /system/balancer.id

How should I handle this? Can I get out of this situation by just killing process 169637 and deleting the file /system/balancer.id?

Thanks

1 ACCEPTED SOLUTION

@Huahua Wei

Yes, you can safely kill the Balancer process and delete the /system/balancer.id lock file from HDFS.
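As a rough sketch (the PID 169637 comes from the jps output above; substitute whatever PID jps reports on your cluster, and run as the hdfs user):

```shell
# Stop the stale Balancer JVM. 169637 is the PID reported by jps above.
kill 169637
# If it ignores SIGTERM, follow up with: kill -9 169637

# Remove the lock file the old Balancer still holds a lease on,
# so the next balancer run can recreate it.
hdfs dfs -rm /system/balancer.id
```

After this, a fresh `hdfs balancer` run should be able to acquire the lock again.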


2 REPLIES 2

@Huahua Wei

Yes, you can safely kill the Balancer process and delete the /system/balancer.id lock file from HDFS.

Expert Contributor

Thanks!