Created 11-24-2016 12:12 PM
I ran an HDFS balance from Ambari 2.4.1 on HDP 2.5.
The stdout (/var/lib/ambari-agent/data/output-4313.txt) showed that it finished:
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.130:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.136:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.132:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.134:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.126:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.138:50010
[balancer] 16/11/23 21:00:18 INFO net.NetworkTopology: Adding a new node: /default/202.1.2.137:50010
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 over-utilized: []
[balancer] 16/11/23 21:00:18 INFO balancer.Balancer: 0 underutilized: []
[balancer] The cluster is balanced. Exiting...
[balancer] Nov 23, 2016 9:00:18 PM   0   0 B   0 B   0 B
[balancer] Balancing took 1.977 seconds
[balancer] Process is finished
But when I attempted a new balance one day later, it said:
16/11/23 21:41:18 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.append over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_-557837852_1 on 202.1.2.132 because this file lease is currently owned by DFSClient_NONMAPREDUCE_876034774_1 on 202.1.2.132
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3019)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
    at org.apache.hadoop.ipc.Client.call(Client.java:1496)
    at org.apache.hadoop.ipc.Client.call(Client.java:1396)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy11.append(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:343)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
    at com.sun.proxy.$Proxy12.append(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1818)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1887)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1857)
    at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:368)
    at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:364)
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:345)
    at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1183)
    at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.checkAndMarkRunning(NameNodeConnector.java:242)
    at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.<init>(NameNodeConnector.java:143)
    at org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(NameNodeConnector.java:76)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:670)
    at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:793)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:922)
java.io.IOException: Another Balancer is running.. Exiting ...
Nov 23, 2016 9:41:18 PM   Balancing took 2.224 seconds
I checked
[hdfs@insightcluster132 hadoop-hdfs]$ jps
147930 Jps
5637 DataNode
169637 Balancer
5851 NameNode
[hdfs@insightcluster132 hadoop-hdfs]$ ps -ef | grep 169637
hdfs 148939  94043  0 20:09 pts/4 00:00:00 grep --color=auto 169637
hdfs 169637      1  0 Nov23 ?     00:06:00 /usr/local/java/jdk1.8.0_101/bin/java -Dproc_balancer -Xmx8192m -Dhdp.version=2.5.0.0-1245 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.0.0-1245 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xmx8192m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.balancer.Balancer -threshold 2
and
[hdfs@insightcluster132 hadoop-hdfs]$ hdfs dfs -ls /system
Found 1 items
-rw-r--r--   3 hdfs hdfs   28 2016-11-23 21:01 /system/balancer.id
How should I handle this? Can I get out of this situation by simply killing PID 169637 and deleting the file /system/balancer.id?
Thanks
Created 11-26-2016 06:40 PM
Yes, you can safely kill the PID and delete the lock file /system/balancer.id from HDFS.
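For reference, the cleanup could be sketched roughly as below. This is a sketch, not an official procedure: run it as the hdfs user on the node where the stale Balancer lives (insightcluster132 in this thread), and verify the PID yourself before killing anything.

```shell
# Sketch: stop a stale Balancer and clear its lock file.

# 1. Find the PID of the lingering Balancer in `jps` output
#    (the line whose class-name field is exactly "Balancer").
balancer_pid=$(jps 2>/dev/null | awk '$2 == "Balancer" {print $1}')

# 2. Kill it if one was found.
if [ -n "$balancer_pid" ]; then
    echo "Killing stale Balancer, PID $balancer_pid"
    kill "$balancer_pid"
fi

# 3. Remove the stale lock file so the next balancer run can
#    recreate it (guarded so the sketch is a no-op without HDFS).
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -rm /system/balancer.id
fi
```

After this, a fresh `hdfs balancer` run should be able to create /system/balancer.id again and start normally.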
Created 11-29-2016 08:50 AM
Thanks!!!!