Created on 06-02-2017 07:07 PM - edited 08-17-2019 11:32 PM
Can someone help me understand an error in PutHDFS?
I currently have a flow set up to read from a Kafka topic and run transformations on it. SplitText appears to be the culprit: if I run the flow without SplitText, the PutHDFS error does not occur. The purpose of SplitText is to prevent events from one file bleeding into another, incorrect file (I have the files separated by minute). I have included screenshots of the flow, the configurations of SplitText and PutHDFS, and the error; below is the actual log from nifi-app.log.
2017-06-02 18:58:03,870 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Successfully performed Expiration Action org.apache.nifi.provenance.expiration.FileRemovalAction@1ab1bf76 on Provenance Event file ./provenance_repository/68922860.prov.gz in 4 millis
2017-06-02 18:58:04,375 ERROR [Timer-Driven Process Thread-140] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
: org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
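For reference, the rough shape of the flow described above is (reconstructed from the description and the replies below; the exact chain is in the screenshots, so treat this as an approximation):

ConsumeKafka -> SplitText -> Create Filename (UpdateAttribute) -> MergeContent -> PutHDFS (Conflict Resolution Strategy = append)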
Created 06-05-2017 01:14 AM
Hi @Eric Lloyd
I found HDFS-11367, which was reported for an issue similar to the one you encountered.
Reading that JIRA and checking the NiFi PutHDFS processor code that calls OutputStream.close, I suspect another exception, such as a TimeoutException, occurred first and caused the AlreadyBeingCreatedException.
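As a rough illustration of that failure mode, here is a minimal sketch against the plain Hadoop FileSystem API (this is not NiFi's actual PutHDFS code; the path and the forced retry are hypothetical): if close() fails mid-append, the NameNode still records this DFSClient as the lease holder, so an immediate second append() from the same client fails just like in your log.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseHolderSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/lease-demo.log"); // hypothetical path

        if (!fs.exists(file)) {
            fs.create(file).close(); // append requires an existing file
        }

        FSDataOutputStream out = fs.append(file); // this client now holds the lease
        out.write("first batch\n".getBytes("UTF-8"));
        try {
            out.close(); // releases the lease on success; a timeout here leaves it held
        } catch (Exception e) {
            // Suppose the close timed out and we retry the append right away...
        }

        // ...the NameNode still lists this DFSClient as the lease holder, so the
        // retry fails with AlreadyBeingCreatedException:
        // "... because DFSClient_NONMAPREDUCE_... is already the current lease holder."
        FSDataOutputStream retry = fs.append(file);
        retry.close();
    }
}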
I see you have configured the 'Create Filename (UpdateAttribute)' processor to put a minute-resolution timestamp in the filename, and PutHDFS to use the 'Append' conflict resolution mode. This can overwhelm HDFS with a large number of append requests.
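Judging by the file name in your log (2017_06_02_18_57.log), that processor presumably sets something like the following Expression Language value (this exact expression is my assumption; yours may differ):

filename = ${now():format('yyyy_MM_dd_HH_mm')}.log

With minute resolution, every FlowFile produced within the same minute targets the same HDFS file, so each one triggers its own append to that file.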
I'd suggest giving MergeContent a longer interval in its 'Run Schedule', such as 1 min or 30 seconds, so that NiFi merges content locally and performs fewer append operations. That would be not only safer but also more performant.
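For example, something along these lines (the property names come from the standard MergeContent processor; the values are only illustrative) would bin FlowFiles per target filename and emit one merged file per scheduled run, turning many small appends into a few larger ones:

MergeContent
    Scheduling > Run Schedule    : 30 sec
    Merge Strategy               : Bin-Packing Algorithm
    Correlation Attribute Name   : filename
    Maximum Number of Entries    : 10000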
Created 06-07-2017 04:03 PM
Thanks for the help.
Your estimate for the Run Schedule was a bit high, though. When I changed it to even 30 seconds, the flow bottlenecked badly right before MergeContent. You were right in principle though - when I lowered it to 1 sec, there is very little bottleneck and the error is gone.