Member since: 01-05-2017
Posts: 153
Kudos Received: 10
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4484 | 02-20-2018 07:40 PM |
| | 3306 | 05-04-2017 06:46 PM |
06-09-2017
03:29 PM
Hello, this is a simple question: does anyone know why we cannot access an input port inside a process group? I have a Remote Process Group trying to reach an input port on another server, but because I placed that port inside a Process Group, it doesn't appear when I try to connect to the Remote Process Group. Since we will have many servers connecting to many input ports, it would be nice to divide them into separate process groups instead of sprawling them all across the main NiFi root canvas.
Labels:
- Apache NiFi
06-07-2017
04:03 PM
Thanks for the help. Your estimate for the Run Schedule was a bit high, though: when I changed it to even 30 seconds, it bottlenecked badly right before MergeContent. You were right otherwise; once I lowered it to 1 second, there is very little bottleneck and the error is gone.
06-02-2017
07:07 PM
Can someone help me understand an error in PutHDFS? I currently have a flow set up to read from a Kafka topic and run transformations on the data, including SplitText, which seems to cause the problem: if I run the flow without SplitText, the PutHDFS error does not occur. The purpose of SplitText is to prevent events from one file bleeding into another, incorrect file (I separate the files by minute).
I have included screenshots of the flow, the SplitText and PutHDFS configurations, and the error. Below is the actual log from nifi-app.log:

2017-06-02 18:58:03,870 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Successfully performed Expiration Action org.apache.nifi.provenance.expiration.FileRemovalAction@1ab1bf76 on Provenance Event file ./provenance_repository/68922860.prov.gz in 4 millis
2017-06-02 18:58:04,375 ERROR [Timer-Driven Process Thread-140] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
: org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
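For what it's worth, the AlreadyBeingCreatedException appears to mean that a second append to the same file was attempted while an earlier append still held the HDFS lease. With SplitText fanning one Kafka message into many flowfiles that all resolve to the same minute-stamped filename, two PutHDFS append operations can easily race for the same file. Here is a minimal sketch that reproduces the lease conflict outside NiFi; the NameNode URI and target path are placeholders, not our actual cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseConflictDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/tmp/lease-demo.log");    // placeholder path

        if (!fs.exists(target)) {
            fs.create(target).close(); // append requires an existing file
        }

        // The first append acquires the NameNode lease on the file.
        FSDataOutputStream first = fs.append(target);
        first.writeBytes("event one\n");

        // A second append before the first stream is closed fails with
        // AlreadyBeingCreatedException: this client "is already the
        // current lease holder", exactly as in the nifi-app.log above.
        FSDataOutputStream second = fs.append(target);
        second.writeBytes("event two\n");

        second.close();
        first.close();
    }
}
```

If that is what is happening here, serializing the writes (a single Concurrent Task on PutHDFS) or merging the splits back together with MergeContent before PutHDFS should make the error go away.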
Labels:
- Apache NiFi
05-10-2017
02:08 PM
Thanks Bryan, that article was very helpful for understanding this. We resolved the error by increasing the Maximum Timer Driven Thread Count and Maximum Event Driven Thread Count in the general settings of the consuming NiFi instance. We are currently only testing, so each topic has only 1 partition, and for each topic we use a ConsumeKafka with 5 Concurrent Tasks, so according to your article we should have a surplus of tasks. The error from this issue has been resolved, but we now experience data loss when retrieving from more than 2 hosts. I guess that is a topic for another question, though.
05-09-2017
05:32 PM
It's important to note here that when we only collected from two hosts, it worked just fine.
05-09-2017
03:50 PM
Hello, here is our setup. We have 4 servers, each sending data through PublishKafka_0_10 to its own topic. We have 1 receiving server that uses ConsumeKafka_0_10 in four flows, one for each sending server and topic (see screenshots). We are trying to separate the events by changing the filename to 2017_05_09_15_topic.log.

We are getting the error shown in the screenshot: the commit fails in ConsumeKafka because the group has already rebalanced, and the message suggests increasing the session timeout or reducing the maximum size of batches returned with max.poll.records. We are under the impression that the session timeout is the Kafka configurable property offsets.commit.timeout.ms, which we changed from 5000 to 25000, and that max.poll.records is the ConsumeKafka property Max Poll Records, which we changed from 10,000 to 1,000. We have also tried tuning the Max Uncommitted Time both above and below 3 seconds.

What we are seeing: we sometimes miss data, we sometimes get duplicates, and the filenames are odd. They have the correct format, but some data that belongs in them ends up in a file named in the format 2017_05_09__topic.log, which is missing the minutes.

I realize there are probably a lot of issues here. This question is mainly about understanding the error we received, finding out whether the parameters we modified are the ones the error actually refers to (which I suspect they aren't), and possible solutions. Thank you.
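To make the knobs concrete: the "session timeout" in that error message is the consumer property session.timeout.ms, not the broker-side offsets.commit.timeout.ms we changed, while max.poll.records is indeed the per-poll batch cap that ConsumeKafka's Max Poll Records maps to. A minimal sketch with a plain Kafka 0.10 consumer, to show where each setting lives; the broker address, group id, and topic name are placeholders:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder broker
        props.put("group.id", "example-group");        // placeholder group id
        props.put("enable.auto.commit", "false");      // we commit manually below
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // The "session timeout" from the error message: how long the broker
        // waits without a heartbeat before evicting this consumer and
        // rebalancing the group. A consumer property, not a broker one.
        props.put("session.timeout.ms", "30000");

        // The batch cap from the error message: fewer records per poll()
        // means each batch is processed before the session times out.
        props.put("max.poll.records", "1000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // stand-in for real processing
                }
                // If the group rebalanced while we were processing, this is
                // where the commit failure from the screenshot surfaces.
                consumer.commitSync();
            }
        }
    }
}
```

As far as I know, ConsumeKafka_0_10 passes dynamic properties through to the underlying Kafka client, so session.timeout.ms can be added as a dynamic property directly on the processor rather than in the broker configuration.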
Labels:
- Apache Kafka
- Apache NiFi
05-04-2017
07:07 PM
Changing the Concurrent Tasks in ExtractText to 3 and reducing the Run Duration to 500 ms fixed the problem.
05-04-2017
06:46 PM
I suspect there is a connection between the rate of messages being sent and the Run Duration in our ExtractText processor (see screenshot). Here is why: at 10,000 messages/second sent to the Kafka topic (1,000,000 total), we always see the odd displaced data in the file missing the minute in its name, no matter whether the Run Duration is 500 ms, 1 s, or 2 s (we raised it from the lowest value because that was causing intermittent data loss). At 1,000 messages/second (100,000 total), if we set the Run Duration to 1 s, the files come out perfect, exactly the way we want them. Our ultimate use case is to send considerably more than 10,000 messages/second, so maybe this helps shed some light.