Member since: 01-05-2017
Posts: 153
Kudos Received: 10
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4484 | 02-20-2018 07:40 PM |
| | 3306 | 05-04-2017 06:46 PM |
06-09-2017
03:29 PM
Hello, this is a simple question: does anyone know why we cannot access an input port inside a process group? I have a Remote Process Group trying to reach an input port on another server, but because I placed that port inside a Process Group, it doesn't appear when I try to connect to the Remote Process Group. Since we will have many servers connecting to many input ports, it would be nice to divide them into separate process groups instead of sprawling them all across the main NiFi root canvas.
Labels:
- Apache NiFi
06-07-2017
04:03 PM
Thanks for the help. Your estimate for the Run Schedule was a bit high, though: when I changed it to even 30 seconds, it bottlenecked badly right before MergeContent. You were right otherwise; once I lowered it to 1 second, there is very little bottleneck and the error is gone.
06-02-2017
07:07 PM
Can someone help me understand an error in PutHDFS? I currently have a flow set up to read from a Kafka topic and run transformations on the data, including SplitText, which seems to cause the problem: if I run the flow without SplitText, the PutHDFS error does not occur. The purpose of SplitText is to prevent events from one file bleeding into another, incorrect file (I separate the files by minute).
I have included screenshots of the flow, the SplitText and PutHDFS configurations, and the error. Below is the actual log from nifi-app.log:

2017-06-02 18:58:03,870 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Successfully performed Expiration Action org.apache.nifi.provenance.expiration.FileRemovalAction@1ab1bf76 on Provenance Event file ./provenance_repository/68922860.prov.gz in 4 millis
2017-06-02 18:58:04,375 ERROR [Timer-Driven Process Thread-140] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
: org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=3a2c35f9-06c2-1502-44e8-7de09980c950]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /topics/minifitest/cis-prod/logger.prod.aqx.cequintecid.com/home.elloyd.log-gen/2017/06/02/18/2017_06_02_18_57.log for DFSClient_NONMAPREDUCE_2044289469_247 on 10.10.2.116 because DFSClient_NONMAPREDUCE_2044289469_247 is already the current lease holder.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2766)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3073)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3042)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:760)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
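For what it's worth, the AlreadyBeingCreatedException appears to mean that a second append to the same file was attempted while an earlier append still held the HDFS lease. With SplitText fanning one Kafka message into many flowfiles that all resolve to the same minute-stamped filename, two PutHDFS append operations can easily race for the same file. Here is a minimal sketch that reproduces the lease conflict outside NiFi; the NameNode URI and target path are placeholders, not our actual cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseConflictDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/tmp/lease-demo.log");    // placeholder path

        if (!fs.exists(target)) {
            fs.create(target).close(); // append requires an existing file
        }

        // The first append acquires the NameNode lease on the file.
        FSDataOutputStream first = fs.append(target);
        first.writeBytes("event one\n");

        // A second append before the first stream is closed fails with
        // AlreadyBeingCreatedException: this client "is already the
        // current lease holder", exactly as in the nifi-app.log above.
        FSDataOutputStream second = fs.append(target);
        second.writeBytes("event two\n");

        second.close();
        first.close();
    }
}
```

If that is what is happening here, serializing the writes (a single Concurrent Task on PutHDFS) or merging the splits back together with MergeContent before PutHDFS should make the error go away.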
Labels:
- Apache NiFi
05-10-2017
02:08 PM
Thanks Bryan, that article was very helpful for understanding this. We resolved the error by increasing the Maximum Timer Driven Thread Count and Maximum Event Driven Thread Count in the general settings of the consuming NiFi instance. We are currently only testing, so each topic has only 1 partition, and for each topic we use a ConsumeKafka with 5 Concurrent Tasks, so according to your article we should have a surplus of tasks. The error from this issue has been resolved, but we now experience data loss when retrieving from more than 2 hosts. I guess that is a topic for another question, though.
05-09-2017
05:32 PM
It's important to note here that when we only collected from two hosts, it worked just fine.
05-09-2017
03:50 PM
Hello, here is our setup. We have 4 servers, each sending data through PublishKafka_0_10 to its own topic. We have 1 receiving server that uses ConsumeKafka_0_10 in four flows, one for each sending server and topic (see screenshots). We are trying to separate the events by changing the filename to 2017_05_09_15_topic.log.

We are getting the error shown in the screenshot: the commit fails in ConsumeKafka because the group has already rebalanced, and the message suggests increasing the session timeout or reducing the maximum size of batches returned with max.poll.records. We are under the impression that the session timeout is the Kafka configurable property offsets.commit.timeout.ms, which we changed from 5000 to 25000, and that max.poll.records is the ConsumeKafka property Max Poll Records, which we changed from 10,000 to 1,000. We have also tried tuning the Max Uncommitted Time both above and below 3 seconds.

What we are seeing: we sometimes miss data, we sometimes get duplicates, and the filenames are odd. They have the correct format, but some data that belongs in them ends up in a file named in the format 2017_05_09__topic.log, which is missing the minutes.

I realize there are probably a lot of issues here. This question is mainly about understanding the error we received, finding out whether the parameters we modified are the ones the error actually refers to (which I suspect they aren't), and possible solutions. Thank you.
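To make the knobs concrete: the "session timeout" in that error message is the consumer property session.timeout.ms, not the broker-side offsets.commit.timeout.ms we changed, while max.poll.records is indeed the per-poll batch cap that ConsumeKafka's Max Poll Records maps to. A minimal sketch with a plain Kafka 0.10 consumer, to show where each setting lives; the broker address, group id, and topic name are placeholders:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder broker
        props.put("group.id", "example-group");        // placeholder group id
        props.put("enable.auto.commit", "false");      // we commit manually below
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // The "session timeout" from the error message: how long the broker
        // waits without a heartbeat before evicting this consumer and
        // rebalancing the group. A consumer property, not a broker one.
        props.put("session.timeout.ms", "30000");

        // The batch cap from the error message: fewer records per poll()
        // means each batch is processed before the session times out.
        props.put("max.poll.records", "1000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // stand-in for real processing
                }
                // If the group rebalanced while we were processing, this is
                // where the commit failure from the screenshot surfaces.
                consumer.commitSync();
            }
        }
    }
}
```

As far as I know, ConsumeKafka_0_10 passes dynamic properties through to the underlying Kafka client, so session.timeout.ms can be added as a dynamic property directly on the processor rather than in the broker configuration.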
Labels:
- Apache Kafka
- Apache NiFi
05-04-2017
07:07 PM
Changing the Concurrent Tasks in ExtractText to 3 and reducing the Run Duration to 500 ms fixed the problem.
05-04-2017
06:46 PM
I suspect there is a connection between the rate of messages being sent and the Run Duration in our ExtractText processor (see screenshot). Here is why: at 10,000 messages/second sent to the Kafka topic (1,000,000 total), we always see the odd displaced data in the file missing the minute in its name, no matter whether the Run Duration is 500 ms, 1 s, or 2 s (we raised it from the lowest value because that was causing intermittent data loss). At 1,000 messages/second (100,000 total), if we set the Run Duration to 1 s, the files come out perfect, exactly the way we want them. Our ultimate use case is to send considerably more than 10,000 messages/second, so maybe this helps shed some light.