Member since 07-30-2019
3432 Posts
1632 Kudos Received
1012 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 105 | 01-27-2026 12:46 PM |
| | 513 | 01-13-2026 11:14 AM |
| | 1116 | 01-09-2026 06:58 AM |
| | 956 | 12-17-2025 05:55 AM |
| | 452 | 12-17-2025 05:34 AM |
06-06-2018
07:56 PM
@Mahmoud Shash Just wanted to follow up to see how things are progressing. Are you still seeing the issue? Did you try any of my suggestions above? I see no reason to schedule your invokeHTTP processor to execute on the primary node only, since it is being triggered by incoming FlowFiles. If you switch it to "all nodes", do you still see the issue? What do you see when you perform a "List queue" action on the connection feeding your invokeHTTP processor? Within the "Cluster" UI found under the global menu, which node is currently elected as the primary node? Does the data listed when you ran "List queue" belong to that same node? - Thank you, Matt
06-06-2018
07:36 PM
@John T Did the timeout changes help with the error you were seeing in your NiFi app log with regards to ZK connection loss? - Thanks, Matt
06-05-2018
06:14 PM
@John T Note: We do not recommend using the embedded ZK in a production environment. Aside from that, connection issues can be expected during any NiFi shutdown/restart because the embedded ZK is shut down as well. Also, the default ZK connection and session timeouts are very aggressive for anything more than a basic setup in an ideal environment. - I recommend changing those to at least 30 secs each. - I also see that each of your embedded ZK servers is running on a different port (24489, 24427, and 24428). Why? Unusual, but it should not be an issue. Also confirm you created the unique "myid" files in the "./state/zookeeper" directory on each ZK server. - Of course, any changes to any of NiFi's config files except logback.xml will require a restart for those changes to take effect. Once all nodes are back up and connected to the cluster, check to see if you are still seeing connection issues with ZK. - Thank you, Matt
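As a sketch, the timeout change recommended above would look like this in nifi.properties (the 30-second values follow that recommendation; the stock defaults are a very aggressive 3 secs):

```properties
# nifi.properties -- ZooKeeper client timeouts (defaults: 3 secs)
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
```

Remember these properties must be changed on every node, and each node must be restarted for the change to take effect.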
06-05-2018
02:46 PM
@Henrik Olsen The GetSFTP processor is deprecated in favor of the ListSFTP/FetchSFTP processors. The list/fetch model works better in a NiFi cluster configuration. Both the GetSFTP and ListSFTP processors should only ever be run on the "primary node" when used in a NiFi cluster. FetchSFTP should be configured to run on all nodes. - That being said, GetSFTP will retrieve up to the configured "Max Selects" in a single connection. ListSFTP will return the filenames of all files in a single connection. (The 0-byte FlowFiles generated by ListSFTP should be routed to a Remote Process Group that will redistribute those 0-byte FlowFiles to all nodes in the cluster, where FetchSFTP will retrieve the actual content.) - Regardless of how you retrieve the files, you are looking for a way to only process those files where you also retrieved the corresponding sha256 file. This can be accomplished using the Wait and Notify processors: In the above flow I have all the retrieved data (both <datafile> and <datafile>.sha256 files) coming in to a RouteOnAttribute processor. I route all the <datafile>.sha256 FlowFiles to a Notify processor. (In my test I had 20 <datafile> files and only 16 corresponding <datafile>.sha256 files.) The Notify processor is configured to write the $(unknown) to a DistributedMapCache service that every node in my cluster can access. My Wait processor is then designed to check that same DistributedMapCache service looking for "$(unknown).sha256". If a match is found, the Wait processor will release the <datafile> to the success relationship for further processing in your dataflow. The Wait processor is also configured to wait only so long looking for a match. So you see in my example that after that length of time, the 4 FlowFiles that did not have a matching sha256 filename in the cache were routed to the "expired" relationship. Set the expiration high enough to allow for the time needed to retrieve both files.
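The pairing idea can be sketched outside NiFi with a bit of plain shell (a hypothetical illustration only; in the real flow the Wait processor checks the DistributedMapCache, not the filesystem):

```shell
# Hypothetical stand-in for the Wait/Notify pairing (plain shell, not NiFi):
# a <datafile> is released only when its <datafile>.sha256 sibling also arrived.
cd "$(mktemp -d)"
touch data1 data1.sha256 data2   # data2 has no matching checksum file
for f in data1 data2; do
  if [ -f "$f.sha256" ]; then
    echo "release $f"            # match found (like a cache hit) -> success
  else
    echo "expire $f"             # no Notify signal before the wait expired
  fi
done
```

In the NiFi flow, "match found" corresponds to the Notify processor having written the cache entry, and "expire" corresponds to the Wait processor's expiration duration elapsing.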
- Thank you, Matt - If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
06-05-2018
01:57 PM
@Artem Anokhin If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer. *** Forum tip: Please try to avoid responding to an Answer by starting a new answer. Instead, use "Add comment" to respond to an existing answer. There is no guaranteed order to different answers, which can make following a response thread difficult, especially when multiple people are trying to assist you.
06-05-2018
01:52 PM
@Shu @Raja M I just want to correct one thing. There is no default prioritizer when none are selected. While the "OldestFlowFileFirstPrioritizer" may appear in many cases to be the default behavior you see, that is purely coincidental. By default, the order in which FlowFiles are processed from a queue is performance based. This means FlowFiles are processed in an order that best makes use of disk performance to minimize disk seeks. (So think of this as processed in the order they were written to disk.) In many cases this acts like OldestFlowFileFirst, but that can change if FlowFiles in a connection come from multiple source flows. - Enforcing the order of FlowFile processing in NiFi can be challenging. Some processors work on batches of FlowFiles while others work on one FlowFile at a time. FlowFiles routed down one path are processed with no consideration of FlowFiles routed down a different path. Concurrent tasks on processors allow for concurrent execution of a processor (each task works on its own FlowFile), with some FlowFiles being processed faster than others, making them complete out of order. Some processors may fail to complete a task for one reason or another in normal operations. (For example, FetchFile is retrieving content and a network issue causes the connection to drop; the FlowFile is penalized and routed to the "failure" relationship; FetchFile moves on to the next FlowFile and, if "failure" is looped back, retries the failed FlowFile once its penalty expires. Now these FlowFiles are out of order.) - NiFi was designed for speed at its core, with the intent that each processor works on the FlowFiles it received without needing to care about FlowFiles in any other queues. - There are a few processors that may be used in your dataflow design to achieve this goal. Keep in mind that any enforcement of order is going to affect the throughput of your NiFi because of the overhead introduced in doing so.
You will want to take a look at the following processors: 1. EnforceOrder <-- This processor works well for numerically ordered FlowFiles, which timestamps are not going to provide. - 2. Wait and Notify <-- These allow you to enforce the processing of one FlowFile at a time, in order. ----- Upon listing your FlowFiles, you would feed a Wait processor. This processor could release one FlowFile into the rest of your dataflow (FetchFile, etc.) and finally to the Notify processor once processing of the FlowFile was successful. The Notify would then trigger the Wait processor to release the next FlowFile. (Set the OldestFlowFileFirst prioritizer on the connection between the ListFile and Wait processors.) - Thank you, Matt
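A tiny hypothetical shell sketch of why EnforceOrder wants a numeric order attribute: names and timestamps compare as text, which does not give numeric order.

```shell
# Text (lexicographic) order vs. numeric order for the same three names.
printf 'file10\nfile2\nfile1\n' | sort
# -> file1, file10, file2 (text order: "10" sorts before "2")

# Stripping the prefix and sorting numerically gives the intended order:
printf 'file10\nfile2\nfile1\n' | sed 's/file//' | sort -n | sed 's/^/file/'
# -> file1, file2, file10
```

EnforceOrder does this kind of numeric comparison on a FlowFile attribute for you, tracking the next expected number per group.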
06-04-2018
05:34 PM
1 Kudo
@Artem Anokhin No matter which host URL(s) you use in the configuration of a Remote Process Group (RPG), the RPG will end up retrieving site-to-site (S2S) details that include all the currently connected nodes in the target cluster. - Included in those S2S details are things like: 1. Hostname of each node - defined by "nifi.remote.input.host=" configured on each node. 2. Whether the "RAW" transport protocol is supported - defined by the "nifi.remote.input.socket.port=" property being set. 3. Whether the "HTTP" transport protocol is supported - defined by the "nifi.remote.input.http.enabled=" property being set to true or false. 4. Whether the S2S connection is secure - defined by "nifi.remote.input.secure=" being set to true or false. - There is no way to create "node groups" that would only be returned to a source NiFi during the retrieve-S2S-details phase of communications. - Thank you, Matt
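A hypothetical nifi.properties excerpt showing how the four properties above might look on one node of the target cluster (hostname and port here are made up for illustration):

```properties
# nifi.properties -- S2S details this node advertises to remote RPGs
nifi.remote.input.host=node1.example.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
```

Every connected node's values are returned to the RPG; there is no property that scopes which nodes are advertised.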
06-01-2018
07:02 PM
@Jason Sphar Always best to start a new question, as this one is unrelated to the original question. There are multiple versions of the Kafka processors because each is for a different Kafka client version:
GetKafka/PutKafka <--- Kafka 0.8
ConsumeKafka/PublishKafka <--- Kafka 0.9
ConsumeKafka_0_10/PublishKafka_0_10/ConsumeKafkaRecord_0_10/PublishKafkaRecord_0_10 <--- Kafka 0.10
ConsumeKafka_0_11/PublishKafka_0_11/ConsumeKafkaRecord_0_11/PublishKafkaRecord_0_11 <--- Kafka 0.11
ConsumeKafka_1_0/PublishKafka_1_0/ConsumeKafkaRecord_1_0/PublishKafkaRecord_1_0 <--- Kafka 1.0
Thanks, Matt
05-31-2018
07:07 PM
If NiFi is sitting in a secured environment, why the need to secure NiFi itself? Could you just leave it HTTP only? There is no workaround to enable HTTP access into an HTTPS-enabled NiFi.
05-31-2018
06:35 PM
@Jason Sphar *** Forum Tip: Please try to avoid responding to an existing answer by starting a new "Answer". Instead, click "Add comment" on the answer you want to respond to. There is no guaranteed order to answers in the forum, so discussions can get hard to follow. - As far as your new error goes: some other service on this server is already bound to that port. You cannot have more than one service bound to a specific port. - You could use the following command to see which process ids are using listening ports: # netstat -nap | grep LISTEN - Then you can use the following command to see which service is tied to that process id: ps -ef | grep <program/process id> - Thanks, Matt