Member since: 07-30-2019
Posts: 3406
Kudos Received: 1621
Solutions: 1006
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 27 | 12-17-2025 05:55 AM |
| | 88 | 12-15-2025 01:29 PM |
| | 43 | 12-15-2025 06:50 AM |
| | 199 | 12-05-2025 08:25 AM |
| | 339 | 12-03-2025 10:21 AM |
03-20-2018
07:47 PM
2 Kudos
@Dan A NiFi cluster can scale up fairly easily, but scaling down involves many more steps. A new node can be added to an existing cluster by simply configuring the new NiFi instance with the same files from an existing node's NiFi conf directory. If the cluster is secured, you would need to provide a new keystore for the node and copy the existing truststore from another node. On startup, the new NiFi node will connect to the cluster and pull down the cluster's flow.xml.gz, users.xml, and authorizations.xml files. It will then build all of its local repositories based on the paths defined in its nifi.properties file. Any Remote Process Groups pointing at this cluster will learn about the new node on the next update (every 30 seconds by default). Scaling down is a more complicated process. Since a node will have active data, that node first needs to be disconnected from the cluster and its ingest processors stopped. Once all data has finished processing through all of its dataflows, the node can be shut down completely. The node can then be dropped from the cluster, which is another manual step. Thank you, Matt
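As a rough sketch, the cluster-related entries a new node needs in its nifi.properties might look like the following (hostnames and ports are hypothetical; the rest of the conf directory is copied from an existing node as described above):

```properties
# Hypothetical values - match these to your existing cluster's settings
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node4.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

# Secured cluster only: a new keystore for this node, truststore copied from an existing node
nifi.security.keystore=./conf/keystore.jks
nifi.security.truststore=./conf/truststore.jks

# Site-to-Site, so Remote Process Groups can learn about this node
nifi.remote.input.host=nifi-node4.example.com
nifi.remote.input.socket.port=10443
```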
03-20-2018
11:59 AM
@Ramkrishna Utpat My recommendation at this time would be to raise an Apache Jira against NiFi. The error is being returned by the Hive client library to NiFi. NiFi takes that client response (which is the Hive server's response to the client library) and makes a routing decision from it. I am not sure why NiFi treats this specific error as a comms.failure instead of a not.found condition. It may be something in the response, or something may be missing in the NiFi code itself. There is nothing else we can configure from the NiFi processor side here. Thank you, Matt
03-20-2018
11:52 AM
@Jayendra Patil Just to add to the excellent answer above: the use of the "Event Driven" scheduling strategy by any NiFi processor component is not recommended. The Event Driven strategy is considered experimental, so there is no need to configure a thread resource pool under "Max Event Driven Thread Count". I recommend setting this value back to its default of 5. Reducing the size of the event driven thread pool requires a NiFi restart (an increase can be performed without a restart). - The Max Timer Driven Thread Count can be increased and decreased without a NiFi restart. - Does your NiFi canvas show your dataflow using all of the threads provided? It is best to access the "Cluster" UI and observe how the thread pool on each node is being utilized. Do you see any node where the thread usage is close to your "Max Timer Driven Thread Count"? Are you seeing bottlenecks in your dataflow (queued up data)? Thanks, Matt
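If you prefer to spot-check per-node thread usage outside the UI, here is a minimal sketch against the NiFi REST API's system-diagnostics endpoint (the hostname, port, and unsecured access are assumptions, and response field names can vary between NiFi versions, hence the defensive lookups):

```python
# Hedged sketch: pull thread/load info for one node from the NiFi REST API.
import json
import urllib.request

NIFI_URL = "http://nifi-node1.example.com:8080"  # hypothetical node address

with urllib.request.urlopen(f"{NIFI_URL}/nifi-api/system-diagnostics") as resp:
    entity = json.load(resp)

# Field names may differ by version, so fall back to None rather than raising.
snapshot = entity.get("systemDiagnostics", {}).get("aggregateSnapshot", {})
print("Available processors :", snapshot.get("availableProcessors"))
print("Processor load avg   :", snapshot.get("processorLoadAverage"))
print("Total threads        :", snapshot.get("totalThreads"))
```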
03-19-2018
08:29 PM
@Mark Lin
Separate NiFi instances (even those that are part of the same cluster) CANNOT share repositories. Each NiFi instance must have its own unique set of repositories, since each instance works on its own unique set of FlowFiles. - Pointing NiFi repositories at mounted folders is an option, but local disks will give the best performance. For a high performance system, using multiple separate RAID disks (RAID 1 for data integrity) for the Content, FlowFile, and Provenance repositories is recommended. - Using mounted folders will affect performance, but offers an easier method of recovery if a node is lost forever. The repositories are not tied in any way to a specific NiFi instance/host. You can stand up a new instance of NiFi and, as long as you provide it with the FlowFile repo, Content repo, and the cluster's flow.xml.gz, it will be able to start up and continue processing from the same point where the old dead node left off. - The specific naming of your mounts is whatever makes logical sense to you. As long as none of the cluster nodes are trying to write to the same mount, you will be good to go (see the sketch below). Keep in mind that there can be considerable I/O against these repositories (depending on FlowFile volume and number of processors), so if all of these mounted folders come from the same mounted disk, you are likely to have performance issues as well. Separate disks are always the recommended path. - Thank you, Matt
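Purely as an illustration (the mount paths are hypothetical), per-repository directories on separate mounts would look something like this in nifi.properties:

```properties
# Hypothetical mount layout - one dedicated disk/mount per repository
nifi.flowfile.repository.directory=/mnt/flowfile_repo/flowfile_repository
nifi.content.repository.directory.default=/mnt/content_repo/content_repository
nifi.provenance.repository.directory.default=/mnt/provenance_repo/provenance_repository
nifi.database.directory=/mnt/db_repo/database_repository
```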
03-19-2018
02:36 PM
@Ramkrishna Utpat Tip: try to avoid starting a new "Answer" when responding to an existing "Answer" thread in HCC. In a NiFi cluster, the ListFTP processor should be running on the "Primary node" only. FTP is not a cluster-friendly protocol, and you end up in a race condition if all nodes run the list-based processor. To distribute the workload across your entire cluster, you should feed the listed files to a Remote Process Group (RPG). The RPG should point at this same cluster. The listed FlowFiles will be load-balanced to all nodes in your cluster by sending them to a remote "input port". That input port should feed your FetchFTP processor. That way each node ingests unique data from the FTP server and you avoid issues where multiple nodes try to retrieve the same data. Helpful links on RPGs: https://community.hortonworks.com/articles/16461/nifi-understanding-how-to-use-process-groups-and-r.html https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html Thank you, Matt
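A rough sketch of that layout (component names are illustrative):

```
Primary node only:
  ListFTP  -->  Remote Process Group (points back at this same cluster)

All nodes (via the remote Input Port that the RPG targets):
  Input Port  -->  FetchFTP  -->  rest of the dataflow
```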
03-15-2018
01:50 PM
@Ramkrishna Utpat Interesting that it is being logged as a comms.failure. How is the data being written to the FTP server? Is there perhaps a lock file preventing the FetchFTP user from being able to access/delete this file? Can you confirm the file is really missing from the FTP server? Are there multiple systems/nodes trying to pull this data? If the file still exists, is the fetch successful if you retry the FlowFiles routed to comms.failure?
03-15-2018
12:35 PM
@Ramkrishna Utpat For writing a custom log message to the nifi-app.log, you can use the "LogMessage" processor.
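As a small illustration (the values are hypothetical, and the Log message property supports NiFi Expression Language), a LogMessage configuration might look like:

```
Log Level   : info
Log prefix  : my-flow
Log message : Processed ${filename} (${fileSize} bytes) at ${now()}
```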
03-15-2018
12:21 PM
@Ramkrishna Utpat If I am following your flow description correctly, it sounds like you just need to route the "not.found" relationship from the FetchFTP processor to a PutEmail processor. Thank you, Matt
03-14-2018
04:56 PM
@Eric Lloyd With a 0 sec run schedule, the processor tries to run as fast as possible, so there is basically no break in processing. Simply setting it to 2 or 3 seconds may help.
03-14-2018
12:19 PM
1 Kudo
@Chad Woodhead
Whether you use HTTP or RAW, the URL used when creating the RPG always points at the target NiFi instance or instances in the same way: http(s)://<hostname>:<nifi.web.http(s).port>/nifi (for example: https://hostname:9091/nifi)
The RPG will always connect to that target URL to retrieve S2S details (such as whether the connection will use RAW or HTTP, how many nodes are in the target cluster, the current load on each target node, and the nifi.remote.input.host and nifi.remote.input.socket.port for each target cluster node, etc.).
When it comes to the actual transfer of FlowFiles over the S2S protocol, those FlowFiles will be transferred either over the same nifi.web.http(s).port or via a dedicated RAW nifi.remote.input.socket.port. Thank you, Matt
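For reference, a hedged sketch of the Site-to-Site related entries on a target node's nifi.properties (all values here are hypothetical):

```properties
# Hypothetical target node values
# Web UI/API port: the URL used when creating the RPG points here
nifi.web.https.port=9091
# Hostname advertised to S2S clients
nifi.remote.input.host=nifi-node1.example.com
nifi.remote.input.secure=true
# Dedicated port used only when the transport protocol is RAW
nifi.remote.input.socket.port=10443
# Allows S2S transfers over the web HTTP(S) port
nifi.remote.input.http.enabled=true
```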