Member since
07-30-2019
3387
Posts
1617
Kudos Received
999
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 206 | 11-05-2025 11:01 AM |
|  | 414 | 10-20-2025 06:29 AM |
|  | 554 | 10-10-2025 08:03 AM |
|  | 376 | 10-08-2025 10:52 AM |
|  | 419 | 10-08-2025 10:36 AM |
04-10-2017
12:49 PM
@Michael Silas It is likely that in an existing cluster you have established a number of user policies beyond the default "Initial Admin Identity". You do not want to delete the users.xml or authorizations.xml files at this time, as you would lose all of those users and authorizations. Instead, add the new node as a user and authorize it for the /proxy policy before actually adding the node to the cluster. You can copy the users.xml, authorizations.xml, and flow.xml.gz to your new node at that time if you want.
- Agreed: create a new cert for that node. If using the NiFi CA, you can simply check the box to regenerate certificates in Ambari (available in HDF 2.x releases).
- Agreed that you need to be mindful of any custom NARs as well as any referenced local files, as these all need to be copied to your new node as well.
Matt
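As a rough illustration of what "adding the node's identity before joining" amounts to, here is a sketch that appends the new node's certificate DN to users.xml. The element names reflect NiFi 1.x's tenants file but may differ by version, and the path and DN are hypothetical; only edit these files with NiFi stopped, and back them up first.

```python
import uuid
import xml.etree.ElementTree as ET

def add_node_user(users_xml_path, node_dn):
    """Append a <user> entry for the new node's certificate DN to users.xml.
    Element names are illustrative of NiFi 1.x's tenants file; verify against
    your version. Run only with NiFi stopped, after backing the file up."""
    tree = ET.parse(users_xml_path)
    users = tree.getroot().find("users")
    ET.SubElement(users, "user",
                  identifier=str(uuid.uuid4()), identity=node_dn)
    tree.write(users_xml_path)

# Hypothetical usage:
# add_node_user("/path/to/conf/users.xml", "CN=new-node.example.com, OU=NIFI")
```

The node identity would then still need the /proxy policy granted in authorizations.xml (via the UI on a running node is the safer route).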
04-10-2017
12:42 PM
Only Apache NiFi 0.x and HDF 1.x versions use an NCM (NiFi Cluster Manager) based cluster.
NiFi 1.x and HDF 2.x moved to zero-master clustering, which no longer relies on an NCM.
04-10-2017
12:40 PM
@Dmitro Vasilenko Are you seeing any ERROR or WARN log messages produced by the ConsumeKafka processor when you run it?
What is the processor's configured run strategy/schedule?
04-07-2017
01:10 PM
@Paul Yang
The election of a Primary Node and the Cluster Coordinator occurs through ZooKeeper. Once a Cluster Coordinator is elected, all nodes begin sending heartbeats directly to it. If a node's heartbeat is not received within the configured threshold, that node is disconnected. A single node disconnecting and reconnecting may indicate a problem with just that node: network latency between the node and the Cluster Coordinator, a garbage collection "stop the world" event that prevents the node from heartbeating, etc. Check the NiFi app log on your nodes to make sure they are sending heartbeats regularly.

In your case you mention the Cluster Coordinator changes nodes frequently. This means a new node is being elected Cluster Coordinator by ZooKeeper, which occurs when the current Cluster Coordinator has trouble communicating with ZooKeeper. Again, garbage collection can be the cause.

There is a known bug in HDF 2.0 / NiFi 1.0 (https://issues.apache.org/jira/browse/NIFI-2999) that can result in all nodes being disconnected when the Cluster Coordinator changes hosts. Since nodes send heartbeats directly to the current Cluster Coordinator, whichever node is the current coordinator keeps track of when the last heartbeat was received from each node. Assume a 3-node cluster (Nodes A, B, and C). Node A is the current Cluster Coordinator and is receiving heartbeats. At some point Node B becomes the Cluster Coordinator and all nodes start sending heartbeats there. The bug (since addressed) occurs if Node A later becomes the Cluster Coordinator again: Node A looks at the last time it received heartbeats, which it still has from its previous term as coordinator, but since they are all old, every node gets disconnected. The nodes then auto-reconnect on their next heartbeat.
You can upgrade to get away from this bug (HDF 2.1 / NiFi 1.1), but ultimately you need to address whatever is causing the Cluster Coordinator to change nodes. This could be a loading issue where there are insufficient resources to maintain a connection with ZooKeeper, an overloaded ZooKeeper, a ZooKeeper ensemble without quorum, node garbage collection resulting in too long a lapse between ZooKeeper connections, etc. Thanks, Matt
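The stale-heartbeat behavior behind NIFI-2999 can be illustrated with a small sketch (class and threshold names are hypothetical, not NiFi's actual implementation):

```python
class Coordinator:
    """Toy model of a cluster coordinator tracking node heartbeats."""
    def __init__(self, threshold_secs=8.0):
        self.threshold = threshold_secs
        self.last_heartbeat = {}  # node -> timestamp of last heartbeat seen

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def stale_nodes(self, now):
        # Nodes whose last recorded heartbeat is older than the threshold
        return [n for n, t in self.last_heartbeat.items()
                if now - t > self.threshold]

# Node A is coordinator at t=0 and receives heartbeats from everyone.
node_a = Coordinator()
for n in ("A", "B", "C"):
    node_a.heartbeat(n, now=0.0)

# Leadership moves to Node B; heartbeats now go to B, so A's table goes stale.
# Much later (t=60), A is re-elected and consults its old table:
print(node_a.stale_nodes(now=60.0))  # ['A', 'B', 'C'] -- every node disconnected
```

The fix in later releases is, in effect, not trusting heartbeat timestamps recorded during a previous coordinator term.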
04-07-2017
12:26 PM
@Sanaz Janbakhsh Unfortunately a formula for what percentage of your disk should be allocated to each repo does not exist, and would frankly be impossible to establish considering how many dynamic inputs come into play. But as a starting point to adjust from, I would suggest the following:
- FlowFile Repository: 10% - 15%
- Database Repository: 5% - 10%
- Content Repository: 50% - 60%
- Provenance Repository: depends on your retention policies, but the provenance repo can be capped to a restricted size in NiFi's configs. The default is 1 GB of disk usage or 24 hours. These are soft limits, so it may temporarily exceed the size threshold until clean-up occurs; don't set the size to the exact size of the partition it is configured to use.
- Logs: 10% - 15% (this is very subjective as well. How much log history do you need to retain? What log levels have you set? While the logs directory may stay relatively small during good times, an outage can result in a logging explosion. Consider a downstream system outage: every NiFi processor trying to push data to that downstream system will produce ERROR logs for the duration.)
The above assumes your OS and applications are installed on a different disk. If not, you will need to adjust accordingly. Thanks, Matt
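As a rough illustration only, the suggested starting percentages can be turned into concrete sizes for a given disk. The function below uses the midpoints of the ranges above; the names and the "leftover goes to provenance" assumption are illustrative, not a NiFi rule.

```python
def suggested_repo_sizes(disk_gb):
    """Rough starting-point allocation for NiFi repositories on one disk,
    using midpoints of the suggested percentage ranges (illustrative only)."""
    splits = {
        "flowfile_repo": 0.125,  # 10% - 15%
        "database_repo": 0.075,  # 5% - 10%
        "content_repo":  0.55,   # 50% - 60%
        "logs":          0.125,  # 10% - 15%
    }
    sizes = {name: round(disk_gb * pct, 1) for name, pct in splits.items()}
    # Treat the remainder as the ceiling for the provenance repository,
    # which is capped explicitly in NiFi's configs anyway.
    sizes["provenance_repo_max"] = round(disk_gb - sum(sizes.values()), 1)
    return sizes

print(suggested_repo_sizes(500))
# {'flowfile_repo': 62.5, 'database_repo': 37.5, 'content_repo': 275.0,
#  'logs': 62.5, 'provenance_repo_max': 62.5}
```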
04-06-2017
02:40 PM
1 Kudo
@Raj B There is no single policy granting complete nifi-api access. Each nifi-api endpoint requires that the user making the call has the equivalent access policy. For example, in order for a user to view the "system diagnostics" via the NiFi UI, the user needs to have been granted the global policy "view system diagnostics":

curl 'https://<hostname>:<port>/nifi-api/system-diagnostics' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJuaWZpYWRtaW4iLCJpc3MiOiJMZGFwUHJvdmlkZXIiLCJhdWQiOiJMZGFwUHJvdmlkZXIiLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJuaWZpYWRtaW4iLCJraWQiOjEsImV4cCI6MTQ5MTUyNzg0OSwiaWF0IjoxNDkxNDg0NjQ5fQ.1xou9lsBLBMaNuUUGJjebuYE1E8dzGWA7IPzb6_vEv0' --compressed --insecure

The Bearer token presented in the REST API call is checked against the access policies assigned to that user. Just remember that everything you do via NiFi's UI is nothing more than calls to the nifi-api. Thanks, Matt
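For readers scripting against the API, a minimal sketch of the token flow might look like this. It assumes LDAP/username-password login via the /access/token endpoint; the host, port, and credentials are hypothetical, and certificate-based clusters authenticate differently.

```python
import json
import urllib.request

BASE = "https://nifi.example.com:9443/nifi-api"  # hypothetical host and port

def get_token(username, password):
    """POST credentials to /access/token; the response body is a JWT string."""
    data = f"username={username}&password={password}".encode()
    req = urllib.request.Request(
        f"{BASE}/access/token", data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def bearer_headers(token):
    """Every subsequent nifi-api call carries the token as a Bearer header."""
    return {"Authorization": f"Bearer {token}"}

def system_diagnostics(token):
    """Succeeds only if the token's user holds 'view system diagnostics'."""
    req = urllib.request.Request(f"{BASE}/system-diagnostics",
                                 headers=bearer_headers(token))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```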
04-06-2017
12:54 PM
2 Kudos
@Sanaz Janbakhsh In addition to the guide you mentioned, I strongly recommend you avoid using the embedded ZooKeeper for your NiFi cluster. HDF 2.x is highly dependent on ZK for things like Cluster Coordinator and Primary Node elections and cluster state management, and NiFi itself can put considerable strain on your server's resources. So for cluster stability reasons you should configure your NiFi cluster to use an external ZK.
- The largest repo should always be your content repo (holds the content of files currently being processed as well as any archived data).
- The FlowFile repo is your most crucial repo (corruption of this repo equals data loss).
- You can control how much disk space is used by provenance (how much you need depends on the size of your dataflow, the volume of data, and the number of events you want to retain).
- The database repo stays relatively small (flow configuration history and the user DB live there).
As far as the log directory goes, this is highly dependent on your log retention policies and the amount of logging you have enabled in NiFi. Since sizing of most of the above depends on the complexity of your dataflow, data volumes, and data sizes, it would be impossible to say your system should have disks of size x. The suggested approach is to set up a development/test environment where you can model your dataflow and volumes, and then use that as input to the sizing requirements for your production environment. Thanks,
Matt
04-06-2017
12:30 PM
@Ahmad Debbas The GetHDFS processor is deprecated in favor of the ListHDFS and FetchHDFS processors. GetHDFS does not retain state, and therefore starts over from the beginning, as you noted, when an error occurs. ListHDFS does maintain state, so even through NiFi restarts or processor restarts the listing picks up where it left off. The zero-byte FlowFiles it produces are then passed to FetchHDFS, which actually retrieves the content and inserts it into the existing FlowFile. Another advantage of the list/fetch design model is the ability to distribute those listed zero-byte FlowFiles across a NiFi cluster before fetching the content. This improves performance by reducing the resource strain GetHDFS would place on a single NiFi node. Thanks, Matt
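The state-keeping that lets a listing resume can be sketched generically (this is an illustration of the pattern, not ListHDFS's actual implementation; real NiFi persists this state through its state manager / ZooKeeper):

```python
class StatefulLister:
    """Toy list/fetch pattern: remember the newest timestamp already listed,
    so a restart never re-emits earlier entries."""
    def __init__(self):
        self.last_seen = 0  # persisted state in real NiFi

    def list_new(self, entries):
        # entries: iterable of (name, modified_time)
        new = sorted((e for e in entries if e[1] > self.last_seen),
                     key=lambda e: e[1])
        if new:
            self.last_seen = new[-1][1]
        return [name for name, _ in new]

lister = StatefulLister()
files = [("a.txt", 10), ("b.txt", 20)]
print(lister.list_new(files))                    # ['a.txt', 'b.txt']
print(lister.list_new(files + [("c.txt", 30)]))  # ['c.txt'] -- no re-listing
```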
04-04-2017
01:33 PM
4 Kudos
@Pushkara Ravindra The intent of the Site-to-Site (S2S) protocol is to allow the exchange of NiFi FlowFiles between NiFi instances. A NiFi FlowFile consists of two parts:
1. FlowFile content <-- the original content in whatever format (NiFi is data agnostic and has no data format dependency)
2. FlowFile attributes <-- a collection of key/value pairs (some assigned by NiFi by default, others added via processors)
Sending FlowFiles between NiFi instances allows the originating NiFi to share the attributes it knows about a FlowFile's content with the target NiFi instance; the FlowFile attributes are loaded into the FlowFile repo of the target NiFi automatically. In addition, S2S provides automatic, load-aware balancing of FlowFiles to a target NiFi cluster, and allows the target cluster to scale up or down without the client needing to change anything.
How it all works: the source/client NiFi instance/cluster adds a Remote Process Group (RPG) to its canvas and configures it to point at the URL of any target/destination NiFi instance or cluster node. The communication at this point is over the HTTP protocol. Once a connection is established, the destination NiFi sends S2S details back to the source NiFi (including the URLs of the nodes if the destination is a cluster, and the current load of each node). The RPG continuously updates this information and stores a local copy in case it cannot get an update at some point. Input and output ports are used to send or receive FlowFiles from the parent process group in which they were added, so when input or output ports are added at the root canvas level of any dataflow they become "remote" input and output ports capable of sending or receiving data from another NiFi. Whether you set the S2S protocol to HTTP or RAW, the above is true. What differs is what happens next (the actual FlowFile transfer).
When using the RAW transport (socket-based transfer), the "nifi.remote.input.host" and "nifi.remote.input.socket.port" values configured on each target NiFi instance are used by the NiFi client as the destination for sending FlowFiles. When using the HTTP transport, the "nifi.remote.input.host" and the "nifi.web.http.port" or "nifi.web.https.port" values are used instead. The advantage of RAW is that there is a dedicated port for all S2S transfers, so under high load its effect on the NiFi HTTP interface is minimal. The advantage of HTTP is that you do not need to open an additional S2S port, since the same HTTP/HTTPS port is used to transfer FlowFiles. Thanks, Matt
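The load-aware balancing can be sketched as weighting each node's share of FlowFiles inversely to the queue load it reports in the S2S details. This is purely illustrative (a simple weighted round-robin), not NiFi's actual algorithm:

```python
def distribute(flowfiles, node_loads):
    """Toy load-aware balancer: nodes reporting lighter queues receive a
    larger share of FlowFiles (illustrative only; not NiFi's algorithm).
    node_loads: {node: queued FlowFile count reported via S2S details}."""
    weights = {n: 1.0 / (1 + load) for n, load in node_loads.items()}
    total = sum(weights.values())
    shares = {n: w / total for n, w in weights.items()}
    batches = {n: [] for n in node_loads}
    for ff in flowfiles:
        # Weighted round-robin: send to the node furthest below its fair share.
        target = min(batches, key=lambda n: (len(batches[n]) + 1) / shares[n])
        batches[target].append(ff)
    return batches

# An idle node (queue 0) gets 4x the share of a node with 3 queued files.
out = distribute(list(range(10)), {"node1": 0, "node2": 3})
print({n: len(b) for n, b in out.items()})  # {'node1': 8, 'node2': 2}
```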
03-30-2017
12:07 PM
1 Kudo
@Praveen Singh You could install sshpass, which would allow you to use a password in the ssh connection, but I strongly recommend against this approach: it requires you to put your password in plaintext in your NiFi processor, which exposes it to anyone who has access to view that component. Thanks, Matt