06-14-2017
12:30 PM
3 Kudos
@Thierry Vernhet The ListFile processor lists every non-hidden file it sees in the target directory. It then records the latest timestamp from the batch of files it listed in state management. That timestamp is used to determine which files are new in the next run. Since the file's timestamp changed while it was still being written, the same file gets listed again. A few suggestions, in preferred order:

1. Change how files are written to this directory. The ListFile processor ignores hidden files, so a file written as ".myfile.txt" is ignored until it is renamed to just "myfile.txt" (see the sketch below).
2. Raise the "Minimum File Age" setting on the processor to a value high enough to allow the source system to complete its writes to this directory.
3. Add a DetectDuplicate processor after your ListFile processor to detect duplicate listed files and remove them from your dataflow before the FetchFile processor.

Thanks, Matt
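As an illustration of suggestion 1, here is a minimal sketch of the write-then-rename pattern on the producing side, assuming a POSIX filesystem; the directory, filename, and helper name are made up for the example:

```python
import os

def write_then_rename(directory: str, filename: str, data: bytes) -> None:
    """Write to a dot-prefixed (hidden) name, then rename into place.

    ListFile ignores hidden files, so the file only becomes visible to the
    listing once the write has fully completed and the rename has happened.
    """
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes are on disk before the rename
    os.rename(hidden_path, final_path)  # atomic on POSIX filesystems

# Hypothetical usage:
# write_then_rename("/data/landing", "myfile.txt", b"hello\n")
```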
06-14-2017
12:09 PM
@estefania rabadan There is no processor configuration option to turn off the attributes a processor writes to the FlowFiles it processes. However, you can use the UpdateAttribute processor to remove attributes from FlowFiles (see the sketch below). Thanks, Matt
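UpdateAttribute's "Delete Attributes Expression" property takes a regular expression; any attribute whose name matches it is removed from the FlowFile. As a rough illustration of how that matching behaves, here is a stand-alone Python sketch with made-up attribute names (in NiFi itself you would just set the property, not write code):

```python
import re

# Hypothetical FlowFile attributes
attributes = {
    "filename": "myfile.txt",
    "path": "/data/landing",
    "custom.header": "abc",
    "custom.trace": "xyz",
}

# Regex of attribute names to drop, mirroring a "Delete Attributes Expression"
delete_expression = re.compile(r"custom\..*")

kept = {k: v for k, v in attributes.items() if not delete_expression.fullmatch(k)}
print(kept)  # {'filename': 'myfile.txt', 'path': '/data/landing'}
```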
06-13-2017
09:25 PM
4 Kudos
@Prakash Ravi You have 9 NiFi nodes, all running a ConsumeKafka processor configured with 3 concurrent tasks. That totals 27 consumers. Does the Kafka topic you are consuming from have 27 partitions? There can only be one consumer per partition on a topic. If you have more consumers than partitions, some of those consumers will never get any data, which likely explains the load distribution you are seeing. Whenever a new consumer is added or an existing consumer is removed, a rebalance is triggered. You will achieve your best performance when the number of partitions equals the number of consumers. Thanks, Matt
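One quick way to confirm the partition count is a small client script. A sketch assuming the kafka-python package is installed; "broker:9092" and "mytopic" are placeholders for your broker address and topic name:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="broker:9092")
partitions = consumer.partitions_for_topic("mytopic") or set()
print(f"mytopic has {len(partitions)} partitions")
consumer.close()

# 9 NiFi nodes x 3 concurrent tasks = 27 consumers, so for even load
# you would want the topic to have 27 partitions.
```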
06-13-2017
05:42 PM
@Anoop Shet Sorry for the late response, but I don't get pinged unless you tag me in your response. The ListSFTP processor retains state on files that have been listed. My guess here is that this state is preventing your new filter from returning anything. Try clearing the state and see if it then lists the files based on your new filter, or add a new ListSFTP processor using the different file filter. You can right-click on the processor and select "View state". In the state UI for this processor you will see a link to "Clear state" (the same can be done through the REST API; see the sketch below). If you found my answer addressed your original question, please mark it as accepted to close out this thread. Thanks, Matt
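For reference, here is how viewing and clearing processor state could be scripted against the REST API. A sketch assuming an unsecured NiFi 1.x instance at localhost:8080; the processor id is a placeholder, and a secured cluster would also need authentication:

```python
import requests

processor_id = "01234567-89ab-cdef-0123-456789abcdef"  # hypothetical id
base = "http://localhost:8080/nifi-api"

# View the currently stored state entries for the processor
state = requests.get(f"{base}/processors/{processor_id}/state").json()
print(state)

# Clear the stored state (equivalent to "Clear state" in the UI);
# the processor must be stopped first.
requests.post(f"{base}/processors/{processor_id}/state/clear-requests")
```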
06-13-2017
04:52 PM
1 Kudo
@Mahmoud Shash HDF 2.1.3 is a bad release. You are running into the exact Controller Service UI bug that resulted in HDF 2.1.3 being pulled and replaced with HDF 2.1.4. You can upgrade your HDF 2.1.3 to HDF 2.1.4 to fix this issue. Then you will be able to enable, disable, configure, and delete the HiveConnectionPool controller service. Matt
06-13-2017
04:36 PM
@Narasimma varman
Sorry for the late response, but I don't get pinged unless you add a comment to my response or tag me in your new answer. The dynamic properties expect the "value" to be a valid NiFi Expression Language (EL) statement; otherwise it is treated as a literal value. So I expect what you are seeing is that exact string passed in the header, or some kind of session rollback, etc. (see the example below). Also, I am not sure how you are pulling data using a "POST" method; shouldn't you be using "GET"? Thanks,
Matt
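Assuming this is the InvokeHTTP processor: each dynamic property becomes an HTTP request header, where the property name is the header name and the property value is evaluated as EL. A small illustration, with "X-Custom-Header" as a made-up header name:

```
X-Custom-Header -> ${filename}     (valid EL: resolves to the FlowFile's filename attribute)
X-Custom-Header -> myliteraltext   (not an EL statement: sent as the literal string "myliteraltext")
```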
06-13-2017
04:31 PM
@forest lin NiFi at its core has no issues working with very large files. Often, when you run into OOM errors, it is because of what you are trying to do with those very large files after they are in NiFi. In the majority of cases, OOM can be avoided through dataflow design and tweaks to the heap size allocated to the NiFi JVM. The content of a FlowFile does not live in heap memory space, but the FlowFile attributes do (except when swapped out to disk in large queues). So avoid extracting large amounts of content into FlowFile attributes, avoid splitting very large files into large numbers of small FlowFiles using a single processor, avoid merging a very large number of FlowFiles into a single FlowFile, etc. You can still do these types of things, but you may need to do them in two stages rather than one. For example, split large files every 5,000 lines first and then split the 5,000-line FlowFiles by every line; the difference in heap usage is huge (see the rough numbers below). If you found this answer addressed your question, please mark it as accepted to close out this thread. Thanks,
Matt
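Some back-of-the-envelope numbers for the two-stage split, assuming a hypothetical 1,000,000-line file:

```python
total_lines = 1_000_000
stage1_lines_per_split = 5_000

# One-stage split by every line: a single session produces 1,000,000
# FlowFiles, all tracked in heap at once.
one_stage_batch = total_lines

# Two-stage split: stage 1 produces 200 FlowFiles of 5,000 lines each;
# stage 2 then splits each of those into 5,000 single-line FlowFiles.
stage1_batch = total_lines // stage1_lines_per_split   # 200
stage2_batch = stage1_lines_per_split                  # 5,000

print(one_stage_batch, stage1_batch, stage2_batch)     # 1000000 200 5000
# The largest single batch held in heap drops from 1,000,000 to 5,000 FlowFiles.
```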
06-13-2017
03:21 PM
@Oleksandr Solomko You can see where these files are queued via the Summary UI. Once the Summary UI opens, select the "CONNECTIONS" tab. You can sort on any column by clicking that column. Once you have found the row for your queued connection, click on the "view connection details" icon on the far right side of the row. This will pop open a new UI that shows the queue breakdown per node in the cluster, which will help you identify whether you are having a cluster-wide issue or it is localized to one specific node (this breakdown is also available via the REST API; see the sketch below). If it is just one node with all this queued data, you could manually disconnect that node from your cluster, then go directly to the URL for the disconnected node and see if you can empty the queue there. Check for ERROR or WARN logs specifically in that node's nifi-app.log, nifi-user.log, and nifi-bootstrap.log. Also, what OS and Java version are you running? Thanks, Matt
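The same per-node breakdown can be pulled from the REST API. A sketch assuming an unsecured NiFi 1.x instance at localhost:8080; the connection id is a placeholder, and the exact response field names should be verified against your NiFi version:

```python
import requests

connection_id = "01234567-89ab-cdef-0123-456789abcdef"  # hypothetical id
url = f"http://localhost:8080/nifi-api/flow/connections/{connection_id}/status"

# nodewise=true asks for per-node snapshots rather than the cluster aggregate
status = requests.get(url, params={"nodewise": "true"}).json()
for node in status["connectionStatus"].get("nodeSnapshots", []):
    snap = node["statusSnapshot"]
    print(node["address"], snap["queuedCount"], snap["queuedSize"])
```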
06-13-2017
12:49 PM
1 Kudo
@forest lin Backpressure is not used to control the data rate in your dataflow. The intent of the backpressure settings on connections is to control the amount of allowed queued data, and both backpressure settings are "soft" limits. Once backpressure kicks in on a connection, the processor feeding that connection is no longer allowed to run.

So in your case above, you have backpressure set to 5 objects (FlowFiles) or 5 KB of content. Since your queue was empty, no backpressure was being applied when the 37.05 MB FlowFile arrived at your ConvertCSVToAvro processor, so that processor was allowed to run. That one FlowFile was processed through and placed on the outbound connection. At that point backpressure kicked in, because you exceeded one of your backpressure settings. The ConvertCSVToAvro processor will now be prevented from running until the queue drops back below 5 FlowFiles or 5 KB of queued data. If all your processors are processing FlowFiles rapidly, backpressure will be applied only sparsely.

Also keep in mind that, for efficiency, some processors work on batches of FlowFiles. With a backpressure object threshold of 5, you may therefore see a queue with more than 5 FlowFiles: the batch of FlowFiles is placed on the outbound queue, and the processor that did the batch processing is then not allowed to run again until that outbound connection drops back below 5 FlowFiles.

The ControlRate processor is what actually lets you control the throughput of a dataflow. It does not slow the processing of an individual FlowFile; it lets data queue on its input side and, based on its configured settings, only allows x number of FlowFiles through over y amount of time. Let's say it is configured to let 5 KB of data through every 1 minute. If you feed it a 37 MB file, it does not transfer just pieces of that FlowFile; it will feed through the entire 37 MB FlowFile and then not allow another FlowFile through until the average data per 1 minute is back down to 5 KB (see the rough math below).

Because of how the above works, data can continue to queue in front of ControlRate. This is where the backpressure settings become important, to stop upstream processors from running. You can set backpressure all the way upstream to your data ingest processors so they stop accepting new FlowFiles. Thanks, Matt
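Rough math for that 37 MB example, assuming 1 MB = 1,024 KB:

```python
flowfile_kb = 37 * 1024      # ~37,888 KB for the 37 MB FlowFile
rate_kb_per_min = 5          # ControlRate setting: 5 KB per minute

wait_minutes = flowfile_kb / rate_kb_per_min
print(f"~{wait_minutes:,.0f} minutes (~{wait_minutes / 60 / 24:.1f} days)")
# ~7,578 minutes (~5.3 days) before the next FlowFile is released --
# which is why upstream backpressure matters so much in front of ControlRate.
```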
06-12-2017
02:12 PM
@Justin R. Is this a NiFi cluster installation with multiple nodes running on the same host? If that is the case, whichever node manages to bind to the port first wins; all other nodes on the same host will report that the port is already in use. Matt