About MattWho

MattWho · ‎03-03-2017

@spdvnz

NiFi Processors: NiFi processor components that are likely to encounter failures have a "failure" routing relationship. Often, failure is handled by looping that failure relationship back on the same processor so that the operation against the failed FlowFile is re-attempted after the FlowFile penalty duration has expired (default 30 secs). However, you may also wish to route failure through additional processors. For example, maybe your failure is in a PutSFTP processor configured to send data to system "abc". Instead of looping the failure relationship, you could route failure to a second PutSFTP processor configured to send to an alternate destination server "xyz". Failure from the second PutSFTP could be routed back to the first PutSFTP. In this scenario, a complete failure only occurs if delivery to both systems fails.

NiFi Process Groups: I am not sure what failover condition at a process group level you are trying to account for here. Process groups are nothing more than a logical container of individual dataflows. Failover would still be handled through dataflow design.

NiFi Node level failover: In a NiFi cluster, there is always a "cluster coordinator" and a "primary node" elected. The "primary node" runs all processors that are configured with "on primary node" only on their scheduling tab. Should the cluster coordinator stop receiving heartbeats from the current primary node, a new node will be designated as the primary node and will start the "on primary node" processors. If the "cluster coordinator" is lost, a new cluster coordinator will be elected and will assume the role of receiving heartbeats from other nodes. A node that has become disconnected from the cluster will continue to run its dataflows as long as NiFi is still running.

NiFi FlowFile failover between nodes: Each node in a NiFi cluster is responsible for all the FlowFiles it is currently working on. Each node has no knowledge of what FlowFiles are currently queued on any other node in the cluster. If a NiFi node is completely down, the FlowFiles that it had queued at the time of failure will remain in its repos until that NiFi is brought back online. The content and FlowFile repositories are not locked to a specific NiFi instance. While you cannot merge these repositories with the existing repos of another node, it is possible to stand up an entirely new NiFi node and have it use the repositories from the down node to pick up operation where it left off. So it is important to protect the FlowFile and Content repositories via RAID so that disk failure does not result in data loss. Data HA across NiFi nodes is a future roadmap item.

Thanks, Matt
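To make the processor-level failover routing above a bit more concrete, here is a minimal Python sketch of the idea. It is not NiFi code: the function names, the `max_rounds` limit, and the stand-in senders are all hypothetical, and the penalty wait between retries is left out. In a real flow the loop between the two PutSFTP processors has no built-in round limit.

```python
from typing import Callable, List, Optional


def deliver_with_failover(
    flowfile: str,
    senders: List[Callable[[str], bool]],
    max_rounds: int = 3,  # assumption: bounded here only so the sketch terminates
) -> Optional[int]:
    """Try each sender in order; a failure is routed to the next sender,
    and failure from the last sender loops back to the first.

    Returns the index of the sender that succeeded, or None if every
    sender failed in every round.
    """
    for _ in range(max_rounds):
        for index, send in enumerate(senders):
            if send(flowfile):
                return index
            # NiFi would penalize the FlowFile here (30 secs by default)
            # before the next attempt; the wait is omitted in this sketch.
    return None


def send_to_abc(flowfile: str) -> bool:
    """Hypothetical stand-in for the first PutSFTP; system "abc" is down."""
    return False


def send_to_xyz(flowfile: str) -> bool:
    """Hypothetical stand-in for the second PutSFTP; system "xyz" is up."""
    return True


print(deliver_with_failover("example.csv", [send_to_abc, send_to_xyz]))
```

A result of `None` corresponds to the "complete failure" case, where delivery to both systems keeps failing.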

MattWho · ‎03-02-2017

@spdvnz Does the following link work for you:

wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo

This should set you up with the latest supported Ambari release for HDF. After having the correct Ambari server version installed, proceed with the installation:

- # yum install ambari-server
- # ambari-server setup
- # ambari-server install-mpack --mpack=http://public-repo-1.hortonworks.com/HDF/centos7/2.x/updates/2.1.2.0/tars/hdf_ambari_mp/hdf-ambari-mpack-2.1.2.0-10.tar.gz --purge --verbose
- # ambari-server start

Thanks, Matt

MattWho · ‎03-02-2017

@nedox nedox You will want to use one of the available HDFS processors to get data from your HDP HDFS file system.

1. GetHDFS <-- Use if standalone NiFi installation
2. ListHDFS --> RPG --> FetchHDFS <-- Use if NiFi cluster installation

All of the HDFS based NiFi processors have a property that allows you to specify the path to the HDFS site.xml files. Obtain a copy of your core-site.xml and hdfs-site.xml files from your HDP cluster and place them somewhere on the HDF hosts running NiFi. Point to these files using the "Hadoop Configuration Resources" processor property.

Thanks, Matt
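As a rough illustration of what goes into "Hadoop Configuration Resources": the property takes a comma-separated list of file paths. The Python sketch below is not part of NiFi. It is a hypothetical helper that checks both files are present in a directory and builds that comma-separated value. The directory used here is a throwaway temp directory, so substitute wherever you actually copied the files on your NiFi hosts.

```python
import tempfile
from pathlib import Path

# The two files copied from the HDP cluster.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def hadoop_configuration_resources(config_dir: str) -> str:
    """Return a comma-separated list of the config file paths in config_dir.

    Raises FileNotFoundError if either file is missing.
    """
    paths = [Path(config_dir) / name for name in REQUIRED_FILES]
    missing = [str(path) for path in paths if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing Hadoop config files: " + ", ".join(missing))
    return ",".join(str(path) for path in paths)


# Demo with placeholder files in a temporary directory (hypothetical location).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in REQUIRED_FILES:
        (Path(demo_dir) / name).write_text("<configuration/>")
    value = hadoop_configuration_resources(demo_dir)
    print(value)
```

The printed string is the kind of value you would paste into the processor property.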

MattWho · ‎03-01-2017

@Raj B There are no existing NiFi reporting tasks that are part of any current NiFi release for sending any information to Atlas. So if you are looking for something that has been tested and accepted by the Apache community, it does not exist yet. Thanks, Matt

MattWho · ‎03-01-2017

@Martin van Husen I can only speak to any NiFi issues here, but what do you mean by "my system doesn't work correct anymore"? If you stop NiFi, does your system go back to normal? Perhaps you do not have enough resources to run all these services on your machine. Matt

MattWho · ‎02-24-2017

@Mourad Chahri HBase runs on top of HDFS in HDP. The only service that is part of HDF that is not in HDP is NiFi. NiFi can send data to and retrieve data from HDP HDFS and HBase without both services needing to be installed on the same nodes/hosts.

MattWho · ‎02-24-2017

@Mourad Chahri "how to install HDF on the same cluster , because i wanna use HDF and HDP" HDF does not need to be installed on the same hardware as HDP in order to have the software packages send data to one another. For example, HDF NiFi includes the Hadoop client libraries needed to send/get data from HDP HDFS. All you need to provide NiFi is the core-site.xml and hdfs-site.xml files. There is no need to install Hadoop (HDFS) clients on the NiFi nodes/hosts or to have HDP HDFS installed on the same nodes/hosts. Thanks, Matt

MattWho · ‎02-24-2017

@Mourad Chahri Different Ambari servers cannot own the same hosts/nodes. The Ambari agents, which are installed on each node, are configured to communicate with a single Ambari server.

MattWho · ‎02-24-2017

@Pradhuman Gupta Backpressure has kicked in on your dataflow. By default, every new connection has a backpressure object threshold of 10,000 FlowFiles. When backpressure is reached on a connection, the connection is highlighted in red and the backpressure bar (left = object threshold and right = size threshold) will show which threshold has reached 100%. Once backpressure is applied, the component (processor) directly upstream of that connection will no longer run.

As you can see in your screenshot above, the "success" connection from your PutSplunk processor is applying backpressure. As a result, the PutSplunk processor is no longer getting scheduled to run by the NiFi controller. Since it is no longer executing, FlowFiles began to queue on the connection between your TailFile and PutSplunk processors. Once backpressure kicked in there as well, the TailFile processor stopped running too. If you clear the backpressure on the "success" connection between your PutSplunk and PutEmail processors, your dataflow will start running again.

You can adjust the backpressure threshold by right clicking on a connection and selecting "configure". (The configure option is only available if the processors on both sides of a connection are stopped.)

In addition to adjusting backpressure settings, you also have the option of setting "file expiration" on a connection. File expiration dictates how old a FlowFile in a given connection can be. If the FlowFile has existed in your NiFi (not how long it has been in that specific connection) for longer than the configured time, it is purged from your dataflow. This setting, if set aggressively enough, could help keep your "success" relationship clean enough to avoid backpressure.

Thanks, Matt
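The way backpressure cascades upstream in the flow described above can be shown with a small Python simulation. This is only a sketch and not how NiFi is implemented: the class and function names are made up, the thresholds are set to 3 instead of 10,000 to keep the run short, and the processor consuming the "success" connection is assumed never to run.

```python
from collections import deque


class Connection:
    """A queue between two processors with an object-count backpressure threshold."""

    def __init__(self, object_threshold: int = 10_000) -> None:
        self.object_threshold = object_threshold
        self.queue: deque = deque()

    def backpressure_applied(self) -> bool:
        return len(self.queue) >= self.object_threshold


def run_source(out: Connection, flowfile: str) -> bool:
    """Source processor (think TailFile): runs only if its outbound connection has room."""
    if out.backpressure_applied():
        return False
    out.queue.append(flowfile)
    return True


def run_transfer(inbound: Connection, out: Connection) -> bool:
    """Middle processor (think PutSplunk): moves one FlowFile downstream,
    unless the outbound connection is applying backpressure."""
    if out.backpressure_applied() or not inbound.queue:
        return False
    out.queue.append(inbound.queue.popleft())
    return True


tail_to_splunk = Connection(object_threshold=3)   # small threshold for the demo
splunk_success = Connection(object_threshold=3)   # nothing ever consumes this queue

for tick in range(10):
    run_source(tail_to_splunk, f"line-{tick}")
    run_transfer(tail_to_splunk, splunk_success)

# Both queues are now full, so neither processor gets to run.
stalled = (len(tail_to_splunk.queue), len(splunk_success.queue))
print(stalled)

# Clearing the downstream "success" queue lets the flow start moving again.
splunk_success.queue.clear()
resumed = run_transfer(tail_to_splunk, splunk_success)
print(resumed)
```

The "success" queue fills first, which stops the middle processor. Its inbound queue then fills, which stops the source.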

MattWho · ‎02-23-2017

@Oliver Meyn You are correct that Site-To-Site connections and authorizations are handled at the server level and not at the user level. There is no configuration change you can make that would change this behavior. The authorization allows server A to communicate with and send data to server B. Users play no role in the S2S data transfer process. I am not sure how this enhancement would work. Setting the authorization level of S2S down to the user level would require adding these users to server B, which may not be desirable. Also, what if server A has a process group with the RPG that is authorized for many users? Would the expectation be that every one of those users then needs to be added/authorized on server B? I suggest opening an Apache Jira against NiFi to raise additional discussion around this topic. Thanks, Matt

Last Visited	‎12-04-2025 02:51 PM

Member Since	‎07-30-2019 10:41 AM
Posts	3,398
Kudos received	1617

Cloudera Community

Re: How to achieve inheritence within Parameter Co...

Re: Cannot access the NiFi Registry from NiFi and ...

Re: Error connecting to NiFi Registry from NiFi UI...

Re: using nifi as a kafka streaming- real-time str...

Re: using nifi as a kafka streaming- real-time str...

Re: Failover mechanism in nifi

Re: Cannot find install-mpack : Issue while Instal...

Re: Get Data from HDP using HDF

Re: How to integrate NiFi with Atlas, for metadata...

Re: System didnt work after installing NiFi. Cant...

Re: Install apache NIFI with ambari on existing HD...

Re: Install apache NIFI with ambari on existing HD...

Re: Install apache NIFI with ambari on existing HD...

Re: Why PutSplunk stopped picking the data from Qu...

Re: Preserve identity in multi-tenant NiFi over si...