Member since
07-30-2019
3391
Posts
1618
Kudos Received
999
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 245 | 11-05-2025 11:01 AM |
| | 477 | 10-20-2025 06:29 AM |
| | 617 | 10-10-2025 08:03 AM |
| | 400 | 10-08-2025 10:52 AM |
| | 456 | 10-08-2025 10:36 AM |
05-11-2017
11:49 AM
1 Kudo
@Gaurav Jain This is the exact use case for which GetSFTP was deprecated in favor of the ListSFTP and FetchSFTP processors.

The ListSFTP processor runs on the primary node only. It produces one 0-byte FlowFile for every file in the listing. All of these 0-byte FlowFiles are then sent to a Remote Process Group (RPG) for distribution across the cluster. The distributed FlowFiles are then fed to a FetchSFTP processor, which retrieves the content from the SFTP server and inserts it into the FlowFile at that time. This model eliminates the overhead on the primary node, since it does not need to write the content, and it reduces network overhead between nodes, since no content is being sent in the FlowFiles via the RPG.

The only issue you are going to run into is: https://issues.apache.org/jira/browse/NIFI-1202 This issue is addressed in Apache NiFi 1.2.0, which was just released this week. It will also be addressed in HDF 3.0, which will be released soon. You can work around the issue in older versions by setting a small object backpressure threshold on the connection feeding your RPG. Since this backpressure is a soft limit, you need to put a processor between your ListSFTP processor and the RPG that only processes FlowFiles one at a time. I recommend RouteOnAttribute (no configuration needed on the processor; simply route the one existing "unmatched" relationship to the RPG and set back pressure on that connection).

Thanks, Matt
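The list/fetch split described above can be sketched in plain Python. This is a hypothetical illustration that uses the local filesystem in place of an SFTP server; the point is that the listing phase emits cheap references only (like ListSFTP's 0-byte FlowFiles), and content is only read in the separate fetch phase, which can run on a different node:

```python
import os

def list_files(directory):
    # Listing phase: emit lightweight references only (no content),
    # analogous to ListSFTP's 0-byte FlowFiles. Cheap enough to run
    # on a single (primary) node.
    return [os.path.join(directory, name) for name in sorted(os.listdir(directory))]

def fetch_file(path):
    # Fetch phase: the content is read only now, on whichever node
    # the reference was distributed to, analogous to FetchSFTP.
    with open(path, "rb") as f:
        return f.read()
```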
05-10-2017
12:47 PM
Based on what you provided above, it looks like you installed NiFi via the tar.gz file. By default, NiFi runs unsecured on port 8080. So yes, you want to change that port to some unused port on your server. Thanks,
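For reference, the relevant entries in conf/nifi.properties on a default tar.gz install look like this (the values shown are the shipped defaults; change the port to any unused one):

```properties
# conf/nifi.properties -- default unsecured HTTP settings
nifi.web.http.host=
nifi.web.http.port=8080
```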
Matt
05-10-2017
12:44 PM
@Anthony Murphy NiFi stores state differently depending on whether your NiFi is installed as a cluster or a standalone instance. With a cluster, NiFi stores state in ZooKeeper. As long as your new NiFi points to the same ZooKeeper, or the ZooKeeper content has been moved to the new ZooKeeper you are using, state will be preserved. In a standalone NiFi install, state is recorded on disk. You can look in your NiFi's state-management.xml file to see/change the configuration of both the "local-provider" (used by standalone NiFi) and the "cluster-provider" (used by clustered NiFi). You can change where state is being written to here. Thanks, Matt
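For reference, the local provider section of conf/state-management.xml in a default install looks roughly like this (the "Directory" property is what controls where a standalone NiFi writes its state; paths shown are the defaults):

```xml
<local-provider>
    <id>local-provider</id>
    <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
    <property name="Directory">./state/local</property>
</local-provider>
```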
05-10-2017
12:36 PM
@umair ahmed I am not a Windows admin and could not tell you off the top of my head.
05-10-2017
12:32 PM
@Gaurav Jain You can build into your dataflow the ability to redistribute FlowFiles between your nodes. Below are just some of the benefits NiFi clustering provides:

1. Redundancy - You don't have a single point of failure. Your dataflows will still run even if a node is down or temporarily disconnected from your cluster.
2. Scalability - You can scale out the size of your cluster by adding additional nodes at any time.
3. Ease of management - Often a dataflow, or multiple dataflows, are constructed within the NiFi canvas. The volume of data may eventually push the limits of your hardware, necessitating additional hardware to support the processing load. You could stand up another standalone NiFi instance running the same dataflow, but then you have two separate dataflows/canvases to manage. Clustering allows you to make a change in only one UI and have those changes synced across multiple servers.
4. Site-to-Site - Provides load-balanced data delivery between NiFi endpoints.

As you design your dataflows, you must take into consideration how the data will be ingested:

- Are you running a listener of some sort on every node? In that case, source systems push data to your cluster through some external load balancer.
- Are you pulling data into your cluster? Are you using a cluster-friendly source like JMS or Kafka, where multiple NiFi nodes can pull data at the same time? Or are you using non-cluster-friendly protocols to pull data, like SFTP or FTP? (In cases like this, load balancing should be handled through the list<protocol> --> RPG Input port --> Fetch<protocol> model.)

NiFi has data HA on its future roadmap, which will allow other nodes to pick up work on the data of a down node. Even when this is complete, I do not believe it will do any behind-the-scenes data redistribution.

Thanks, Matt
05-10-2017
12:13 PM
1 Kudo
@Muhammad Umar The log is telling you that the port NiFi is trying to use for HTTP or HTTPS is already in use on the server where you have installed NiFi. HDF installed via Ambari uses port 9090 for HTTP and 9091 for HTTPS by default. You will need to change the NiFi configuration to use an available port on your server. Thanks, Matt
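If you are unsure which ports are already taken before picking a new one, a quick check can be scripted. This is a generic Python sketch (not part of NiFi) that attempts to bind the port and reports whether something already owns it:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    # Try to bind the port; if bind raises OSError (EADDRINUSE),
    # another process is already listening on it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return False
        except OSError:
            return True
```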
05-09-2017
04:05 PM
I literally hit the "tab" key on my keyboard.
05-09-2017
03:53 PM
1 Kudo
@Prabir Guha You can use the ReplaceText processor to replace tabs with commas in a text/plain input file. Let's assume the input file's content is tab-delimited text. You could then configure the ReplaceText processor as follows: the Search Value is set to a tab, and the Replacement Value is set to a comma. The resulting content will have every tab replaced by a comma. Thanks, Matt
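For illustration, the same transformation in plain Python (ReplaceText performs this inside NiFi; this sketch just mirrors Search Value = tab, Replacement Value = comma as a literal replacement):

```python
def tabs_to_commas(text):
    # Literal replacement, equivalent to ReplaceText configured with
    # Search Value "\t" and Replacement Value ","
    return text.replace("\t", ",")
```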
05-09-2017
03:18 PM
@Sunil Neurgaonkar There are global access policies and component-level access policies. Component-level access policies are set against components (processors, input ports, output ports, Remote Process Groups, etc.). There are no access policies for the icons in the toolbar used to create dataflows. Component-level access policies can be assigned to process groups and sub-process groups, or they can be assigned to specific components (processors, labels, input ports, output ports, Remote Process Groups, etc.).

If I am understanding you correctly, you want to control which dataflow-building tools specific users have access to, correct? If so, that level of control does not exist. The assumption is that the admin user grants different users the ability to view/modify only those users' assigned process groups. Once they have modify on a process group, they will be able to use all the icons in the dataflow-building toolbar to construct their dataflow. The only exception to that is components marked as restricted (this includes some processors and controller services), which would require the user to have been granted the global access policy to "access restricted components".

Such granular control would be challenging to implement without significant changes in NiFi. Take the following template example:

- Templates can contain process groups, sub-process groups, and controller services. What would the expected behavior be if a user tried to instantiate such a template onto the canvas? Fail altogether because it contains components the user (TEST1) is not authorized to create?

Once a dataflow is created, you can set component-level access policies very granularly against specific components rather than against the process group they reside in. While this granular access control would limit a user to viewing/modifying the specific component, the user would not be able to add new components to the process group. Thanks, Matt
05-09-2017
02:24 PM
What policies did you authorize the new user for? A user will not be able to load the canvas unless they at least have the "view the user interface" global access policy assigned to them. Thanks, Matt