Member since: 07-30-2019
Posts: 3427
Kudos Received: 1632
Solutions: 1011

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 89 | 01-27-2026 12:46 PM |
| | 498 | 01-13-2026 11:14 AM |
| | 1077 | 01-09-2026 06:58 AM |
| | 926 | 12-17-2025 05:55 AM |
| | 987 | 12-15-2025 01:29 PM |
06-09-2023
10:24 AM
@naveenb Your query will get better visibility if you start a new question in the community rather than asking on an already solved question.

NiFi's ListSFTP and GetSFTP processors (GetSFTP is deprecated in favor of ListSFTP and FetchSFTP) only list/get files. When ListSFTP generates a NiFi FlowFile for a file it finds recursively under the configured base directory on the source SFTP server, it adds a "path" attribute to that FlowFile. That "path" attribute contains the absolute path where the file was found.

So based on your configuration, the results you are seeing are expected: you configured your PutSFTP with "/home/ubuntu/samplenifi/${path}", where the "path" attribute on your FlowFiles resolves to "/home/nifiuser/nifitest/sample" for files found in that source subdirectory. You can use NiFi Expression Language (NEL) to modify that "path" attribute string and strip the "/home/nifiuser" portion:

/home/ubuntu/samplenifi/${path:substringAfter('/home/nifiuser')}

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you, Matt
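As a rough illustration of how that expression resolves, here is a small Python sketch that mimics the substringAfter() call. The attribute value is the one from the post above; this is plain Python, not NiFi Expression Language.

```python
# Mimics ${path:substringAfter('/home/nifiuser')} for one example "path" value.
path = "/home/nifiuser/nifitest/sample"          # "path" attribute set by ListSFTP

# substringAfter keeps everything after the first occurrence of its argument.
remainder = path.split("/home/nifiuser", 1)[1]   # -> "/nifitest/sample"

# Value the PutSFTP target property would resolve to with the suggested expression.
target = "/home/ubuntu/samplenifi/" + remainder
print(target)  # -> /home/ubuntu/samplenifi//nifitest/sample
```

The doubled slash in the result is harmless on most SFTP servers; drop the trailing "/" before the ${path:...} reference in the PutSFTP property if you want to avoid it.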
06-08-2023
02:11 AM
1 Kudo
@MattWho Thanks so much for clearing up my confusion about how my jobs run automatically when I restart the server. I have restarted a couple of times but never knew about this feature. You are awesome!!
06-07-2023
12:24 PM
2 Kudos
@SandyClouds Some clarity and additions to @cotopaul's pros and cons:

Single Node:

PROs:
- Easy to manage. <-- Setup and configuration management are easier since you only need to do them on one node. In a cluster, however, every node's configuration files will be almost identical (with some variation in hostname properties and certificates if you secure your cluster).
- Easy to configure. <-- More configuration is needed in a cluster setup, but once set up, nothing changes in the user experience when interacting with the UI.
- No HTTPS required. <-- Not sure how this is a PRO. I would not recommend running an unsecured NiFi, as doing so allows anyone access to your dataflows and the data being processed. You can also run an unsecured NiFi cluster, though I do not recommend that either.

CONs:
- In case of issues with the node, your NiFi instance is down. <-- Very true, single point of failure.
- It uses plenty of resources when it needs to process data, as everything is done on a single node.

Cluster:

PROs:
- Redundancy and failover --> when a node goes down, the others will take over and process everything, meaning that you will not be affected. <-- Not completely accurate. Each node in a NiFi cluster is only aware of the data (FlowFiles) queued on that specific node, so each node works only on the FlowFiles present on that node. It is the responsibility of the dataflow designer/builder to build dataflows in such a way that FlowFiles are distributed across all nodes. When a node goes down, any FlowFiles currently queued on that down node are not going to be processed by the other nodes; however, the other nodes will continue processing their own data and all new data coming into your cluster's dataflows.
- The resources used will be split among all the nodes, meaning that you can cover more use cases than on a single node. <-- Nodes do not share or pool resources with the other nodes in the cluster. If your dataflow(s) are built correctly, the volume of data (FlowFiles) being processed will be distributed across all your nodes, allowing each node to process a smaller subset of the overall FlowFile volume. This means more resources are available across your cluster to handle more volume.
- NEW -- A NiFi cluster can be accessed via any one of the member nodes. No matter which node's UI you access, you will be presented with stats for all nodes. There is a cluster UI, accessible from the global menu, that gives you a breakdown of each node. Any change you make from the UI of any one member node will be replicated to all nodes.
- NEW -- Since all nodes run their own copy of the flow, a catastrophic node failure does not mean the loss of all your work: the same flow.json.gz (which contains everything related to your dataflows) can be retrieved from any of the other nodes in your cluster.

CONs:
- Complex setup, as it requires Zookeeper plus plenty of other config files. <-- A NiFi cluster requires a multi-node Zookeeper setup. The Zookeeper quorum is required for cluster stability and also stores cluster-wide state needed by your dataflows. Zookeeper is responsible for electing which node in your cluster holds the Cluster Coordinator role and which holds the Primary Node role. If a node assigned one of these roles goes down, Zookeeper will elect one of the remaining nodes to that role. (A small configuration sketch follows this list.)
- Complex to manage --> analysis will be done on X nodes instead of a single node. <-- Not clear. Yes, you have multiple nodes and each node produces its own set of NiFi logs. However, if a component within your dataflow produces bulletins (exceptions), they are reported against all nodes or against the specific node(s) on which the bulletin was produced.
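To make the "extra configuration" point concrete, here is the sketch referenced above: the main cluster-related entries from nifi.properties, shown as a Python dict purely for readability. The property names are the standard nifi.properties keys; the hostnames, ports, and Zookeeper connect string are placeholders to adapt to your environment.

```python
# Cluster-related nifi.properties entries (illustrative values only; every node
# gets nearly the same file, differing mainly in its own hostname).
cluster_properties = {
    "nifi.cluster.is.node": "true",
    "nifi.cluster.node.address": "nifi-node1.example.com",          # this node's hostname
    "nifi.cluster.node.protocol.port": "11443",                      # node-to-node protocol port
    "nifi.zookeeper.connect.string": "zk1:2181,zk2:2181,zk3:2181",   # the Zookeeper quorum
    "nifi.cluster.flow.election.max.wait.time": "5 mins",            # how long to wait for flow election
}
```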
Cloudera offers centralized management of your NiFi cluster deployment via the Cloudera Manager software. It makes deploying and managing a NiFi cluster across multiple nodes easy, sets up and configures Zookeeper for you, and also makes securing your NiFi easy by generating the needed certificates/keystores for you.

Hope this helps, Matt
06-01-2023
10:33 AM
Good idea. Here is the reference: https://issues.apache.org/jira/browse/NIFI-11627
05-30-2023
09:28 PM
Thanks for taking the time to reply, Matt, but this doesn't address my question. Let me use an example and hopefully that will clarify what I'm trying to do.

Within a process group named "transit", there is a Process Group Parameter Context named "transit_variables". This parameter context (transit_variables) includes all the high-level variables used throughout the process group, but it also contains a large number of Avro schemas which are referenced in an AvroSchemaRegistry controller service defined within the main process group. For organizational purposes I'd like to move these Avro schemas into their own parameter context, BUT (this is the catch) I still want access to them from the AvroSchemaRegistry that is scoped to the process group, which in turn is bound to the parent parameter context.

I can "solve" this problem by creating a child process group (e.g. transit_data_parser) with a new child parameter context (e.g. transit_schemas), then adding additional AvroSchemaRegistry, JSONTreeReader, and JSONRecordsetWriter controller services that are coupled with the child parameter context "transit_schemas". The child process group would contain a single processor with an input port and an output port, and would be used solely for the purpose of reading a FlowFile into a NiFi recordset object with Avro validation. This so-called solution is very tightly coupled and as such not very flexible. It means that every time I want to use a schema I need to create one of these tightly coupled process groups, which defeats the whole purpose of the parameter context concept.

What would be better is if I could reference these child schemas using a hierarchical object syntax, e.g. #{transit_variables.transit_schemas}. Using this syntax from within a single AvroSchemaRegistry controller service, I could reference parameters at the parent level and all parameters at the child levels.
05-30-2023
02:22 PM
@VLban MergeContent and MergeRecord handle the merging of FlowFiles' content differently. Since your FlowFiles already contain JSON-formatted record(s), MergeContent is not going to be the correct processor to use. MergeContent does not care about the data/content format (except for Avro) of the inbound FlowFiles: with Binary Concatenation, one FlowFile's content bytes are simply written starting at the end of the previous FlowFile's content. So in the case of JSON, the resulting merged FlowFile's content is not going to be valid JSON anymore.

Both processors will bin FlowFiles each time the processor executes based on its run schedule. At the end of each bin cycle the bins are evaluated to see if both configured minimums are satisfied; if so, the bin will be merged. Setting a max does not mean that the bin will wait to be merged until the max has been met. So you would be better off setting your min to 500 MB if you always want files of at least 500 MB, and setting your max to a value a bit larger than that. Doing so may result in bins that have, say, 480 MB binned when the next FlowFile can't be added because it would exceed the configured max (that FlowFile is placed in a new bin). The Max Bin Age property, when set, will force a bin to merge once the bin has existed for the configured max bin age (this avoids FlowFiles getting stuck in these merge-based processors).

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you, Matt
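To make the sizing advice concrete, here is a sketch of the relevant MergeContent properties, written as a Python dict purely for readability. The property names are the processor's standard properties; the values are only example choices, not recommendations for every flow, and MergeRecord exposes analogous min/max settings.

```python
# Example MergeContent sizing configuration following the advice above
# (values are illustrative placeholders).
merge_content_settings = {
    "Minimum Group Size": "500 MB",   # a bin must reach this size before it is eligible to merge
    "Maximum Group Size": "550 MB",   # only slightly larger than the min, per the advice above
    "Minimum Number of Entries": "1",
    "Maximum Number of Entries": "100000",
    "Max Bin Age": "10 min",          # forces a lingering, partially filled bin to merge eventually
}
```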
05-30-2023
11:19 AM
@svenhans I would recommend you file an Apache NiFi Jira ticket for your new component/processor request: https://issues.apache.org/jira/projects/NIFI

Let's say an AD/LDAP lookup processor were created. That would mean supplying the LDAP search configuration properties, along with the manager DN and manager password, in order to do the lookup. To avoid exposing those to other users in NiFi, maybe creating an LDAPLookup controller service would be better. Then create a processor like getLDAPUser that gets configured to use that LDAPLookup CS along with a StandardSSL CS, with user-defined AD/LDAP attributes to return, and require a user identity property value. Of course, you run the risk of others who have access to your NiFi copying your processor and using it to fetch any details they want about your AD/LDAP users.

Matt
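Purely as a hypothetical illustration of what such a getLDAPUser-style lookup would do under the hood, here is a sketch using the Python ldap3 library; this is not NiFi code, and the hostname, DNs, filter, and attribute names are placeholders.

```python
# Hypothetical AD/LDAP user lookup, roughly what an LDAPLookup controller
# service + getLDAPUser processor pair would perform (placeholders throughout).
from ldap3 import Server, Connection, ALL

server = Server("ldaps://ad.example.com:636", get_info=ALL)
conn = Connection(
    server,
    user="cn=nifi-manager,ou=service,dc=example,dc=com",  # manager DN
    password="manager-password",                          # manager password
    auto_bind=True,
)

# Look up a single user identity and return a few attributes.
conn.search(
    search_base="ou=users,dc=example,dc=com",
    search_filter="(sAMAccountName=jdoe)",
    attributes=["displayName", "mail", "memberOf"],
)
for entry in conn.entries:
    print(entry)
```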
05-30-2023
03:28 AM
Sorry @Neil_1992, where can I find the nifi.service file to add this property?
05-26-2023
12:29 PM
@Gutao When interacting with the NiFi rest-api, I'd recommend creating a client certificate to use in your automation. A secured NiFi will always WANT a client certificate and will only try another configured auth method if a client certificate is not provided in the TLS exchange. Using a certificate for your rest-api automation removes the need for obtaining a token completely; you simply pass your client certificate with every rest-api call. Another advantage over token-based auth is token expiration: with no token involved in certificate-based auth, your certificate will keep working until it expires (a typical default is 1 or 2 years). You'll need to set up authorization policies for your certificate user (the certificate DN is used as the user identity) for the various endpoints you are trying to interact with through the rest-api.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you, Matt
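As a minimal sketch of what a certificate-authenticated rest-api call can look like from Python with the requests library: the hostname, port, endpoint, and certificate file paths below are placeholders for your environment.

```python
# Call the NiFi REST API using a client certificate instead of a token.
import requests

NIFI_API = "https://nifi-host.example.com:8443/nifi-api"  # placeholder host/port

resp = requests.get(
    f"{NIFI_API}/flow/about",            # simple read-only endpoint used as an example
    cert=("client.crt", "client.key"),   # client certificate + private key (PEM files)
    verify="nifi-ca.pem",                # CA chain that signed NiFi's server certificate
)
resp.raise_for_status()
print(resp.json())
```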
05-26-2023
12:18 PM
@mks27 Your configuration has an ldap://... address; however, you have configured the "Authentication Strategy" as LDAPS. This needs to be "SIMPLE" instead of "LDAPS". I would also recommend that you change the "Identity Strategy" from "USE_DN" to "USE_USERNAME". An LDAP exception with data 52e typically means a bad password.

Also consider that the login-identity-providers.xml configuration file is XML. XML has special characters that, if used in your manager password, must be escaped (or change your manager password to not use these special characters):

& replace with &amp;
< replace with &lt;
> replace with &gt;
" replace with &quot;
' replace with &apos;
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you, Matt