Member since: 07-30-2019
Posts: 3421
Kudos Received: 1624
Solutions: 1009
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 166 | 01-09-2026 06:58 AM |
|  | 496 | 12-17-2025 05:55 AM |
|  | 557 | 12-15-2025 01:29 PM |
|  | 557 | 12-15-2025 06:50 AM |
|  | 405 | 12-05-2025 08:25 AM |
02-22-2024
06:13 AM
1 Kudo
@plapla A secured NiFi is secured over HTTPS, which means a TLS exchange happens to secure the connection with NiFi. A secured NiFi will always support a mutual TLS exchange. If no other methods of user authentication are configured, NiFi will "REQUIRE" a clientAuth certificate be presented in the TLS exchange with NiFi. When NiFi is configured with an additional user authentication method (for example, you have enabled the ldap-provider for user authentication), NiFi will "WANT" a clientAuth certificate in the TLS exchange. If a clientAuth certificate is not provided in the TLS exchange/handshake, NiFi moves on to the next authentication method configured. The ldap-provider requires obtaining a user token, as you saw, which then needs to be included with all subsequent rest-api calls. And you are correct that the token does expire. That is why it is easier and a better option to use mutual TLS based authentication when doing automation like this. The clientAuth certificate is simply included in every rest-api request and there is no token involved.

About the clientAuth certificate... The full DN from the certificate is what is used to identify the user. The full DN is evaluated against any nifi.security.identity.mapping.pattern.xxx properties configured in the nifi.properties file. If a configured pattern (Java regex) matches against the DN, the corresponding nifi.security.identity.mapping.value.xxx and nifi.security.identity.mapping.transform.xxx are applied. These identity mappings are often used to trim the complete DN down to just the CN value. The resulting string after any mappings are applied is what is then used to look up authorizations for that client/user.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
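For the automation side, here is a minimal sketch of both approaches using Python's requests library; the hostname, port, credentials, certificate paths, and the /nifi-api/flow/about example endpoint are placeholders to adapt to your environment.

```python
import requests

NIFI = "https://nifi.example.com:8443"  # placeholder host and port

# Mutual TLS: the clientAuth certificate is presented on every call, no token involved.
resp = requests.get(
    f"{NIFI}/nifi-api/flow/about",
    cert=("/path/to/client-cert.pem", "/path/to/client-key.pem"),  # placeholder PEM paths
    verify="/path/to/nifi-ca.pem",
)
print(resp.json())

# ldap-provider style: first obtain a token (which expires), then send it as a Bearer header.
token = requests.post(
    f"{NIFI}/nifi-api/access/token",
    data={"username": "myuser", "password": "mypassword"},  # placeholder credentials
    verify="/path/to/nifi-ca.pem",
).text
resp = requests.get(
    f"{NIFI}/nifi-api/flow/about",
    headers={"Authorization": f"Bearer {token}"},
    verify="/path/to/nifi-ca.pem",
)
print(resp.json())
```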
02-20-2024
08:07 AM
1 Kudo
@iriszhuhao I would use a ModifyBytes processor within the Process Group just before the output port to remove the content of the FlowFiles. Then, outside the Process Group, add a MergeContent processor that will merge all of those 0-byte FlowFiles just released in a batch from the Process Group before you execute your stored procedures. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
02-16-2024
03:56 PM
@cbravom The UpdateAttribute processor does not read the content of the FlowFile. It is used to add or modify attributes/metadata on the NiFi FlowFile. For extracting content from the FlowFile into attributes, you'll want to use a processor like ExtractText. Keep in mind that the FlowFile attributes for all active FlowFiles are held in NiFi's heap memory. If you can avoid extracting the content to attribute(s) and still satisfy your use case, that would be much more efficient. If you found that any provided answer helped you, please take a moment to login and click accept below those answers. Matt
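In case it helps, here is a small hedged sketch in Python (the pattern, content, and attribute name are made up for illustration, not taken from your flow) of the kind of capture-group regular expression an ExtractText dynamic property uses to pull content into a FlowFile attribute.

```python
import re

# Hypothetical FlowFile content. ExtractText would be configured with a dynamic
# property such as  order.id = order_id=(\d+)  and would store the capture group
# value as a FlowFile attribute named "order.id".
content = "timestamp=2024-02-16 order_id=12345 status=OK"
match = re.search(r"order_id=(\d+)", content)
if match:
    attributes = {"order.id": match.group(1)}  # what ExtractText would add to the FlowFile
    print(attributes)
```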
02-16-2024
10:40 AM
@bucketsoffun Welcome to Apache NiFi. NiFi's Site-To-Site capability has been around since the very early stages of Apache NiFi. While the functionality has not changed, the world of security has evolved around it. Prior to the release of Apache NiFi 1.14, the out-of-the-box installation was unsecured over HTTP, and securing NiFi required user configuration changes. From 1.14 onward, an out-of-the-box Apache NiFi starts secured over HTTPS with generated certificates intended only for evaluation purposes. Out of the box it uses the following security providers:

single-user-provider --> Provides a single username and password for authentication with only this NiFi instance. Does not support additional local users. Does not support NiFi clusters.

single-user-authorizer --> Authorizes the single-user-provider user for complete access to NiFi. Does not allow setting up additional authorizations.

This information will help you understand the challenges you are having with Site-To-Site if you are using these out-of-the-box security providers.

The Site-To-Site basics: The Remote Process Group (RPG) is always the client in the client-to-server connection; it always establishes the connection with the server side. The "remote" input/output ports are always the server side of that connection. The client will either pull FlowFiles from a remote output port or push FlowFiles to a remote input port. Using localhost in the nifi.remote.input.host property is not going to work unless the only target you want the RPG to be able to talk to is itself (same server/host).

The RPG added to the canvas of your NiFi A is configured with a URL for NiFi B (https://<nifi B hostname>:<nifi.web.https.port>/nifi). If configured with transport protocol "RAW", it will transmit or receive FlowFiles via the nifi.remote.input.socket.port configured on NiFi B. If configured with transport protocol "HTTP", it will transmit or receive FlowFiles via the nifi.web.https.port on NiFi B. Once you "Apply" the RPG configuration, a background Site-To-Site thread executes periodically to fetch the Site-To-Site details from the target URL (NiFi B). Those details describe the target NiFi (is it clustered or standalone, hostnames of all the cluster nodes, http/https ports for all cluster instances, all cluster node remote socket ports, load on each node in the target NiFi cluster, the list of remote input and output ports this client is authorized to use, etc.). This information is then used for the actual connection and FlowFile transfer.

Authorization becomes the first challenge, since the RPG's initial connection, regardless of RAW or HTTP transport protocol configuration, is always to the NiFi UI's hostname and port. When that is HTTPS, it requires negotiating a successful mutual TLS exchange/handshake. The client (RPG) will present the clientAuth certificate from the keystore configured in the nifi.properties file. The DN from that certificate is used as the user identity, and that identity is then checked to see whether the client has been authorized to "retrieve site-to-site details". This is what authorizes the RPG to get all the S2S details mentioned earlier. The remote input and output ports also have client authorizations needed for the client (input port --> "receive data via site-to-site" and output port --> "send data via site-to-site").

So using the default security providers makes it impossible to set up the authorizations needed for Site-To-Site between different hosts. You should first experiment with setting up security using some other form of authentication and authorization. Another option could be to unsecure your NiFi; this removes any need for authentication and authorization, and you could then use the IP addresses assigned to your NiFi A and NiFi B in all the hostname configurations. There are a lot of moving parts important to Site-To-Site between hosts. I know this does not give you a step 1 to n process to get what you have working, but hopefully it gives you a high-level view of the communication negotiations that happen. The same negotiations exist when unsecured, just without the authentication and authorization complexity. I'd encourage you to explore the production-ready authorization providers first so you have a better understanding of the granular nature of NiFi authorizations. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
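If it helps while troubleshooting authorization, the Site-To-Site details the RPG fetches can also be requested directly from the target's REST API. Below is a minimal sketch using Python's requests library; the hostname, port, and certificate paths are placeholders, and the response field names shown are from memory, so verify them against what your NiFi actually returns.

```python
import requests

# Ask NiFi B for its Site-To-Site details, the same information the RPG's
# background thread fetches periodically. Host and paths below are placeholders.
resp = requests.get(
    "https://nifi-b.example.com:8443/nifi-api/site-to-site",
    cert=("/path/to/client-cert.pem", "/path/to/client-key.pem"),  # identity checked against
    verify="/path/to/nifi-ca.pem",                                 # "retrieve site-to-site details"
)
resp.raise_for_status()
details = resp.json().get("controller", {})
print(details.get("remoteSiteListeningPort"))                  # RAW socket port on NiFi B, if set
print([p.get("name") for p in details.get("inputPorts", [])])  # remote input ports visible to this client
```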
02-14-2024
08:59 AM
@Sofia71 The HandleHTTPRequest processor establishes a generic endpoint; it has no idea which headers will arrive or in what format the content of those headers will be. Your client creates the request and decides which headers are sent and the format of the header content. I would recommend in your testing that you start the HandleHTTPRequest processor and keep the downstream processor stopped so that the incoming request becomes queued in the connection between HandleHTTPRequest and the next downstream processor. You can then right click on the connection and list the FlowFiles in the connection. From the list you can view the details of the queued FlowFile, which will allow you to see the generated "http.headers.<some client derived string>" attributes added to the FlowFile along with the values for those headers. Using that information you can construct your validations in the RouteOnAttribute processor. You'll need to verify that the format of the authorization data coming in the request header, before encoding, matches exactly the format of the authorization data you have put in the parameter context. You could also decode the authorization header contents to make sure they match what you constructed in your authorization parameter. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
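If you want to sanity-check the decode-and-compare step outside of NiFi first, here is a small sketch in Python; the header value, expected username, and expected password are placeholders standing in for what you see on the queued FlowFile and what you stored in the parameter context.

```python
import base64

# Placeholder values: header_value stands in for the http.headers authorization
# attribute on the queued FlowFile; expected_user/expected_pass stand in for the
# values stored in your parameter context.
header_value = "Basic dXNlcjpwYXNzd29yZA=="
expected_user, expected_pass = "user", "password"

encoded = header_value.removeprefix("Basic ").strip()
username, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
print(username == expected_user and password == expected_pass)  # True if they match
```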
02-14-2024
06:17 AM
@plapla This sounds like the PutElasticsearchHttp processor is working as designed. It is putting to Elasticsearch over HTTP and Elasticsearch is successfully processing that request; however, your Elasticsearch is not responding to the original HTTP request before the timeout has occurred. As a result, PutElasticsearchHttp has routed the FlowFile to failure. The question here is what are you doing with the failure relationship? If you configured "retry" or looped the failure relationship via a connection back to the PutElasticsearchHttp processor, the same FlowFile would be processed a second time. You may be able to solve this by simply increasing the configured "Response Timeout" on the PutElasticsearchHttp processor. But you may also want to look at the particular files that encounter this issue and see if there are any consistencies across them, such as larger sizes, time of day, load on Elasticsearch at the time, number of concurrent pending requests on the Elasticsearch side, network load, etc... If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
02-14-2024
06:04 AM
2 Kudos
@iriszhuhao This might be a good use case for the FlowFile Concurrency and Outbound Policy configuration options on a process group. FlowFile Concurrency allows you to place a portion of your dataflow into a process group and control how the initial FlowFile or batch of FlowFiles is allowed to enter that process group for processing. The Outbound Policy controls when the FlowFiles being processed in that process group will be released to processor(s) downstream of that process group. Downstream components of the process group will not receive FlowFiles from the process group until all FlowFiles within the process group have either been auto-terminated or queued up at one or more output ports. When the Outbound Policy is met, the FlowFile(s) are released downstream and the process group's FlowFile Concurrency then allows the next batch to enter for processing. So it might make sense to place the portion of your dataflow comprised of your nine concurrent branches in this bounded process group, and downstream of it have your ExecuteSQLRecord processor call your final procedure now that you know all branches have completed. This also solves your problem of not all nine branches always being used. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
02-12-2024
06:02 AM
@PriyankaMondal Very simply... what @cotopaul responded with. One of the biggest determiners of performance is your dataflow design itself. Apache NiFi offers many pluggable components for building out your dataflows, and not all of them perform the same. While NiFi makes it easy to create dataflows, building the highest-performing dataflow can take some trial and error. I'd always recommend testing and modeling to understand the performance characteristics of the dataflow you built. Identify and adjust where you see your bottlenecks. Try different designs using different processors when possible. Work with records instead of many small individual FlowFiles when possible for better performance. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
02-12-2024
05:50 AM
@Sofia71 The HandleHTTPRequest processor listens for incoming requests sent to it from an external source and then relies on the HandleHTTPResponse processor to send back the response to that incoming request. So the first question is how are you collecting this data? Are you trying to fetch it? If so, you should be using the InvokeHTTP processor instead. If the source is sending the data to your NiFi, then you are using the correct processor. Doing any form of client-based authentication would need to be handled within your dataflow following the HandleHTTPRequest processor. The processor itself will not do authorization, and the only form of authentication it can do is mutual TLS based. So for basic authorization you would need the client to present its basic authentication credentials in the request headers. The HandleHTTPRequest processor will add those headers as attributes on the produced FlowFile. You mention the authorization header username and password would be base64 encoded, so you could use NiFi Expression Language via the UpdateAttribute processor and the base64Decode function to decode them. How you validate them is up to you after you have them. If they are LDAP based credentials, perhaps you could write a script you pass them to via one of the scripting processors to validate that the username and password are correct. If you want to keep it very basic, you could use a RouteOnAttribute processor that checks whether the username and password match what you say they should be and, if they do, pass the FlowFile on downstream; otherwise, terminate the FlowFile there. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
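To see exactly what arrives at HandleHTTPRequest, you can send a test request from the client side. Here is a minimal sketch using Python's requests library; the URL, path, and credentials are placeholders, and the processor surfaces the header as an http.headers attribute on the produced FlowFile.

```python
import base64
import requests

# Placeholder endpoint and credentials; HandleHTTPRequest listens on whatever port you configured.
url = "http://nifi-host.example.com:8081/listen"
credentials = base64.b64encode(b"myuser:mypassword").decode("ascii")

resp = requests.post(
    url,
    headers={"Authorization": f"Basic {credentials}"},  # shows up as an http.headers attribute
    data=b'{"example": "payload"}',
)
print(resp.status_code)
```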
02-05-2024
06:46 AM
2 Kudos
@saquibsk Sounds like you should be using the JoltTransformRecord processor instead of the JoltTransformJson processor. You could then use a JSON record reader (JsonTreeReader or JsonPathReader) and a JsonRecordSetWriter to read the multi-record FlowFile's content and write the transformed output. This allows the configured transform to be applied to each individual record instead of being applied against the entire multi-record content. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
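To make the per-record distinction concrete, here is a small illustrative sketch in Python; the transform is an arbitrary stand-in rather than your actual Jolt spec.

```python
import json

# Arbitrary stand-in for a Jolt spec: rename "old_name" to "new_name" in a single record.
def transform(record: dict) -> dict:
    return {"new_name": record["old_name"]}

flowfile_content = '[{"old_name": "a"}, {"old_name": "b"}, {"old_name": "c"}]'

# JoltTransformRecord-style: the record reader splits the content into records and
# the transform is applied to each record individually.
records = json.loads(flowfile_content)
print(json.dumps([transform(r) for r in records]))

# JoltTransformJson-style: the entire content is treated as one JSON document, so a
# spec written against a single record does not line up with the surrounding array.
```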