Member since: 07-30-2019
Posts: 3470
Kudos Received: 1641
Solutions: 1018
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 249 | 05-06-2026 09:16 AM |
|  | 438 | 05-04-2026 05:20 AM |
|  | 316 | 05-01-2026 10:15 AM |
|  | 507 | 03-23-2026 05:44 AM |
|  | 385 | 02-18-2026 09:59 AM |
02-14-2024
08:59 AM
@Sofia71 The HandleHTTPRequest processor establishes a generic endpoint; it has no idea what headers will be sent or in what format the content of those headers will be. Your client creates the request and decides which headers to send and the format of their content. In your testing, I would recommend that you start the HandleHTTPRequest processor and keep the downstream processor stopped, so that the incoming request becomes queued in the connection between the HandleHTTPRequest processor and the next downstream processor. You can then right-click on the connection and list the FlowFiles in it. From the list you can view the details of the queued FlowFile, which will allow you to see the generated "http.headers.<some client derived string>" attributes added to the FlowFile along with the values of those headers. Using that information you can construct your validations in the RouteOnAttribute processor. You'll need to verify that the format of the authorization data (before encoding) coming in the request header exactly matches the format of the authorization data you have put in the parameter context. You could also decode the authorization header contents to make sure they match what you constructed in your authorization parameter. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
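A minimal Python sketch of the two checks described above, assuming the FlowFile attributes are available as a plain dictionary (in NiFi you would read them from the queue listing instead). The attribute values, the user `alice`, and the password `s3cret` are hypothetical, and the exact `http.headers.*` suffix depends on how your client names the header. It picks out the header-derived attributes and builds the exact `Basic <base64(user:password)>` value to compare against, which shows why even a stray space in the pre-encoding string breaks the match.

```python
import base64

HEADER_PREFIX = "http.headers."

def header_attributes(attributes: dict[str, str]) -> dict[str, str]:
    """Keep only the attributes derived from request headers, keyed by header name."""
    return {
        name[len(HEADER_PREFIX):]: value
        for name, value in attributes.items()
        if name.startswith(HEADER_PREFIX)
    }

def expected_basic_auth(username: str, password: str) -> str:
    """Build the value an HTTP Basic client sends: "Basic " + base64("user:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Hypothetical attributes, as they might appear when listing a queued FlowFile.
attrs = {
    "http.method": "POST",
    "http.headers.Authorization": "Basic YWxpY2U6czNjcmV0",
    "http.headers.Content-Type": "application/json",
}

headers = header_attributes(attrs)
print(headers["Authorization"] == expected_basic_auth("alice", "s3cret"))   # True
# The pre-encoding format must match exactly: an extra space changes the token.
print(headers["Authorization"] == expected_basic_auth("alice", " s3cret"))  # False
```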
02-14-2024
06:17 AM
@plapla This sounds like the putElasticSearchHTTP processor is working as designed. It is putting to ElasticSearch over HTTP and ElasticSearch is successfully processing that request; however, your ElasticSearch is not responding to the original HTTP request before the timeout has occurred. As a result, putElasticSearchHTTP has routed the FlowFile to failure. The question here is: what are you doing with the failure relationship? If you configured "retry" or looped the failure relationship via a connection back to the putElasticSearchHTTP processor, then the same FlowFile would be processed a second time. You may be able to solve this by simply increasing the "Response Timeout" configured on the putElasticSearchHTTP processor. But you may also want to look at the particular files that encounter this issue and see if there are any consistencies across them, such as larger sizes, time of day, load on ElasticSearch at the time, number of concurrent pending requests on the ElasticSearch side, network load, etc. Thank you, Matt
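A toy Python model of the sequence described above, assuming the write lands on the ElasticSearch side even when the client gives up waiting. `put_document`, `deliver_with_retry_loop`, and the latency and timeout numbers are hypothetical stand-ins, not the processor's API. It shows how routing a timed-out request to failure and looping it back produces a second copy, and how a longer response timeout avoids the failure in the first place.

```python
def put_document(index: list, doc: str, server_latency_s: float, response_timeout_s: float) -> str:
    """Simulate one HTTP write: the server stores the doc even if the client times out."""
    index.append(doc)  # server-side processing succeeds
    if server_latency_s > response_timeout_s:
        return "failure"  # client stopped waiting before the response arrived
    return "success"

def deliver_with_retry_loop(doc: str, latencies: list, response_timeout_s: float) -> list:
    """Failure relationship looped back: retry until a response arrives in time."""
    index = []
    for latency in latencies:
        if put_document(index, doc, latency, response_timeout_s) == "success":
            break
    return index

# First attempt is slow (e.g. ElasticSearch under load), the retry is fast.
print(deliver_with_retry_loop("doc-1", [8.0, 1.0], response_timeout_s=5.0))   # ['doc-1', 'doc-1']
print(deliver_with_retry_loop("doc-1", [8.0, 1.0], response_timeout_s=15.0))  # ['doc-1']
```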
02-14-2024
06:04 AM
2 Kudos
@iriszhuhao This might be a good use case for the FlowFile Concurrency and Outbound Policy configuration options on a process group. FlowFile Concurrency allows you to place a portion of your dataflow into a process group and control how the initial FlowFile or batch of FlowFiles is allowed to enter that process group for processing. The Outbound Policy controls when the FlowFiles being processed in that process group will be released to the processor(s) downstream of that process group. Downstream components of the process group will not receive FlowFiles from the process group until all FlowFiles within the process group have either been auto-terminated or queued at one or more output ports. When the Outbound Policy is met, the FlowFile(s) are released downstream and the process group's FlowFile Concurrency then allows the next batch to be processed. So it might make sense to place the portion of your dataflow comprising your nine concurrent branches in this bounded process group and, downstream, have your ExecuteSQLRecord processor call your final procedure, now that you know all branches have completed. This solves your problem of not all nine branches always being used. Thank you, Matt
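The batch-then-release behavior described above can be modeled in plain Python with a thread pool. This is a conceptual analogy, not NiFi code: each hypothetical `make_branch` function stands in for one of the nine branches, a branch with nothing to do returns `None`, and `final_step` stands in for the ExecuteSQLRecord call to the final procedure. The final step runs only after every branch has finished, regardless of which branches actually did work.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def make_branch(n):
    """Hypothetical branch n: only does work when the batch has records for it."""
    def branch(batch):
        records = [r for r in batch if r["branch"] == n]
        return (n, len(records)) if records else None
    return branch

def run_batch(batch, branches, final_step):
    """Run all branches concurrently; release to final_step only when all are done."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, batch) for branch in branches]
        wait(futures)  # blocks until every branch has finished
    results = [f.result() for f in futures]
    completed = [r for r in results if r is not None]  # unused branches are skipped
    return final_step(completed)

branches = [make_branch(n) for n in range(1, 10)]  # nine branches
batch = [{"branch": 2}, {"branch": 2}, {"branch": 7}]
summary = run_batch(batch, branches, final_step=lambda done: sorted(done))
print(summary)  # [(2, 2), (7, 1)]
```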
02-12-2024
06:02 AM
@PriyankaMondal Very simply... what @cotopaul responded with. One of the biggest determinants of performance is your dataflow design itself. Apache NiFi offers many pluggable components for building out your dataflows, and not all of them will perform the same. While NiFi makes it easy to create dataflows, building the highest-performing dataflows can take some trial and error. I'd always recommend testing and modeling to understand the performance characteristics of the dataflow you built. Identify and adjust where you see bottlenecks. Try different designs using different processors when possible. Work with records instead of many small individual FlowFiles when possible for better performance. Thank you, Matt
02-12-2024
05:50 AM
@Sofia71 The HandleHTTPRequest processor listens for incoming connections sent to it from an external source and then relies on the HandleHTTPResponse processor to send back the response to that incoming request. So the first question is: how are you collecting this data? Are you trying to fetch it? If so, you should be using the InvokeHTTP processor instead. If the source is sending the data to your NiFi, then you are using the correct processor. Any form of client-based authentication would need to be handled within your dataflow following the HandleHTTPRequest processor. The processor itself will not do authorization, and the only form of authentication it can do is mutual TLS. So for basic authentication, the user's credentials would need to be presented in the request headers. The HandleHTTPRequest processor will add those headers as attributes on the produced FlowFile. You mention the username and password in the authorization header would be base64 encoded, so you could use NiFi Expression Language in the UpdateAttribute processor with the base64Decode function to decode them. How you validate them is up to you after you have them. If they are LDAP-based credentials, perhaps you could write a script, run via one of the scripting processors, to validate that the username and password are correct? If you want to keep it very basic, you could use a RouteOnAttribute processor that checks whether the username and password match what you say they should be and, if they do, passes the FlowFile on downstream; otherwise, terminate the FlowFile there. Thank you, Matt
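A minimal Python sketch of the decode-then-compare logic, assuming the header arrives as the `http.headers.Authorization` attribute and the expected credentials are plain strings (the user `alice` and password `s3cret` are hypothetical). In the flow itself this would be UpdateAttribute with Expression Language followed by RouteOnAttribute; the route names returned here (`authorized`, `unmatched`) are illustrative. A malformed header is treated as a non-match rather than an error.

```python
import base64
import binascii

def decode_basic_auth(header_value: str):
    """Return (username, password) from a "Basic <token>" value, or None if malformed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")  # split at the first colon only
    if not sep:
        return None  # Basic credentials must be "username:password"
    return username, password

def route(attributes: dict, expected_user: str, expected_password: str) -> str:
    """Stand-in for the RouteOnAttribute decision: pass matching requests downstream."""
    creds = decode_basic_auth(attributes.get("http.headers.Authorization", ""))
    return "authorized" if creds == (expected_user, expected_password) else "unmatched"

flowfile = {"http.headers.Authorization": "Basic YWxpY2U6czNjcmV0"}
print(route(flowfile, "alice", "s3cret"))  # authorized
print(route(flowfile, "alice", "wrong"))   # unmatched
```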
02-05-2024
06:38 AM
@hegdemahendra Curious about the differences between your prod and UAT environments here.
1. Same number of nodes in each environment's NiFi cluster?
2. Exact same configuration on the ConsumeKafka processor (except consumer group ID)?
3. Screenshot of the scheduling tab for the ConsumeKafka processor?
4. Versions of NiFi (Apache, CFM, HDF) used in both environments?
5. Any observed rebalancing or consumer group related exceptions in the logs?
Having more consumers in a consumer group than the number of partitions can lead to constant rebalancing. The number of consumers in the consumer group is calculated by multiplying the number of nodes in your NiFi cluster by the number of concurrent tasks configured on the ConsumeKafka processor. So if you have a 3-node cluster, you should have only 1 concurrent task, so that the number of consumers is equal to or less than the number of partitions. Thanks, Matt
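The sizing rule above is simple arithmetic. A small Python sketch, assuming you know the topic's partition count; the node and partition numbers below are examples only, and the function names are hypothetical.

```python
def consumer_count(nodes: int, concurrent_tasks: int) -> int:
    """Consumers in the group = cluster nodes x concurrent tasks on ConsumeKafka."""
    return nodes * concurrent_tasks

def max_concurrent_tasks(nodes: int, partitions: int) -> int:
    """Largest per-node concurrent task setting that keeps consumers <= partitions.

    Returns 0 when even one task per node would exceed the partition count.
    """
    return partitions // nodes

for nodes, partitions in [(3, 3), (3, 12), (3, 2)]:
    tasks = max_concurrent_tasks(nodes, partitions)
    print(f"{nodes} nodes, {partitions} partitions -> at most {tasks} task(s) per node, "
          f"{consumer_count(nodes, tasks)} consumers")
```

With 3 nodes and only 2 partitions, even a single concurrent task leaves one consumer more than there are partitions.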
02-02-2024
10:58 AM
1 Kudo
@oneofthemany You would get better traction from the community by starting a new question, since this question already has an accepted solution and has nothing to do with NiFi TLS-toolkit usage. Thank you, Matt
02-01-2024
07:09 AM
1 Kudo
@PriyankaMondal I am not clear on your statement: "if Nifi processor (any processor within a process group) stops suddenly due to load/any other issue". Are you saying you see a NiFi processor transition to a stopped state unexpectedly? This should never happen. Or are you saying the processor seems to stop processing FlowFiles even though it is currently in a running/started state? NiFi queues FlowFiles in connections between processor components. A FlowFile is not removed from the inbound connection to a processor component until that FlowFile has been successfully processed by the consuming processor. A FlowFile consists of two parts: 1. FlowFile attributes/metadata that are persisted in the NiFi flowfile_repository. 2. FlowFile content persisted within claims inside the content_repository. To protect against data loss, these repositories should be on protected storage such as RAID. So if NiFi were to suddenly crash, or the server itself were to crash, when NiFi is restarted on that down node it will load its flow and then load the FlowFiles back into the connections. Processing will then begin again against those FlowFiles by the downstream processor components. NiFi's design favors data duplication over data loss, in order to avoid any possibility of data loss. For example: let's assume that a NiFi processor completed execution against a FlowFile, resulting in writing something out to an external endpoint. In response to that successful operation, the processor would then move the FlowFile from the inbound connection to a downstream relationship. If NiFi were to crash at that very moment, before the FlowFile was moved, on startup the same FlowFile would load in the inbound connection and get processed again. Also keep in mind that you are running a 3-node NiFi cluster, and within a NiFi cluster each connected node runs its own copy of the flow, its own set of repositories, and its own local state. So each node is unaware of the FlowFiles being processed by another node in the same cluster.
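A toy Python model of the crash scenario described above, assuming the inbound connection survives a restart (as the flowfile and content repositories provide). The function and list names are hypothetical; the point is the ordering: the external write happens before the FlowFile leaves the inbound connection, so a crash between the two results in reprocessing (a duplicate) rather than a loss.

```python
class SimulatedCrash(Exception):
    """Raised to mimic NiFi going down mid-operation."""

def run_processor(inbound: list, outbound: list, external: list, crash_before_move: bool = False) -> None:
    """Process the head FlowFile: side effect first, then move it to the downstream relationship."""
    flowfile = inbound[0]          # stays in the inbound connection while being processed
    external.append(flowfile)      # e.g. write to an external endpoint
    if crash_before_move:
        raise SimulatedCrash()     # crash after the write, before the FlowFile is moved
    outbound.append(inbound.pop(0))  # only now does the FlowFile leave the inbound connection

inbound, outbound, external = ["ff-1"], [], []
try:
    run_processor(inbound, outbound, external, crash_before_move=True)
except SimulatedCrash:
    pass  # node restarts; ff-1 is still in the inbound connection
run_processor(inbound, outbound, external)
print(external)  # ['ff-1', 'ff-1'] (duplicated, but nothing lost)
print(inbound, outbound)  # [] ['ff-1']
```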
Generally speaking, when you have a processor that shows the active threads indicator and zeroed-out stats, you either have a very long-running thread or a hung thread (only examination of a series of thread dumps can make that determination). Most commonly this is a resource utilization problem, but it could also be a dataflow design issue, a client library issue, or a network issue. Thank you, Matt
02-01-2024
06:44 AM
@Sartha I don't know how to respond to "I followed the flow as per your guidance but still it doesn't worked." The flow I provided works. What exceptions/errors are you encountering? You have not provided much detail on the exceptions or issues you are seeing. Thank you, Matt
02-01-2024
06:38 AM
@ALWOSABY The putHDFS processor has properties for changing the Remote Owner and Remote Group, but in order to use these properties, certain conditions must be met: Thank you, Matt