Member since: 07-30-2019 | Posts: 3390 | Kudos Received: 1618 | Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 232 | 11-05-2025 11:01 AM |
| | 465 | 10-20-2025 06:29 AM |
| | 605 | 10-10-2025 08:03 AM |
| | 396 | 10-08-2025 10:52 AM |
| | 443 | 10-08-2025 10:36 AM |
07-02-2024
11:54 PM
1 Kudo
Yes, almost the same behavior is observed with the "penalize" retry strategy; the penalty duration simply gets added to the total time. For example, with the default penalty duration of 30 secs, 10 incoming FlowFiles, and 1 retry: the 10 FlowFiles are batched together, the first retry happens at 50 secs, the batched FlowFiles are then penalized for 30 secs, and after another 50 secs they go to the failure relationship. So in total, the time taken by the PublishKafka processor to route a file to the failure relationship with the penalize retry policy is (numberOfRetries + 1) * 5 secs * numberOfIncomingFlowFiles + penalty duration. If retry is not checked, then behavior similar to yield is observed: 5 * numberOfIncomingFlowFiles secs to route to the failure relationship, as shown in the photos. Penalty and yield settings are at their defaults. The target Kafka version is 3.4.0 and the number of partitions is 1. The number of NiFi nodes is 3. The Number of Concurrent Tasks on PublishKafkaRecord is 1, but execution is on all nodes, which I think means 1 thread on each of the 3 nodes.
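The timing described above can be sketched as a quick back-of-the-envelope calculation (a rough model only, assuming the observed 5 seconds per FlowFile per attempt and the default 30-second penalty; this is not official NiFi behavior, just the arithmetic from this thread):

```python
def time_to_failure(num_flowfiles: int, num_retries: int,
                    penalty_secs: float = 30.0, yield_secs: float = 5.0,
                    penalize: bool = True) -> float:
    """Rough model of how long PublishKafka takes to route a batch of
    failing FlowFiles to the failure relationship."""
    base = (num_retries + 1) * yield_secs * num_flowfiles
    return base + penalty_secs if penalize else base

# 10 FlowFiles, 1 retry, penalize strategy: 2 * 5 * 10 + 30 = 130 seconds
print(time_to_failure(10, 1))                   # 130.0
# Retry unchecked (yield-like behavior): 1 * 5 * 10 = 50 seconds
print(time_to_failure(10, 0, penalize=False))   # 50.0
```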
07-02-2024
01:00 PM
1 Kudo
@enam There is a slight mistake in the NiFi Expression Language (NEL) statement in my post above. It should be as follows instead:

Property = filename
Value = ${filename:substringBeforeLast('.')}-${UUID()}.${filename:substringAfterLast('.')}

Thanks, Matt
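For reference, the same transformation can be expressed in plain Python (a sketch only; `uuid.uuid4()` stands in for NiFi's `${UUID()}`, and it assumes the filename contains an extension):

```python
import uuid

def unique_filename(filename: str) -> str:
    """Mimic ${filename:substringBeforeLast('.')}-${UUID()}.${filename:substringAfterLast('.')}"""
    base, _, ext = filename.rpartition('.')
    return f"{base}-{uuid.uuid4()}.{ext}"

print(unique_filename("report.csv"))  # e.g. report-<random-uuid>.csv
```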
07-02-2024
07:33 AM
@Vikas-Nifi The following error is directly related to a failure to establish certificate trust in the TLS exchange between NiFi's putSlack processor and your Slack server:

javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

The putSlack processor utilizes the StandardRestrictedSSLContextService to define the keystore and truststore files the processor will use. The truststore must contain the complete trust chain for the target Slack server's serverAuth certificate. You can use:

openssl s_client -connect <companyName.slack.com>:443 -showcerts

to get an output of all public certs included with the serverAuth cert. I noticed with my Slack endpoint that this was not the complete trust chain (the root CA certificate for ISRG Root X1 was missing from the chain). You can download the missing root CA public cert directly from Let's Encrypt and add it to the truststore set in the StandardRestrictedSSLContextService:

https://letsencrypt.org/certificates/
https://letsencrypt.org/certs/isrgrootx1.pem
https://letsencrypt.org/certs/isrg-root-x2.pem

You might also want to make sure all intermediate CAs are added, and not just the intermediate returned by the openssl command, in case the server you get directed to changes.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
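As an illustration of inspecting that openssl output, here is a rough Python sketch that splits a `-showcerts`-style dump into individual PEM blocks so each can be examined or imported into the truststore (the embedded text below is a stand-in, not a real certificate):

```python
import re

def split_pem_blocks(text: str) -> list[str]:
    """Extract each BEGIN/END CERTIFICATE block from openssl -showcerts output."""
    pattern = re.compile(
        r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL)
    return pattern.findall(text)

# Dummy stand-in for real `openssl s_client -showcerts` output
showcerts_output = """depth=2 ...
-----BEGIN CERTIFICATE-----
MIIB...serverAuth-cert-placeholder...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIB...intermediate-cert-placeholder...
-----END CERTIFICATE-----
"""
blocks = split_pem_blocks(showcerts_output)
# If the root CA is not among these blocks, fetch it separately and add it
print(len(blocks))  # 2
```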
07-02-2024
06:59 AM
@greenflag Not knowing anything about this REST API endpoint, all I have are questions:

- How would you complete this task outside of NiFi?
- How would you accomplish this using curl from the command line?
- What do the REST API docs for your endpoint say about how to get files?
- Do they expect you to pass the filename in the request?
- What is the endpoint that would return the list of files?

My initial thought here (making numerous assumptions about your endpoint) is that you would possibly need multiple InvokeHTTP processors. The first InvokeHTTP in the dataflow hits the endpoint that outputs the list of files in the endpoint directory, which would end up in the content of the FlowFile. Then you split that FlowFile by its content so you have multiple FlowFiles (one per listed file). Then you rename each FlowFile using the unique filename and finally pass each to another InvokeHTTP processor that actually fetches that specific file.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
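The list-split-fetch pattern described above can be sketched outside of NiFi. This is a hypothetical illustration (the listing format and URL paths are assumptions, not a real API), showing how one listing response fans out into one fetch URL per file:

```python
import json

# Hypothetical response from the first InvokeHTTP (the "list files" endpoint)
listing_json = '{"files": ["a.csv", "b.csv", "c.csv"]}'

def fetch_urls(listing: str, base_url: str) -> list[str]:
    """Split the listing into one fetch URL per file, mirroring the
    split / rename / fetch steps in the NiFi flow."""
    names = json.loads(listing)["files"]
    return [f"{base_url}/files/{name}" for name in names]

urls = fetch_urls(listing_json, "https://example.com/api")
print(urls[0])  # https://example.com/api/files/a.csv
```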
07-01-2024
03:05 PM
1 Kudo
@NeheikeQ Yes, a newer version of 1.x NiFi-Registry will support older versions of NiFi version controlling to it. For NiFi, after the upgrade, load the flow.xml.gz on one node and start it. Then start the other nodes so that they all inherit the flow from the one node where you had a flow.xml.gz. At that point all nodes should join successfully and will have the same dataflow loaded. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
07-01-2024
02:55 PM
1 Kudo
@Dave0x1 Typically the MergeContent processor will utilize a lot of heap when the number of FlowFiles being merged in a single execution is very high and/or the FlowFiles' attributes are very large. While FlowFiles queued in a connection have their attributes/metadata held in NiFi heap, there is a swap threshold at which NiFi swaps FlowFile attributes to disk. When it comes to MergeContent, FlowFiles are allocated to bins (they will still show in the inbound connection count). FlowFiles allocated to bin(s) cannot be swapped, so if you set min/max number of FlowFiles or min/max size to a large value, it will result in large amounts of heap usage. Note: FlowFile content is not held in heap by MergeContent. So the way to create very large merged files while keeping heap usage lower is to chain multiple MergeContent processors together in series: merge a batch of FlowFiles in the first MergeContent, then merge those into a larger merged FlowFile in a second MergeContent. Also be mindful of extracting content to FlowFile attributes or generating FlowFile attributes with large values, to help minimize heap usage. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
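The chained-MergeContent idea can be illustrated with a two-stage batch merge. This is only a sketch of the principle: strings stand in for FlowFile content, and the point is that each pass bins only a small batch at a time while the final result is identical to one big merge:

```python
def merge(batch: list[str]) -> str:
    """Stand-in for one MergeContent execution: concatenate a bin of content."""
    return "".join(batch)

def two_stage_merge(flowfiles: list[str], first_bin_size: int) -> str:
    """Stage 1: merge small batches; stage 2: merge the intermediate results.
    Each stage bins only a few FlowFiles at once, keeping per-pass usage low."""
    stage1 = [merge(flowfiles[i:i + first_bin_size])
              for i in range(0, len(flowfiles), first_bin_size)]
    return merge(stage1)

parts = [f"rec{i};" for i in range(10)]
# Same final merged content as a single large merge, but smaller bins per pass
print(two_stage_merge(parts, 3) == merge(parts))  # True
```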
06-18-2024
01:47 PM
That's a good idea; however, low latency is a user requirement. Currently, processing each file from source to destination takes around one minute. If I add a two-minute delay, the users would not be happy.
06-18-2024
01:24 PM
@omeraran If your source is continuously being written to, you might consider using the GenerateTableFetch processor --> ExecuteSQLRecord processor (configured to use JsonRecordSetWriter) --> PutDatabaseRecord processor. Working with multi-record FlowFiles by utilizing the record-based processors is going to be a more efficient and performant dataflow. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
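To illustrate the multi-record idea: a record-based FlowFile carries many rows in one payload (similar in spirit to what a JsonRecordSetWriter emits), rather than one FlowFile per row. The row values below are made up for the sketch:

```python
import json

# Hypothetical rows returned by one ExecuteSQLRecord fetch
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "gamma"},
]

# One multi-record FlowFile payload instead of three single-row FlowFiles;
# downstream processors then handle all records in a single pass.
payload = json.dumps(rows)
print(len(json.loads(payload)))  # 3
```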
06-14-2024
07:46 AM
1 Kudo
@Alexy Without seeing your logs, I have no idea which NiFi classes are producing the majority of your logging, but logback is functioning exactly as you have it configured: each time nifi-app.log reaches 500 MB within a single day, it is compressed and rolled using an incrementing number. I would suggest changing the log level for the base class "org.apache.nifi" from INFO to WARN. The bulk of all NiFi classes begin with org.apache.nifi, and by changing this to WARN you will only see ERROR and WARN level log output from the bulk of the org.apache.nifi.<XYZ...> classes:

<logger name="org.apache.nifi" level="WARN"/>

Unless you have a lot of exceptions happening within the NiFi processor components used in your dataflow(s), this should have a significant impact on the amount of nifi-app.log logging being produced. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt