Member since
07-30-2019
3470
Posts
1642
Kudos Received
1018
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 276 | 05-06-2026 09:16 AM |
| | 460 | 05-04-2026 05:20 AM |
| | 337 | 05-01-2026 10:15 AM |
| | 520 | 03-23-2026 05:44 AM |
| | 391 | 02-18-2026 09:59 AM |
03-08-2023
05:53 AM
@davehkd Parse the nifi-app.log for heartbeat-related messages and make sure that all of your nodes are creating and sending heartbeats to the ZK-elected cluster coordinator. Check the nifi-app.log on the node elected as the cluster coordinator (this would be either node 1 or 2, since they show 2/2 connected nodes) for heartbeat messages; you should see it receiving heartbeats from all three nodes.

If it is not receiving heartbeats from node 3, make sure there are no network or DNS resolution issues between node 3 and the other 2 nodes in the cluster. Verify that there are no typos in the nifi.properties on node 3 in the following sections:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#cluster_common_properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#cluster_node_properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#zookeeper-properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#web-properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#security_properties

Check the nifi-user.log on the elected cluster coordinator and on node 3 for any TLS handshake exceptions.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
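One way to hunt for typos in the nifi.properties sections listed above is to diff node 3's file against a node that connects successfully. The following is a minimal sketch, not an official NiFi tool: the prefix list is my own grouping of the sections named in the reply, and the file paths in the usage comment are hypothetical. Some values (for example node addresses and hostnames) legitimately differ per node, so the output is a review list rather than a list of errors.

```python
import sys
from pathlib import Path

# Assumed grouping of the property sections named in the reply
# (cluster, ZooKeeper, web, security). Adjust as needed.
PREFIXES = ("nifi.cluster.", "nifi.zookeeper.", "nifi.web.", "nifi.security.")


def load_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_properties(a, b, prefixes=PREFIXES):
    """Return {key: (value_a, value_b)} for selected keys whose values differ."""
    keys = {k for k in set(a) | set(b) if k.startswith(prefixes)}
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


def mask(key, value):
    """Hide password-like values so they are not printed to the terminal."""
    return "****" if value and "passw" in key.lower() else value


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python compare_props.py node1/nifi.properties node3/nifi.properties
    first = load_properties(Path(sys.argv[1]).read_text())
    second = load_properties(Path(sys.argv[2]).read_text())
    for key, (va, vb) in diff_properties(first, second).items():
        print(f"{key}: {mask(key, va)!r} != {mask(key, vb)!r}")
```

Differences in cluster protocol ports, the ZooKeeper connect string, or the ZooKeeper root node are the first things worth a closer look.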
03-08-2023
05:28 AM
@New_User There is the following resolved Jira for the creation of a new PutIceberg NiFi processor: https://issues.apache.org/jira/browse/NIFI-10442

Apache has size limits on distributions, and it does not appear as though this processor NAR was included in the Apache release. You would need to add the nifi-iceberg-processors-nar manually to your NiFi installation for it to become available for use.

Thank you,
Matt
03-01-2023
10:24 AM
@tkchea NiFi Remote Process Groups (RPG) transfer FlowFiles, not just the FlowFile content. So depending on the amount of metadata/attributes on the FlowFile, the amount transferred would be larger than the content alone.

The RPG fetches Site-to-Site (S2S) details via a background thread that runs every 30 seconds regardless of whether any FlowFiles exist. The S2S details fetched include information about the target NiFi (number of nodes in the target cluster, load on each node, RAW ports if configured, whether HTTP is enabled, etc.). These details are then used to facilitate the transfer of FlowFiles between the client (RPG) and the target NiFi (with remote input or output ports). The actual transfer of FlowFiles will happen either over the HTTPS port (used by a lot of other transactions) or via a RAW socket port, depending on configuration.

Since a FlowFile consists of two parts (FlowFile metadata and FlowFile content), there is going to be disk I/O and CPU load involved with writing to the flowfile_repository and content_repository. So you may want to monitor those on both source and destination.

When it comes to the mutual TLS handshake, NiFi is not doing anything special here. The client certificate presented is used to identify the client and verify authorization to send to or pull from a remote port. You can also enable SSL handshake debug logging in the NiFi bootstrap.conf file:
java.arg.ssldebug=-Djavax.net.debug=ssl,handshake
Of course, you will then see all SSL handshakes in the nifi-bootstrap.log file, including those that occur when someone accesses the NiFi UI. But this would allow you to see whether you are seeing systematically slow TLS handshakes or only slow handshakes between these two networks.

You could also set up an RPG that sends to a remote input port on the same NiFi server. The same TLS handshake will happen there as well. Is it much faster? If so, that rules out an RPG issue. If it ends up being the network between the NiFi servers, you'll need to investigate there; something like Wireshark may help.

Another test might involve using a PostHTTP or InvokeHTTP processor to send to a ListenHTTP or HandleHttpRequest processor on the target server (this can be set up to be secure or insecure, using the same keystore and truststore your NiFi instances use).

Thank you,
Matt
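To separate handshake cost from everything else in the tests described above, the mutual TLS handshake can also be timed outside of NiFi. The following is a rough sketch using only the Python standard library. It assumes the client certificate, key, and CA chain have been exported from the NiFi keystore and truststore to PEM files (Python's ssl module does not read JKS), and the host, port, and file names in the usage comment are hypothetical.

```python
import socket
import ssl
import statistics
import time


def time_tls_handshake(host, port, cafile, certfile, keyfile, timeout=10.0):
    """Return seconds spent in one mutual-TLS handshake (TCP connect excluded)."""
    context = ssl.create_default_context(cafile=cafile)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        start = time.perf_counter()
        # wrap_socket performs the handshake before returning.
        with context.wrap_socket(raw, server_hostname=host):
            return time.perf_counter() - start


def summarize(durations):
    """Small summary so repeated runs from different networks can be compared."""
    return {
        "count": len(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Hypothetical usage (all names below are placeholders):
#   samples = [
#       time_tls_handshake("nifi-target.example.com", 8443,
#                          "ca.pem", "client-cert.pem", "client-key.pem")
#       for _ in range(10)
#   ]
#   print(summarize(samples))
```

Running the same measurement from the source NiFi host and from a host on the target's own network gives a quick indication of whether slow handshakes are systematic or specific to the path between the two networks.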
03-01-2023
09:44 AM
1 Kudo
@bmoisson @Sumit6620 When you authenticate via NiFi, both a client JWT token and a server-side key are generated on the node on which the authentication was performed. That client JWT token can then be used to make calls to rest-api endpoints for which that client is authorized, on that node only. When you obtain your JWT token from an external authentication endpoint, NiFi won't have the server-side key needed to validate that token and thus rejects it.

You can find the various methods of authentication that can be configured in Apache NiFi here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication

Thank you,
Matt
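The flow described above (obtain the token from NiFi itself, then use it against the same node) can be sketched as follows. This is a minimal illustration that assumes username/password login is configured (for example via LDAP); the node URL and credentials are placeholders. The functions only build the requests, so nothing is sent until they are passed to urllib.request.urlopen with a suitable SSL context.

```python
import urllib.parse
import urllib.request


def build_token_request(base_url, username, password):
    """POST form credentials to the node's token endpoint; the body of the reply is the JWT."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/nifi-api/access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


def build_api_request(base_url, path, token):
    """Call a rest-api endpoint on the SAME node that issued the token."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical usage (placeholder host and credentials):
#   node = "https://nifi-node1.example.com:8443"
#   token = urllib.request.urlopen(
#       build_token_request(node, "alice", "secret"), context=ssl_context
#   ).read().decode("utf-8")
#   who = urllib.request.urlopen(
#       build_api_request(node, "/nifi-api/flow/current-user", token),
#       context=ssl_context,
#   ).read()
```

Both requests take the same base URL on purpose: a token issued by one node is validated with the key held on that node.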
03-01-2023
09:13 AM
@Girish007 Did you make sure that the NiFi directories and repository directories are excluded from any virus scanning software? That is typically the external force that is likely to be making changes to these files. Do you have any other external software or processes scanning or accessing these directories?

Thanks,
Matt
02-28-2023
11:28 AM
1 Kudo
@TRSS_Cloudera The issue you have described matches this known issue reported in Apache NiFi: https://issues.apache.org/jira/browse/NIFI-10792

The discussion in the comments of that Jira points to a couple of workarounds, along with the downsides of each. From that discussion, it appears the best approach is the development of a new "Excel Record Reader" controller service that could be used by the existing ConvertRecord processor and CSVRecordSetWriter. This is outlined in the following Jira: https://issues.apache.org/jira/browse/NIFI-11167

Thank you,
Matt
02-21-2023
09:20 AM
@PurpleK It is not clear what you mean when you say "Files that are in the 500GB+ range are taking several hours to move onto the unpack stage.". FlowFile(s) are not released to a downstream connection until processing of the source file is complete. The source file will still be represented in the queued count of the connection feeding a processor even while that processor is executing on that FlowFile.

When you say moving on to the unpack stage, are you referring to some upstream processor, feeding the connection to the UnpackContent processor, taking a while to queue a FlowFile on that connection? Or are you referring to UnpackContent, once the file is queued, taking a while to complete execution on it, create the unpacked FlowFiles, and then remove the original zip from the upstream connection queue? Step 1 is to identify the exact place(s) where it is slow.

Adding additional concurrent tasks to a processor has no impact on speeding up the execution on a specific source FlowFile. One thread gets assigned to each execution of the processor and, in the case of UnpackContent, each thread executes against 1 FlowFile from the upstream connection. Adding multiple concurrent tasks will allow multiple upstream FlowFiles to be processed concurrently.

IMPORTANT: Increment concurrent tasks slowly while monitoring CPU load averages. Adding too many concurrent tasks on any one processor can impact other processors in your dataflow.

The Event Driven processor scheduling strategy is deprecated, will eventually go away (most likely in the next major release), and should not be used. So increasing the Max Event Driven Thread Count under controller settings will have no impact unless you are using that strategy in your flow. It does create event threads, but they would not consume CPU if you are not using event driven scheduling anywhere in your dataflow(s).

NiFi is a data agnostic service, meaning it can handle any data type in its raw binary format. NiFi can do this because it wraps that binary content in a NiFi FlowFile. A NiFi FlowFile is what you see moving from processor to processor in your dataflows, and it becomes the responsibility of the processor to understand the FlowFile's content should it need to read it. I bring this up because a FlowFile adds a small bit of overhead, as NiFi has to generate FlowFile metadata for every FlowFile created.

When it comes to your 500GB+ zip files...
1. Do they consist of many small and/or large files? NiFi must create a FlowFile for each file that results from unpacking the original zip.
2. Do you see a lot of Java Garbage Collection (GC) pauses happening? All GC is stop-the-world. GC is a normal operation of any JVM, but if GC is happening very often it can impact flow performance with constant pauses due to the stop-the-world nature of GC. The larger the JVM memory, the longer the stop-the-world event will be.
3. Any exceptions in your nifi-app.log?

You may also find this article helpful. It is old, but the majority of the guidance is still very valid. The latest NiFi versions support Java 8 and Java 11, so you can ignore the G1GC recommendations if you are using Java 11.
https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999

Hopefully the concurrent tasks on your processor(s) executing against the content of large FlowFiles will help you better utilize your hardware and achieve better overall throughput. Keep in mind that concurrent tasks only allow concurrent execution on multiple source FlowFiles, so they will not improve the speed at which a single FlowFile is processed by a given processor.

Thank you,
Matt
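The point about concurrent tasks can be made concrete with a toy model. This is not NiFi code; it is a simplified scheduling sketch in which each FlowFile is handled by exactly one thread from start to finish, and the per-file durations are made-up numbers.

```python
import heapq


def makespan(durations, concurrent_tasks):
    """Time until every FlowFile is done when each one is handled by a single thread.

    Each queued FlowFile is handed, in order, to whichever thread frees up first.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    free_at = [0.0] * concurrent_tasks  # when each thread becomes available
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# One huge zip: extra concurrent tasks change nothing.
print(makespan([120.0], 1), makespan([120.0], 4))

# Four smaller zips: extra concurrent tasks let them run side by side.
print(makespan([30.0] * 4, 1), makespan([30.0] * 4, 4))
```

In this model a single 120-minute file takes 120 minutes with 1 or 4 concurrent tasks, while four 30-minute files drop from 120 minutes to 30. Real throughput also depends on disk, CPU, and GC behavior, which the model ignores.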
02-13-2023
09:23 AM
Not ruling out something environmental here, but what is being observed is that validation works while processor execution does not, even though both should be using the same basic code. The 3 loggers suggested in my previous post would produce DEBUG output that may shed more light on how the logging differs between validation and running (starting) the processor. So that is probably the best place to start.
02-13-2023
07:19 AM
1 Kudo
@lben If you saw a bulletin on the processor reporting a failure in execution, that should also be in the nifi-app.log. You can also modify the logback.xml to change the log level of NiFi, or even just of the ListSFTP processor class, to hopefully capture more detail on the failure.

Does SFTP to the target server work from the command line as the NiFi service user? SFTP is a file transfer protocol that runs over SSH. But yes, SFTP servers can be configured to only allow SFTP connections.

So to get more logging out of the ListSFTP processor class, you could add these loggers to the area where all the other loggers start to show up in the NiFi logback.xml:
<logger name="org.apache.nifi.processors.standard.ListSFTP" level="DEBUG"/>
<logger name="net.schmizz.sshj" level="DEBUG"/>
<logger name="com.hierynomus.sshj" level="DEBUG"/>

Thanks,
Matt
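After editing logback.xml it can be worth confirming that the three loggers above are actually present at DEBUG and that the file is still well-formed XML. The following is a small sketch using the Python standard library; the logger names come from the reply, while the file path in the usage comment is a placeholder for wherever your NiFi conf directory lives.

```python
import xml.etree.ElementTree as ET

# Logger names taken from the reply above.
WANTED = (
    "org.apache.nifi.processors.standard.ListSFTP",
    "net.schmizz.sshj",
    "com.hierynomus.sshj",
)


def logger_levels(logback_xml_text):
    """Map logger name -> configured level; raises ParseError on malformed XML."""
    root = ET.fromstring(logback_xml_text)
    return {el.get("name"): el.get("level") for el in root.iter("logger")}


def missing_debug_loggers(logback_xml_text, wanted=WANTED):
    """Return the wanted logger names that are absent or not set to DEBUG."""
    levels = logger_levels(logback_xml_text)
    return sorted(
        name for name in wanted if (levels.get(name) or "").upper() != "DEBUG"
    )


# Hypothetical usage (placeholder path):
#   from pathlib import Path
#   text = Path("/opt/nifi/conf/logback.xml").read_text()
#   print(missing_debug_loggers(text) or "all three loggers are at DEBUG")
```

An empty result means all three loggers are configured; any names returned still need to be added or raised to DEBUG.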
02-13-2023
06:39 AM
@JohnF The NiFi Resource Identifier "/resources" exists so that third-party authorizers like Apache Ranger can be authorized to retrieve a list of all current NiFi Resource Identifiers (that returned list will change any time a new component is added in NiFi).

In a NiFi set up to use a local authorization provider (file-access-policy-provider), this NiFi Resource Identifier would not need to be used, as NiFi is already aware of all policies in its UI for setting up policies. So there is no need for it to be exposed.

When using some external authorizer, it would then be that authorizer that provides the needed authorizations to NiFi. Within that external authorizer, the "/resources" NiFi Resource Identifier could be authorized if the authorizer wants to retrieve that listing, which makes authorization policy implementation easier by allowing it to present the list of identifiers to the end user.

Thank you,
Matt