Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 165 | 06-03-2026 06:06 PM |
| | 461 | 05-06-2026 09:16 AM |
| | 832 | 05-04-2026 05:20 AM |
| | 499 | 05-01-2026 10:15 AM |
| | 626 | 03-23-2026 05:44 AM |
03-09-2023
11:57 AM
@davehkd Unfortunately, I would need access to the nifi-app.log file(s) from each node to dig in deeper.

Did you copy the flow.xml.gz, flow.json.gz, users.xml, and authorizations.xml files from NiFi node 1 or 2 to NiFi node 3? These files all need to match in order for a node to join the cluster.

1. Does the UI of nifi1 or nifi2 show "2/2" in the status bar along the top of the canvas?
2. Does the UI of nifi3 show "1/1" in the status bar along the top of the canvas?

If both of the above are true, this indicates that nifi3 is a member of a different cluster. That is a possible result of an issue with your ZK or of using a different ZK root node (nifi.zookeeper.root.node). Check for any leading or trailing whitespace in your configuration. You may also want to inspect your ZK logs for the connections coming from all three nodes.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Matt
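The whitespace and root node checks mentioned above can be partly automated. The following is only a minimal sketch, assuming you have collected the text of nifi.properties from each node yourself; the function names and the node labels are made up for illustration, and nifi.zookeeper.connect.string is included as an additional property that is worth comparing.

```python
# Properties compared across nodes. The root node comes from the post above;
# the connect string is an extra check added for this sketch.
ZK_KEYS = ("nifi.zookeeper.connect.string", "nifi.zookeeper.root.node")


def parse_properties(text):
    """Parse key=value lines, keeping the raw value so stray whitespace stays visible."""
    props = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value
    return props


def compare_zk_settings(nodes):
    """nodes maps a (hypothetical) node label to the text of its nifi.properties.

    Returns a list of human-readable findings; an empty list means no problem was spotted.
    """
    findings = []
    parsed = {name: parse_properties(text) for name, text in nodes.items()}
    for key in ZK_KEYS:
        values = {name: props.get(key) for name, props in parsed.items()}
        for name, value in values.items():
            if value is not None and value != value.strip():
                findings.append(
                    f"{name}: {key} has leading or trailing whitespace: {value!r}"
                )
        distinct = {v.strip() for v in values.values() if v is not None}
        if len(distinct) > 1:
            findings.append(f"{key} differs across nodes: {sorted(distinct)}")
    return findings
```

Passing in the three files keyed as, for example, "nifi1", "nifi2", and "nifi3" returns one finding per suspicious value. This does not replace looking at nifi-app.log and the ZK logs; it only narrows down configuration drift.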
03-08-2023
06:33 AM
Perfect, it was exactly what I missed. BR, Bruno
03-08-2023
05:28 AM
@New_User There is the following resolved jira for the creation of a new PutIceberg NiFi processor: https://issues.apache.org/jira/browse/NIFI-10442

Apache has size limits on distributions, and it does not appear as though this processor nar was included in the Apache release. You would need to add the nifi-iceberg-processors-nar manually to your NiFi installation for it to become available for use.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
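Adding a nar manually comes down to placing the file where NiFi loads nars from. Here is a rough sketch of that step, assuming the nar goes into the lib directory under your NiFi home and that you have already downloaded a nar built for your NiFi version; the function name and its parameters are hypothetical, and NiFi typically needs a restart before a nar placed in lib is picked up.

```python
import shutil
from pathlib import Path


def install_nar(nar_file, nifi_home, target_subdir="lib"):
    """Copy a downloaded nar into a directory under the NiFi home.

    nar_file, nifi_home and target_subdir are illustrative parameters;
    adjust them to match your own installation layout.
    """
    nar_file = Path(nar_file)
    if nar_file.suffix != ".nar":
        raise ValueError(f"not a nar file: {nar_file}")
    target_dir = Path(nifi_home) / target_subdir
    if not target_dir.is_dir():
        raise FileNotFoundError(f"directory not found: {target_dir}")
    destination = target_dir / nar_file.name
    # copy2 keeps the file's timestamps along with its content
    shutil.copy2(nar_file, destination)
    return destination
```

On a cluster, the same nar would need to be present on every node.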
03-01-2023
10:24 AM
@tkchea NiFi Remote Process Groups (RPG) transfer FlowFiles and not just the FlowFile content. So depending on the amount of metadata/attributes on the FlowFile, the amount transferred will be larger than the content alone.

The RPG fetches Site-to-Site (S2S) details via a background thread that runs every 30 seconds, regardless of whether any FlowFiles exist. These S2S details include information about the target NiFi (number of nodes in the target cluster, load on each node, RAW ports if configured, whether HTTP is enabled, etc.). These details are then used to facilitate the transfer of FlowFiles between the client (RPG) and the target NiFi (with remote input or output ports). The actual transfer of FlowFiles happens either over the HTTPS port (used by a lot of other transactions) or via a RAW socket port, depending on configuration.

Since a FlowFile consists of two parts (FlowFile metadata and FlowFile content), there is going to be disk I/O and CPU involved with writing to the flowfile_repository and content_repository. So you may want to monitor those on both source and destination.

When it comes to the mutual TLS handshake, NiFi is not doing anything special here. The client certificate presented is used to identify the client and verify that it is authorized to send to or pull from a remote port. You can also enable SSL handshake debug logging in the NiFi bootstrap.conf file:

java.arg.ssldebug=-Djavax.net.debug=ssl,handshake

Of course, you will see all SSL handshakes in the nifi-bootstrap.log file, including those that happen when someone accesses the NiFi UI. But this would allow you to see whether the slow TLS handshakes are systematic or occur only between these two networks.

You could also set up an RPG that sends to a remote input port on the same NiFi server. The same TLS handshake will happen there as well. If it is much faster, that rules out an RPG issue. If it ends up being the network between the NiFi servers, you'll need to investigate there, perhaps using something like Wireshark.

Another test might involve using a PostHTTP or InvokeHTTP processor to send to a ListenHTTP or HandleHttpRequest processor on the target server (this can be set up to be secure or insecure using the same keystore and truststore your NiFi instances use).

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
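If you want a quick number for the handshake itself, outside of NiFi, something like the following could be run from the source server against the target's HTTPS port and against a host on the local network for comparison. It is a minimal sketch, assuming you have PEM exports of the CA certificate and of the client certificate and key (Python's ssl module does not read JKS keystores); all function names, file names, and the host and port in the comments are placeholders.

```python
import socket
import ssl
import statistics
import time


def build_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client TLS context; cert_file/key_file enable mutual TLS."""
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def time_tls_handshake(host, port, context, timeout=10.0):
    """Return (tcp_connect_seconds, tls_handshake_seconds) for one connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        connected = time.perf_counter()
        # wrap_socket completes the TLS handshake before it returns
        with context.wrap_socket(raw, server_hostname=host):
            done = time.perf_counter()
    return connected - start, done - connected


def summarize(samples):
    """Min, median and max of a list of timings in seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Example usage (placeholder host, port and file names):
# ctx = build_context("ca.pem", "client-cert.pem", "client-key.pem")
# runs = [time_tls_handshake("nifi-target.example.com", 8443, ctx) for _ in range(10)]
# print(summarize([tls for _tcp, tls in runs]))
```

Separating the TCP connect time from the TLS handshake time helps distinguish general network latency from something specific to the handshake.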
03-01-2023
09:13 AM
@Girish007 Did you make sure that the NiFi directories and repository directories are excluded from any virus software scanning? That is typically the external force that is likely to be making changes to these files. Do you have any other external software or processes scanning or accessing these directories? Thanks, Matt
02-28-2023
11:28 AM
1 Kudo
@TRSS_Cloudera The issue you have described links to this known issue reported in Apache NiFi: https://issues.apache.org/jira/browse/NIFI-10792

The discussion found in the comments of this jira points to a couple of workarounds, along with the drawbacks of each. From that discussion, it appears the best approach is the development of a new "Excel Record Reader" controller service that could be used by the existing ConvertRecord processor and CSVRecordSetWriter. This is outlined in the following jira: https://issues.apache.org/jira/browse/NIFI-11167

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
02-21-2023
09:20 AM
@PurpleK It is not clear what you mean when you say "Files that are in the 500GB+ range are taking several hours to move onto the unpack stage."

No FlowFile(s) are released to a downstream connection until processing of the source file is complete. The source FlowFile will still be represented in the queued count of the connection feeding a processor even while that processor is executing on that FlowFile. When you say "moving on to the unpack stage", are you referring to some upstream processor feeding the connection to the UnpackContent processor taking a while to queue a FlowFile on that connection, or are you referring to UnpackContent, once the file is queued, taking a while to complete execution on it, create the unpacked FlowFiles, and then remove the original zip from the upstream connection queue? Step 1 is to identify the exact place(s) where it is slow.

Adding additional concurrent tasks to a processor has no impact on speeding up the execution on a specific source FlowFile. One thread gets assigned to each execution of the processor, and in the case of UnpackContent, each thread executes against one FlowFile from the upstream connection. Adding multiple concurrent tasks will allow multiple upstream FlowFiles to be processed concurrently.

IMPORTANT: Increment concurrent tasks slowly while monitoring CPU load averages. Adding too many concurrent tasks on any one processor can impact other processors in your dataflow.

The Event Driven processor scheduling strategy is deprecated and will eventually go away (most likely in the next major release), so it should not be used. Increasing the Max Event Driven Thread Count under controller settings will have no impact unless you are using that strategy in your flow. It does create event threads, but they would not consume CPU if you are not using event driven scheduling anywhere in your dataflow(s).

NiFi is a data agnostic service, meaning it can handle any data type in its raw binary format. NiFi can do this because it wraps that binary content in a NiFi FlowFile. A NiFi FlowFile is what you see moving from processor to processor in your dataflows, and it becomes the responsibility of the processor to understand the FlowFile's content should it need to read it. I bring this up because a FlowFile adds a small bit of overhead, as NiFi has to generate FlowFile metadata for every FlowFile created.

When it comes to your 500GB+ zip files...

1. Do they consist of many small and/or large files? NiFi must create a FlowFile for each file that results from unpacking the original zip.
2. Do you see a lot of Java Garbage Collection (GC) pauses happening? All GC is stop the world. GC is a normal operation of any JVM, but if GC is happening very often, it can impact flow performance with constant pauses due to the stop the world nature of GC. The larger the JVM memory, the longer the stop the world event will be.
3. Are there any exceptions in your nifi-app.log?

You may also find this article helpful. It is old, but the majority of the guidance is still very valid. The latest NiFi versions support Java 8 and Java 11, so you can ignore the G1GC recommendations if you are using Java 11.
https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999

Hopefully the concurrent tasks on your processor(s) executing against the content of large FlowFiles will help you better utilize your hardware and achieve better overall throughput. Keep in mind that it only allows concurrent execution on multiple source FlowFiles, so it will not improve the speed at which a single FlowFile will be processed by a given processor.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
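To make the point about concurrent tasks concrete, here is a toy model rather than anything NiFi actually runs. It assumes each FlowFile is handled start to finish by exactly one thread and that threads pick up queued FlowFiles in order; the function name and the one-hour processing time are invented for the illustration.

```python
import heapq


def makespan(flowfile_seconds, concurrent_tasks):
    """Wall-clock time to drain a queue when each FlowFile uses exactly one thread.

    flowfile_seconds: processing time of each queued FlowFile, in queue order.
    concurrent_tasks: number of threads the processor may use at once.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    # Each heap entry is the time at which one thread becomes free again.
    free_at = [0.0] * concurrent_tasks
    heapq.heapify(free_at)
    finish = 0.0
    for seconds in flowfile_seconds:
        start = heapq.heappop(free_at)  # earliest available thread
        end = start + seconds
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


one_hour = 3600.0
print(makespan([one_hour], 1))      # a single large zip, 1 concurrent task
print(makespan([one_hour], 8))      # same zip, 8 concurrent tasks
print(makespan([one_hour] * 4, 1))  # four zips, 1 concurrent task
print(makespan([one_hour] * 4, 4))  # four zips, 4 concurrent tasks
```

In this model the first two calls return the same value, because extra threads have nothing to do while one FlowFile is being unpacked, whereas the queue of four zips drains in a quarter of the time with four concurrent tasks. Real throughput will also depend on disk, CPU, and GC behavior, which the model ignores.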
02-13-2023
09:51 AM
It just seems odd that this policy isn't created by default, as this is part of the REST API. Or it could at least be better documented in the REST API docs...
02-13-2023
09:23 AM
I am not ruling out something environmental here, but what is being observed is that validation works and processor execution does not, even though both should be using the same basic code. The 3 loggers suggested in my previous post, which would produce DEBUG logging output, may shed more light on how the logged output differs between validation and running (starting) the processor. So that is probably the best place to start.