Member since 07-30-2019
3472 Posts
1642 Kudos Received
1020 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 247 | 06-03-2026 06:06 PM |
| | 523 | 05-06-2026 09:16 AM |
| | 1022 | 05-04-2026 05:20 AM |
| | 581 | 05-01-2026 10:15 AM |
| | 696 | 03-23-2026 05:44 AM |
03-09-2023
11:57 AM
@davehkd Unfortunately, I would need access to the nifi-app.log file(s) from each node to dig in deeper.

Did you copy the flow.xml.gz, flow.json.gz, users.xml, and authorizations.xml files from NiFi node 1 or 2 to NiFi node 3? These files all need to match in order for a node to join the cluster.

1. Does the UI of nifi1 or nifi2 show "2/2" in the status bar along the top of the canvas?
2. Does the UI of nifi3 show "1/1" in the status bar along the top of the canvas?

If both of the above are true, this indicates that nifi3 is a member of a different cluster. This can result from an issue with your ZK or from using a different ZK root node (nifi.zookeeper.root.node). Check for any leading or trailing whitespace in your configuration. You may also want to inspect your ZK logs for the connections coming from all three nodes.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Matt
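The whitespace check mentioned above can be scripted. The following is only a minimal sketch: it assumes the properties file has been read into a string, treats every `key=value` line literally, and uses made-up ZooKeeper host names (`zk1`, `zk2`, `zk3`) purely for illustration.

```python
def find_whitespace_issues(properties_text):
    """Return {key: raw_value} for entries whose key or value has stray whitespace."""
    issues = {}
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and lines that are not key=value pairs.
        if not stripped or stripped.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key != key.strip() or value != value.strip():
            issues[key.strip()] = value
    return issues


# Hypothetical excerpt of a nifi.properties file; note the trailing space after "/nifi".
sample = (
    "nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181\n"
    "nifi.zookeeper.root.node=/nifi \n"
)
print(find_whitespace_issues(sample))
```

Running the same function against the nifi.properties from each of the three nodes makes it easier to spot a root node value that only looks identical.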
03-08-2023
07:44 AM
2 Kudos
@GSB If you want the hours to always be two digits, you would need to apply to the hours the same if/else NiFi Expression Language (NEL) logic that the working solution provided by @cotopaul uses for the minute calculation:

${value:divide(3600):lt(10):ifElse(${value:divide(3600):prepend(0)},${value:divide(3600)})}:${value:divide(60):mod(60):lt(10):ifElse(${value:divide(60):mod(60):prepend(0)},${value:divide(60):mod(60)})}

A simpler approach would be to use the toDate and format NEL functions:

${value:toDate('sssss'):format('HH:mm')}

I allow five 's' characters on the assumption that the maximum value would be 86,400 seconds (24 hours in a day); it does not matter if the value is smaller. This approach also lets you quickly and easily adjust the output format. For example, if you don't want to truncate the remaining seconds, use ":format('HH:mm:ss')" instead.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
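To sanity-check what either expression should produce, the same arithmetic can be written outside of NiFi. This is just an illustrative sketch in Python (the function name is made up), assuming the input is a non-negative whole number of seconds below one day:

```python
def seconds_to_hhmm(total_seconds):
    """Format a count of seconds as zero-padded HH:mm, dropping leftover seconds."""
    hours = total_seconds // 3600          # same as value:divide(3600)
    minutes = (total_seconds // 60) % 60   # same as value:divide(60):mod(60)
    return f"{hours:02d}:{minutes:02d}"


print(seconds_to_hhmm(3725))   # 01:02
print(seconds_to_hhmm(86399))  # 23:59
```

The zero padding done by the format specifier is what the ifElse/prepend(0) logic does in the NEL version.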
03-08-2023
06:06 AM
@Bgrilher I am not completely clear on your ask here. According to the ValidateJson processor documentation, a FlowFile attribute is added to FlowFiles that are routed to the "invalid" relationship.

You can route this "invalid" relationship via a connection to a LogAttribute processor, which can write a log line to the nifi-app.log (default) containing what was written to this FlowFile attribute.

If you are not actually looking to generate log output and just want to see what was written to this FlowFile attribute, you can use NiFi data provenance instead. Data provenance lets you see the FlowFile metadata for a FlowFile at every stage of the dataflow that the FlowFile progressed through. You can also view and download the content at each stage of its processing through your dataflow (if it is still present in a NiFi dataflow or in the NiFi archive).

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
03-08-2023
05:53 AM
@davehkd Parse the nifi-app.log for messages related to heartbeats and make sure that all your nodes are creating and sending heartbeats to the ZK-elected cluster coordinator. Check the nifi-app.log on the node elected as the cluster coordinator (this would be either node 1 or 2, since they show 2/2 connected nodes) for heartbeat messages; you should see it receiving heartbeats from all three nodes.

If it is not receiving heartbeats from node 3, make sure there are no network or DNS resolution issues between node 3 and the other two nodes in the cluster. Verify that there are no typos in the nifi.properties on node 3 in the following sections:

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#cluster_common_properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#cluster_node_properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#zookeeper-properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#web-properties
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#security_properties

Check the nifi-user.log on the elected cluster coordinator and on node 3 for any TLS handshake exceptions.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
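If the logs are large, a small filter can pull out the heartbeat-related lines for comparison across nodes. This is only a rough sketch: it does a plain case-insensitive match on the word "heartbeat" and makes no assumption about the exact wording NiFi uses in its log messages. The demo strings are placeholders, not real NiFi log output.

```python
def heartbeat_lines(log_lines):
    """Return the lines that mention 'heartbeat', ignoring case."""
    return [line.rstrip("\n") for line in log_lines if "heartbeat" in line.lower()]


# Placeholder strings for illustration only (not real nifi-app.log content).
demo = ["some unrelated line\n", "placeholder line mentioning a Heartbeat\n"]
print(heartbeat_lines(demo))

# Typical use against a real file (path is an assumption about your install):
# with open("logs/nifi-app.log", encoding="utf-8", errors="replace") as handle:
#     for line in heartbeat_lines(handle):
#         print(line)
```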
03-08-2023
05:28 AM
@New_User There is the following resolved Jira for the creation of a new PutIceberg NiFi processor: https://issues.apache.org/jira/browse/NIFI-10442

Apache has size limits on distributions, and it does not appear that this processor nar was included in the Apache release. You would need to add the nifi-iceberg-processors-nar manually to your NiFi installation for it to become available for use.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
03-01-2023
10:24 AM
@tkchea NiFi Remote Process Groups (RPG) transfer FlowFiles, not just the FlowFile content. So depending on the amount of metadata/attributes on the FlowFile, the amount transferred will be larger than the content alone.

The RPG fetches Site-to-Site (S2S) details via a background thread that runs every 30 seconds regardless of the existence of FlowFiles. The S2S details fetched include information about the target NiFi (number of nodes in the target cluster, load on each node, RAW ports if configured, whether HTTP is enabled, etc.). These details are then used to facilitate the transfer of FlowFiles between the client (RPG) and the target NiFi (with Remote input or output ports). The actual transfer of FlowFiles will happen either over the HTTPS port (used by a lot of other transactions) or via a RAW socket port, depending on configuration.

Since a FlowFile consists of two parts (FlowFile metadata and FlowFile content), there is going to be disk I/O and CPU involved with writing to the flowfile_repository and content_repository. So you may want to monitor those on both source and destination.

When it comes to the mutual TLS handshake, NiFi is not doing anything special here. The client certificate presented is used to identify the client and verify authorization to send to or pull from a remote port. You can also enable SSL handshake debug logging in the NiFi bootstrap.conf file:

java.arg.ssldebug=-Djavax.net.debug=ssl,handshake

Of course, you will see all SSL handshakes in the nifi-bootstrap.log file, including those that occur when someone accesses the NiFi UI. But this would allow you to see whether you have systematically slow TLS handshakes or only slow handshakes between these two networks.

You could also set up an RPG that sends to a remote input port on the same NiFi server. The same TLS handshake will happen there as well. Is it much faster? (That would rule out an RPG issue.) If it ends up being the network between the NiFi servers, you'll need to investigate there, perhaps using something like Wireshark.

Another test might involve using a PostHTTP or InvokeHTTP processor to send to a ListenHTTP or HandleHttpRequest processor on the target server (this can be set up to be secure or insecure using the same keystore and truststore your NiFi instances use).

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
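As one more way to separate handshake time from everything else, the TLS handshake can be timed directly from a script run on the source server. The sketch below is an illustration only, not something NiFi provides. It assumes the certificates have been exported from the NiFi keystore/truststore to PEM files (Python's ssl module does not read JKS), and the function names, host, port, and file names are all hypothetical.

```python
import socket
import ssl
import statistics
import time


def build_context(ca_file=None, cert_file=None, key_file=None):
    """Create a client TLS context; the cert/key pair enables mutual TLS."""
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def time_tls_handshake(host, port, context, timeout=10.0):
    """Return seconds spent in the TLS handshake only (TCP connect excluded)."""
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        tls_sock = context.wrap_socket(
            raw_sock, server_hostname=host, do_handshake_on_connect=False
        )
        try:
            start = time.perf_counter()
            tls_sock.do_handshake()
            return time.perf_counter() - start
        finally:
            tls_sock.close()


def summarize(samples):
    """Reduce a list of handshake timings to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Example use (hypothetical host, port, and PEM file names):
# ctx = build_context("ca.pem", "client-cert.pem", "client-key.pem")
# timings = [time_tls_handshake("nifi-target.example.com", 8443, ctx) for _ in range(10)]
# print(summarize(timings))
```

Comparing the summary for the remote target against one taken against the local NiFi gives a rough indication of whether the slowness is in the handshake across the network or elsewhere.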
03-01-2023
09:44 AM
1 Kudo
@bmoisson @Sumit6620 When you authenticate via NiFi, both a client JWT token and a server-side key are generated on the node on which the authentication was performed. That client JWT token can then be used to make calls, on that node only, to the rest-api endpoints for which that client is authorized.

When you obtain your JWT token from an external authentication endpoint, NiFi won't have the server-side key needed to validate that token and thus rejects it.

You can find the various methods of authentication that can be configured in Apache NiFi here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
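For reference, the flow described above looks roughly like this when scripted. This is a hedged sketch: it assumes NiFi is configured with a username/password login provider so that the /nifi-api/access/token endpoint is usable, and the base URL, credentials, and helper names are placeholders. The requests are only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_token_request(base_url, username, password):
    """Build the POST that asks a specific NiFi node to issue a JWT."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode()
    return urllib.request.Request(
        f"{base_url}/nifi-api/access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_authorized_request(base_url, path, jwt):
    """Build a rest-api call that presents the JWT as a bearer token."""
    return urllib.request.Request(
        f"{base_url}{path}", headers={"Authorization": f"Bearer {jwt}"}
    )


# Hypothetical node URL; the token must be used against the same node that issued it.
node = "https://nifi-node1.example.com:8443"
token_request = build_token_request(node, "some-user", "some-password")
# jwt = urllib.request.urlopen(token_request).read().decode()
# user_request = build_authorized_request(node, "/nifi-api/flow/current-user", jwt)
```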
03-01-2023
09:13 AM
@Girish007 Did you make sure that the NiFi directories and repository directories are excluded from any virus software scanning? That is typically the external force most likely to be making changes to these files. Do you have any other external software or processes scanning or accessing these directories? Thanks, Matt
02-28-2023
11:28 AM
1 Kudo
@TRSS_Cloudera The issue you have described links to this known issue reported in Apache NiFi: https://issues.apache.org/jira/browse/NIFI-10792

The discussion in the comments of this Jira points to a couple of workarounds and includes the negatives of each. From that discussion, it appears the best approach is the development of a new "Excel Record Reader" controller service that could be used by the existing ConvertRecord processor and CSVRecordSetWriter. This is outlined in the following Jira: https://issues.apache.org/jira/browse/NIFI-11167

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
02-21-2023
09:20 AM
@PurpleK It is not clear what you mean when you say "Files that are in the 500GB+ range are taking several hours to move onto the unpack stage."

No FlowFile(s) are released to a downstream connection until processing of the source file is complete. The source file will still be represented in the queued count of the connection feeding a processor even while that processor is executing on that FlowFile.

When you say moving on to the unpack stage, are you referring to some upstream processor feeding the connection to the UnpackContent processor taking a while to queue a FlowFile on that connection? Or are you referring to UnpackContent, once the file is queued, taking a while to complete execution on it, create the unpacked FlowFiles, and then remove the original zip from the upstream connection queue? Step 1 is to identify the exact place(s) where it is slow.

Adding additional concurrent tasks to a processor has no impact on speeding up the execution on a specific source FlowFile. One thread gets assigned to each execution of the processor, and in the case of UnpackContent, each thread executes against one FlowFile from the upstream connection. Adding multiple concurrent tasks will allow multiple upstream FlowFiles to be processed concurrently.

IMPORTANT: Increment concurrent tasks slowly while monitoring CPU load averages. Adding too many concurrent tasks on any one processor can impact other processors in your dataflow.

The Event Driven processor scheduling strategy is deprecated, will eventually go away (most likely in the next major release), and should not be used. Increasing the Max Event Driven Thread Count under controller settings will have no impact unless you are using that strategy in your flow. It does create event threads, but they would not consume CPU if you are not using event driven scheduling anywhere in your dataflow(s).

NiFi is a data agnostic service, meaning it can handle any data type in its raw binary format. NiFi can do this because it wraps that binary content in a NiFi FlowFile. A NiFi FlowFile is what you see moving from processor to processor in your dataflows, and it becomes the responsibility of the processor to understand the FlowFile's content should it need to read it. I bring this up because a FlowFile adds a small bit of overhead, as NiFi has to generate FlowFile metadata for every FlowFile created.

When it comes to your 500GB+ zip files...

1. Do they consist of many small and/or large files? NiFi must create a FlowFile for each file that results from unpacking the original zip.
2. Do you see a lot of Java Garbage Collection (GC) pauses happening? All GC is stop the world. GC is a normal operation of any JVM, but if GC is happening very often, it can impact flow performance with constant pauses due to the stop-the-world nature of GC. The larger the JVM memory, the longer the stop-the-world event will be.
3. Are there any exceptions in your nifi-app.log?

You may also find this article helpful. It is old, but the majority of its guidance is still valid. The latest NiFi versions support Java 8 and Java 11, so you can ignore the G1GC recommendations if you are using Java 11.
https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999

Hopefully the concurrent tasks on your processor(s) executing against the content of large FlowFiles will help you better utilize your hardware and achieve better overall throughput. Keep in mind that concurrent tasks only allow concurrent execution on multiple source FlowFiles, so they will not improve the speed at which a single FlowFile is processed by a given processor.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
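The point about concurrent tasks can be illustrated with a deliberately simplified model. This is a back-of-the-envelope sketch, not a description of NiFi's scheduler: it assumes every FlowFile takes the same time to process and that threads never compete for CPU or disk. The function name and numbers are made up.

```python
import math


def modeled_wall_clock(num_flowfiles, seconds_per_flowfile, concurrent_tasks):
    """Estimate total time when each task handles one FlowFile at a time."""
    # Each concurrent task works on a whole FlowFile, so work proceeds in "waves".
    waves = math.ceil(num_flowfiles / concurrent_tasks)
    return waves * seconds_per_flowfile


# One 2-hour zip: extra concurrent tasks do not help.
print(modeled_wall_clock(1, 7200, 1))  # 7200
print(modeled_wall_clock(1, 7200, 8))  # 7200

# Eight 2-hour zips: four concurrent tasks cut total time from 16 hours to 4 hours.
print(modeled_wall_clock(8, 7200, 1))  # 57600
print(modeled_wall_clock(8, 7200, 4))  # 14400
```

In practice the gain is smaller than this model suggests, because concurrent threads share CPU and disk I/O, which is why the increments should be made slowly while monitoring load.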