About MattWho

MattWho · 01-05-2023

@davehkd Those log lines look like expected output in nifi-app.log during the startup process. Neither is an ERROR. Are you using the embedded ZooKeeper or an external ZK (strongly recommended)?
The first message indicates that ZK has not elected a cluster coordinator yet. This can happen if ZK has not finished coming up yet or does not yet have quorum. ZK needs quorum (a majority of its hosts up) to function, which is why an odd number of hosts (3, 5, etc.) is used; without quorum it will not function. Three is the recommended number of ZK hosts to support NiFi. Once ZK is up and has established quorum, I'd expect that WARN log message to go away.
The second, INFO-level message simply means that this node was unaware of an elected cluster coordinator and requested to be elected to that role. ZK responded that it had already elected some other node as the cluster coordinator. This node should then learn the elected cluster coordinator from ZK, and you should start seeing in the logs your nodes sending heartbeat messages to the elected cluster coordinator (the elected cluster coordinator even sends a heartbeat to itself). Only the cluster coordinator will log receiving and processing x number of heartbeats.
My guess here is that you may not have given it enough time to fully launch. When you start NiFi via "../bin/nifi.sh start", it executes the bootstrap process, which then kicks off the main child process for NiFi. You can follow that process in the nifi-app.log output as it progresses. NiFi is fully up once you see the log line stating that the NiFi UI is available at the following URLs. Once the NiFi node is fully up, it attempts to communicate with ZK and establish itself as part of a cluster. Especially with embedded ZK in use, this can be delayed until all nodes are up so that ZK has quorum. So the first node to come up may log more lines like the ones above than the last node to finish startup.
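To make the quorum arithmetic concrete, here is a minimal Python sketch. It is plain majority math, assuming quorum means a strict majority of the configured ZK hosts; it is not ZooKeeper code, and the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZK hosts that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ZK hosts can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(hosts_up: int, ensemble_size: int) -> bool:
    """True when enough ZK hosts are up to form a majority."""
    return hosts_up >= quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(f"{n} ZK host(s): quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")

# With a 3-host embedded ZK, the first NiFi node to start is alone:
print(has_quorum(hosts_up=1, ensemble_size=3))  # False
```

A 4-host ensemble tolerates no more failures than a 3-host one, which is why odd sizes are used. The last line also shows why, with embedded ZK, a single started node cannot reach quorum on its own.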
NiFi handles flow election based on the configuration of these two properties in the nifi.properties file:
nifi.cluster.flow.election.max.wait.time (default is 5 mins)
nifi.cluster.flow.election.max.candidates (no default, but it should be set to the number of NiFi instances in the cluster)
So basically, NiFi nodes will wait up to 5 minutes, or until the configured number of candidates have connected with ZK, before flow election happens and NiFi finishes coming up. If you access the UI before this happens, it will report that flow election is still in progress. Make sure that the "../conf/<nifi config files>" are all configured the same across all nodes, with the exception of node-specific properties like hostnames, keystores, truststores, etc. I hope that after some additional time, your NiFi cluster did finally come up for you. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
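As a rough illustration of how the two election properties interact, the sketch below models the rule just described in Python. It is a simplified model, not NiFi's election code; the function and parameter names are hypothetical, and 300 seconds stands in for the 5 min default of nifi.cluster.flow.election.max.wait.time.

```python
from typing import Optional


def flow_election_complete(elapsed_seconds: float,
                           connected_candidates: int,
                           max_wait_seconds: float = 300.0,
                           max_candidates: Optional[int] = None) -> bool:
    """Model of the rule above: election ends when the configured number of
    candidates has connected, or when the max wait time has elapsed."""
    if max_candidates is not None and connected_candidates >= max_candidates:
        return True  # every expected node showed up; no need to keep waiting
    return elapsed_seconds >= max_wait_seconds  # otherwise fall back to the timeout


# 3-node cluster with max.candidates=3: election can finish once all 3 connect.
print(flow_election_complete(20, 3, max_candidates=3))  # True
# max.candidates left unset: nodes wait out the full 5-minute default.
print(flow_election_complete(20, 3))                     # False
print(flow_election_complete(300, 3))                    # True
```

In this model, setting max.candidates to the node count is what lets startup finish before the timeout.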

MattWho · 01-04-2023

@sarithe NiFi component processors are part of pluggable NARs (NiFi Archives) in NiFi. They are separate from the core NiFi code. Processors are designed to log their output based on their component processor class. It is then the responsibility of logback to route those log messages to the appropriate appender. There is nothing in the log output produced by a component processor that inherently identifies which parent Process Group it resides within. But if you were to use a consistent processor naming structure in each of your Process Groups (PGs), you may be able to set up some creative filtering in logback based on that naming structure.
Bulletins, however, do include details about the parent Process Group in which the component generating the bulletin resides. You could build a dataflow in your NiFi to handle bulletin notification through the use of the SiteToSiteBulletinReportingTask, which is used to send bulletins to a destination remote input port on a target NiFi. A dataflow on the target NiFi could be built to parse the received bulletin records and group them by the bulletinGroupName JSON field, so that all records from the same PG are kept together. These 'like' records could then be written out to the local filesystem or a remote system, used to send email notifications, etc.
Example of what a bulletin sent using the SiteToSiteBulletinReportingTask looks like:
{
  "objectId" : "541dbd22-aa4b-4a1a-ad58-5d9a0b730e42",
  "platform" : "nifi",
  "bulletinId" : 2200,
  "bulletinCategory" : "Log Message",
  "bulletinGroupId" : "7e7ad459-0185-1000-ffff-ffff9e0b1503",
  "bulletinGroupName" : "PG2-Bulletin",
  "bulletinGroupPath" : "NiFi Flow / Matt's PG / PG2-Bulletin",
  "bulletinLevel" : "DEBUG",
  "bulletinMessage" : "UpdateAttribute[id=8c5b3806-9c3a-155b-ba15-260075ce9a6f] Updated attributes for StandardFlowFileRecord[uuid=1b0cb23a-75d8-4493-ba82-c6ea5c7d1ce3,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1672661850924-5, container=default, section=5], offset=969194, length=1024],offset=0,name=bulletin-${nextInt()).txt,size=1024]; transferring to 'success'",
  "bulletinNodeId" : "e75bf99f-095c-4672-be53-bb5510b3eb5c",
  "bulletinSourceId" : "8c5b3806-9c3a-155b-ba15-260075ce9a6f",
  "bulletinSourceName" : "PG1-UpdateAttribute",
  "bulletinSourceType" : "PROCESSOR",
  "bulletinTimestamp" : "2023-01-04T20:38:27.776Z"
}
Thank you, Matt
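Outside of NiFi, the grouping step described above can be sketched in a few lines of Python. This is only a model: the first record mirrors the example bulletin's bulletinGroupName, the other two records are hypothetical, and inside NiFi this would more likely be done with a record-oriented processor such as PartitionRecord (an assumption, not part of the original answer).

```python
import json
from collections import defaultdict


def group_bulletins_by_pg(records: list) -> dict:
    """Keep all bulletin records from the same Process Group together."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("bulletinGroupName", "<unknown>")].append(record)
    return dict(groups)


# Trimmed, partly hypothetical bulletin records received from the reporting task.
payload = json.loads("""[
  {"bulletinGroupName": "PG2-Bulletin", "bulletinLevel": "DEBUG", "bulletinSourceName": "PG1-UpdateAttribute"},
  {"bulletinGroupName": "PG1", "bulletinLevel": "ERROR", "bulletinSourceName": "PutFile"},
  {"bulletinGroupName": "PG2-Bulletin", "bulletinLevel": "WARNING", "bulletinSourceName": "LogAttribute"}
]""")

for pg, recs in group_bulletins_by_pg(payload).items():
    print(pg, len(recs))
# PG2-Bulletin 2
# PG1 1
```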

MattWho · 01-03-2023

@davehkd The exception you have shared points at the following property being set to false in the nifi.properties file:
nifi.cluster.protocol.is.secure=false
NiFi nodes communicate with one another over HTTP when this is set to false. When it is set to true, NiFi nodes communicate with one another over HTTPS. Since you have this set to false, NiFi is complaining that you have not configured an HTTP port in the following property in the nifi.properties file:
nifi.web.http.port
Out of the box, Apache NiFi is configured to start securely over HTTPS as a standalone NiFi instance using the Single-User authentication and single-user-authorizer providers:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#single_user_identity_provider
The intent of this provider is to offer an easy and quick way to start up a secure NiFi to become familiar with or evaluate NiFi. It gives that single user full access to everything and provides no mechanism for setting up and authorizing any additional users.
When switching to a NiFi cluster, you'll need to set up proper authentication and authorization providers that support secure NiFi clusters. In a secured NiFi cluster setup, the NiFi nodes need to authenticate via their certificates over a mutual TLS handshake (unless set to be unsecure as you have done, which I strongly do not recommend). This in turn means that the NiFi cluster nodes need authorizations set up for proxy, data access, and controller access, which the single-user-authorizer does not support. Additionally, the single user identity provider by default creates a random username and password on NiFi startup, which is going to be unique per node. This will not work in a cluster setup, since actions performed on node 1 are replicated to nodes 2 through x as the authenticated user of node 1. However, nodes 2 through x will not know anything about that user and thus fail authorization.
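A quick way to spot the mismatch described above before starting NiFi is to check both properties together. The Python sketch below is a hypothetical helper, not a NiFi tool: its parser handles only simple key=value lines (not the full Java properties syntax), the sample values are made up, and the HTTPS check is an assumption extrapolated from the HTTP case.

```python
def parse_properties(text: str) -> dict:
    """Minimal key=value parser: skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_cluster_protocol(props: dict) -> list:
    """Flag a cluster protocol setting that has no matching web port."""
    problems = []
    secure = props.get("nifi.cluster.protocol.is.secure", "").lower() == "true"
    if not secure and not props.get("nifi.web.http.port"):
        problems.append("nifi.cluster.protocol.is.secure=false but nifi.web.http.port is empty")
    if secure and not props.get("nifi.web.https.port"):
        # Assumption: by analogy, a secure cluster protocol needs the HTTPS port set.
        problems.append("nifi.cluster.protocol.is.secure=true but nifi.web.https.port is empty")
    return problems


# Hypothetical excerpt resembling the setup in this thread.
sample = """
nifi.web.http.port=
nifi.web.https.port=8443
nifi.cluster.protocol.is.secure=false
"""
problems = check_cluster_protocol(parse_properties(sample))
print(problems)
```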
The single user authentication provider does provide a mechanism for setting a specific username and password, which you could make the same on all instances of NiFi:
./bin/nifi.sh set-single-user-credentials <username> <password>
My suggestion is to first set up a standalone NiFi securely using your own configuration for user authentication and user authorization.
For user authentication, follow this section of the admin guide:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
The most commonly used method of user authentication is the ldap-provider:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldap_login_identity_provider
For NiFi authorizations, follow this section of the NiFi admin guide:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization
The most basic managed setup uses all of the following authorization providers, in this specific order, in the authorizers.xml file:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileusergroupprovider
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileaccesspolicyprovider
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#standardmanagedauthorizer
These are actually present in template form in the default authorizers.xml included with NiFi, though they are likely commented out.
Once you have a secured standalone NiFi instance working, I would then move on to setting up your NiFi cluster. As part of that process, you'll need to add your NiFi cluster nodes to the file-user-group-provider and file-access-policy-provider in authorizers.xml. That requires removing the users.xml and authorizations.xml files generated by those providers so they get recreated with the authorizations your initial cluster needs. These files are only generated by those providers if they do NOT already exist; config changes in the providers will not trigger new or modified files.
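If you script the cleanup step described above, moving the generated files aside is safer than deleting them. This Python sketch assumes NiFi is stopped and that the files are in their default locations, ./conf/users.xml and ./conf/authorizations.xml; check the "Users File" and "Authorizations File" properties in your authorizers.xml if you changed them. The helper name and example path are hypothetical.

```python
import shutil
import time
from pathlib import Path


def back_up_generated_auth_files(conf_dir: str,
                                 names=("users.xml", "authorizations.xml")) -> list:
    """Rename the provider-generated files so NiFi recreates them on next start.
    Returns the backup paths that were created."""
    conf = Path(conf_dir)
    stamp = time.strftime("%Y%m%d%H%M%S")
    moved = []
    for name in names:
        src = conf / name
        if src.exists():
            dst = conf / f"{name}.bak-{stamp}"
            shutil.move(str(src), str(dst))  # keep a copy instead of deleting
            moved.append(dst)
    return moved


# Example usage, with NiFi stopped (path is hypothetical):
# back_up_generated_auth_files("/opt/nifi/conf")
```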
I know there is a lot to take in here, but this will set you up in the best possible way for success. Matt

MattWho · 12-21-2022

@samrathal
1. What is the purpose of the SplitJson in your dataflow?
2. If you have 1 FlowFile with 1000 records in it, why use SplitJson to split that into 1000 FlowFiles with 1 record each? Why not just merge the larger FlowFiles with multiple records in them? Or am I missing part of the use case here?
---
Can you share a template or flow definition of your dataflow?
1. It is not clear to me how you get "X-Total-Count" and how you are adding this FlowFile attribute to every FlowFile.
2. You have configured the "Release Signal Identifier" with a boolean NiFi Expression Language (NEL) statement that, using your example, will return "false" until the "fragment.count" FlowFile attribute value equals the "X-Total-Count" FlowFile attribute value.
2a. I assume you are writing "X-Total-Count" to every FlowFile coming out of the SplitJson? How are you incrementing "fragment.count" across all FlowFiles in the complete 5600-record batch? When SplitJson splits a FlowFile into 1000 FlowFiles, each split gets a fragment.count equal to the number of splits from that parent (at most 1000). So fragment.count would never reach 5600 unless you are handling this count somewhere else in your dataflow.
2b. If a FlowFile's "fragment.count" value actually equals its "X-Total-Count" value, your "Release Signal Identifier" will resolve to "true". The "Release Signal Identifier" value (true or false) in your configuration is looked up in the configured distributed map cache server. So where in your dataflow do you write the release signal to the distributed map cache? (This is usually handled by a Notify processor.)
I am in no way implying that what you are trying to accomplish can't be done. However, coming up with an end-to-end workable solution requires knowing all the steps in the use case along the way. I would recommend going through the example Wait/Notify use case linked in my original response to get a better understanding of how the Wait and Notify processors work together.
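To see why that comparison never succeeds, here is a small Python model of the numbers in this thread: 5600 records pulled in batches of up to 1000, with each batch split into one FlowFile per record. It assumes (as asked in 2a) that X-Total-Count is present on every FlowFile, and that fragment.count holds the number of splits from the same parent, as SplitJson documents. The function name is hypothetical.

```python
def split_batches(total_records: int, batch_size: int = 1000) -> list:
    """Model REST pulls of up to batch_size records, each split into one FlowFile
    per record. Only the two attributes relevant here are kept."""
    flowfiles = []
    for start in range(0, total_records, batch_size):
        splits_from_parent = min(batch_size, total_records - start)
        for _ in range(splits_from_parent):
            flowfiles.append({
                # Assumption: fragment.count = number of splits from the same parent.
                "fragment.count": splits_from_parent,
                # Assumption: X-Total-Count was copied onto every FlowFile.
                "X-Total-Count": total_records,
            })
    return flowfiles


flowfiles = split_batches(5600)
released = [ff for ff in flowfiles if ff["fragment.count"] == ff["X-Total-Count"]]
print(len(flowfiles))                                      # 5600
print(sorted({ff["fragment.count"] for ff in flowfiles}))  # [600, 1000]
print(len(released))                                       # 0
```

No FlowFile ever carries a fragment.count of 5600, so the expression never evaluates to "true".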
Then maybe you can make some changes to your existing dataflow implementation. With more use case details (detailed process steps), I could suggest further changes if needed. I really hope this helps you get some traction on your use case here. If you have a contract with Cloudera, you can reach out to your account owner, who could help arrange for professional services that can work with you to turn your use cases into workable NiFi dataflows. Thank you, Matt

MattWho · 12-21-2022

@samrathal The "Wait" processor works in conjunction with the "Notify" processor in NiFi. See the example use case below:
https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/
Also, simply waiting until you have received all 1000-record batches will not ensure that a downstream MergeContent or MergeRecord processor merges them all together.
1. Is this a one-time execution flow?
2. If not, how do you differentiate between different complete batches (when does one merge bundle end and another begin)?
3. Are all 1000 records from each REST API call going into a single NiFi FlowFile, or 1 FlowFile per record?
4. Is there some correlation identifier returned by the REST API call that identifies all 1000-record batch pulls as part of the same complete bundle?
The details of your use case would make it easier for the community to provide suggestions.
Assuming you have some correlation attribute and you know that the number of records would never exceed some upper limit, you may be able to simply use a well-configured MergeRecord processor with min records set higher than you would ever expect, a correlation attribute, and a max bin age (which forces a bin to merge after x amount of time even if the min has not been satisfied) to accomplish the merging of all your records. But keep in mind that the answers to the questions above determine whether this is possible or needs some additional consideration. Thank you, Matt
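The MergeRecord idea above (a correlation attribute, min records set higher than you ever expect, and max bin age as the real trigger) can be modeled in Python as follows. This is a hypothetical, simplified model, not MergeRecord's implementation: it ignores max records, the number of bins, and other MergeRecord settings, and the class names and the "bundle-A" correlation value are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    opened_at: float                      # time the first records landed in this bin
    records: list = field(default_factory=list)


class CorrelatedBinner:
    """Simplified model of binning by a correlation attribute with a minimum
    record count and a max bin age."""

    def __init__(self, min_records: int, max_bin_age: float):
        self.min_records = min_records
        self.max_bin_age = max_bin_age
        self.bins = {}                    # correlation id -> Bin

    def offer(self, correlation_id: str, records: list, now: float) -> None:
        """Add records to the bin for their correlation id, opening one if needed."""
        if correlation_id not in self.bins:
            self.bins[correlation_id] = Bin(opened_at=now)
        self.bins[correlation_id].records.extend(records)

    def poll(self, now: float) -> list:
        """Return (correlation_id, records) for every bin that may be merged now."""
        ready = []
        for cid, b in list(self.bins.items()):
            min_met = len(b.records) >= self.min_records
            too_old = now - b.opened_at >= self.max_bin_age
            if min_met or too_old:
                ready.append((cid, b.records))
                del self.bins[cid]
        return ready


# Min set far above any real bundle size, so only max bin age triggers the merge.
binner = CorrelatedBinner(min_records=100_000, max_bin_age=300)
for t in range(5):                        # five 1000-record pulls for the same bundle
    binner.offer("bundle-A", [f"rec-{t}-{i}" for i in range(1000)], now=t)
print(binner.poll(now=10))                # []
ready = binner.poll(now=305)
print(ready[0][0], len(ready[0][1]))      # bundle-A 5000
```

Because the minimum is never reached in this model, every bundle is released by age, so the max bin age has to be longer than the time it takes for the last batch of a bundle to arrive.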

MattWho · 12-21-2022

@anton123 I am still not completely clear on your use case, but correct me if the following is not accurate:
1. You fetch a single large file.
2. That file is unpacked into many smaller files.
3. Each of these smaller files is converted into SQL and inserted via the PutSQL processor.
4. You then have unrelated downstream processing that you don't want to start until all files produced by the UnpackContent processor have been successfully processed by the PutSQL processor.
Correct? If so, the following example use case for the NiFi Wait and Notify processors is probably what you are looking to implement:
https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/
Thank you, Matt
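The core of that Wait/Notify pattern is a shared counter. The Python sketch below is a simplified, in-memory model, not NiFi code. It assumes each file produced by UnpackContent carries the same fragment.identifier plus a fragment.count, that a Notify placed after PutSQL success signals on ${fragment.identifier}, and that the Wait holding back the downstream processing uses the same identifier with a Target Signal Count of ${fragment.count}, which is roughly how the linked article wires it up. The class name and sample values are hypothetical.

```python
from collections import Counter


class SignalCache:
    """In-memory stand-in for the distributed map cache shared by Notify and Wait."""

    def __init__(self):
        self.counts = Counter()

    def notify(self, signal_id: str, delta: int = 1) -> None:
        """What a Notify processor does: bump the counter for a signal identifier."""
        self.counts[signal_id] += delta

    def should_release(self, signal_id: str, target_count: int) -> bool:
        """What a Wait processor checks: has the counter reached the target?"""
        return self.counts[signal_id] >= target_count


cache = SignalCache()
# Hypothetical: one archive unpacked into 3 files that share fragment attributes.
unpacked = [{"fragment.identifier": "archive-1", "fragment.count": 3,
             "filename": f"part{i}.sql"} for i in range(3)]

for ff in unpacked:
    # Each file reaches Notify only after PutSQL routes it to success.
    cache.notify(ff["fragment.identifier"])
    print(ff["filename"], cache.should_release("archive-1", ff["fragment.count"]))
# part0.sql False
# part1.sql False
# part2.sql True
```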

MattWho · 12-21-2022

@zIfo The TLS exception "unable to find valid certification path to requested target" is telling you that there is a lack of trust in the handshake. This means that the complete trust chain needed to establish trust is missing from the truststore. This is not an issue with the NiFi InvokeHTTP processor.
From the command line, you could try using openssl to get the public certificates for the trust chain from the target URL (note that not all endpoints will return the complete trust chain):
openssl s_client -connect <FQDN>:<port> -showcerts
The server's response to this command will include one or more public certs. Each cert will have the format of the example below:
-----BEGIN CERTIFICATE-----
MIIFYjCCBEqgAwIBAgIQd70NbNs2+RrqIQ/E8FjTDTANBgkqhkiG9w0BAQsFADBX
MQswCQYDVQQGEwJCRTEZMBcGA1UEChMQR2xvYmFsU2lnbiBudi1zYTEQMA4GA1UE
CxMHUm9vdCBDQTEbMBkGA1UEAxMSR2xvYmFsU2lnbiBSb290IENBMB4XDTIwMDYx
OTAwMDA0MloXDTI4MDEyODAwMDA0MlowRzELMAkGA1UEBhMCVVMxIjAgBgNVBAoT
GUdvb2dsZSBUcnVzdCBTZXJ2aWNlcyBMTEMxFDASBgNVBAMTC0dUUyBSb290IFIx
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAthECix7joXebO9y/lD63
ladAPKH9gvl9MgaCcfb2jH/76Nu8ai6Xl6OMS/kr9rH5zoQdsfnFl97vufKj6bwS
iV6nqlKr+CMny6SxnGPb15l+8Ape62im9MZaRw1NEDPjTrETo8gYbEvs/AmQ351k
KSUjB6G00j0uYODP0gmHu81I8E3CwnqIiru6z1kZ1q+PsAewnjHxgsHA3y6mbWwZ
DrXYfiYaRQM9sHmklCitD38m5agI/pboPGiUU+6DOogrFZYJsuB6jC511pzrp1Zk
j5ZPaK49l8KEj8C8QMALXL32h7M1bKwYUH+E4EzNktMg6TO8UpmvMrUpsyUqtEj5
cuHKZPfmghCN6J3Cioj6OGaK/GP5Afl4/Xtcd/p2h/rs37EOeZVXtL0m79YB0esW
CruOC7XFxYpVq9Os6pFLKcwZpDIlTirxZUTQAs6qzkm06p98g7BAe+dDq6dso499
iYH6TKX/1Y7DzkvgtdizjkXPdsDtQCv9Uw+wp9U7DbGKogPeMa3Md+pvez7W35Ei
Eua++tgy/BBjFFFy3l3WFpO9KWgz7zpm7AeKJt8T11dleCfeXkkUAKIAf5qoIbap
sZWwpbkNFhHax2xIPEDgfg1azVY80ZcFuctL7TlLnMQ/0lUTbiSw1nH69MG6zO0b
9f6BQdgAmD06yK56mDcYBZUCAwEAAaOCATgwggE0MA4GA1UdDwEB/wQEAwIBhjAP
BgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBTkrysmcRorSCeFL1JmLO/wiRNxPjAf
BgNVHSMEGDAWgBRge2YaRQ2XyolQL30EzTSo//z9SzBgBggrBgEFBQcBAQRUMFIw
JQYIKwYBBQUHMAGGGWh0dHA6Ly9vY3NwLnBraS5nb29nL2dzcjEwKQYIKwYBBQUH
MAKGHWh0dHA6Ly9wa2kuZ29vZy9nc3IxL2dzcjEuY3J0MDIGA1UdHwQrMCkwJ6Al
oCOGIWh0dHA6Ly9jcmwucGtpLmdvb2cvZ3NyMS9nc3IxLmNybDA7BgNVHSAENDAy
MAgGBmeBDAECATAIBgZngQwBAgIwDQYLKwYBBAHWeQIFAwIwDQYLKwYBBAHWeQIF
AwMwDQYJKoZIhvcNAQELBQADggEBADSkHrEoo9C0dhemMXoh6dFSPsjbdBZBiLg9
NR3t5P+T4Vxfq7vqfM/b5A3Ri1fyJm9bvhdGaJQ3b2t6yMAYN/olUazsaL+yyEn9
WprKASOshIArAoyZl+tJaox118fessmXn1hIVw41oeQa1v1vg4Fv74zPl6/AhSrw
9U5pCZEt4Wi4wStz6dTZ/CLANx8LZh1J7QJVj2fhMtfTJr9w4z30Z209fOU0iOMy
+qduBmpvvYuR7hZL6Dupszfnw0Skfths18dG9ZKb59UhvmaSGZRVbNQpsg3BZlvi
d0lIKO2d1xozclOzgjXPYovJJIultzkMu34qQb9Sz/yilrbCgj8=
-----END CERTIFICATE-----
You can copy each one (including the BEGIN and END CERTIFICATE lines) into a separate <name>.pem file, and then import each <name>.pem into your existing truststore.
A complete trust chain consists of all the public certs from the signer of the host's cert up to the self-signed root CA public cert. If that signer cert is self-signed (meaning owner and signer have the same DN), then it is considered the root CA. If they are not the same, then another public cert exists in the chain. A complete trust chain means you have all the public certs from the one that signed the target FQDN's cert all the way to the root CA (owner and issuer with the same DN).
If the openssl output does not contain all the public certs in the trust chain, you'll need to get the missing public certs from the source. That source could be the company hosting the server or a public certificate authority (DigiCert, for example). You would need to go to those sources to obtain the CA certs, which are often published on their website (example: https://www.digicert.com/kb/digicert-root-certificates.htm). Another option that may work for root CAs and some intermediate CAs is Java's cacerts file, bundled with every Java release, which contains the public certs for many public authorities.
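The copy-and-paste step above can be scripted. This Python sketch assumes you saved the openssl command's output to a text file; it extracts every BEGIN/END CERTIFICATE block in order and writes each one to its own .pem file. The file names and prefix are arbitrary, and the sketch does not verify the chain; that check is still the owner/issuer DN comparison described above.

```python
import re
from pathlib import Path

# One PEM certificate: BEGIN line, base64 body (any number of lines), END line.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.+?\s+-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_certs(showcerts_output: str) -> list:
    """Return every PEM certificate block (BEGIN/END lines included), in order."""
    return PEM_BLOCK.findall(showcerts_output)


def write_pem_files(blocks: list, out_dir: str, prefix: str = "chain-cert") -> list:
    """Write each block to <out_dir>/<prefix>-<n>.pem and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, block in enumerate(blocks):
        path = out / f"{prefix}-{n}.pem"
        path.write_text(block + "\n")
        paths.append(path)
    return paths


# Example usage (file names are hypothetical):
# text = Path("showcerts.txt").read_text()
# write_pem_files(split_pem_certs(text), "certs")
```

Each resulting file can then be imported into the truststore, for example with keytool -importcert -file <name>.pem -alias <alias> -keystore <truststore>.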
Thank you, Matt

MattWho · 12-20-2022

@AlexLasecki The issue here is unrelated to the copy and paste action. There is a bug in the code where the JsonPath cache is not cleared when the property value is changed after it has been initially set. So the same issue happens even if you do not copy and paste a SplitJson processor configured with a JsonPath property value: all you need to do is change the JsonPath value after a value has already been set, and the original, cached JsonPath property value still gets used. The following bug Jira has been created, and work is already in progress to address the issue:
https://issues.apache.org/jira/browse/NIFI-10998
For now, as a workaround, you'll need to create a new SplitJson processor any time you want to change the JsonPath property value. Thank you, Matt
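The failure mode described above is a classic stale-cache bug. The Python model below is purely illustrative and is not SplitJson's actual code; the class name and the two JsonPath values are hypothetical. It shows why the first value keeps being used when the cache is not cleared on a property change, and why a new processor (with a fresh, empty cache) works around it.

```python
class CachedPathProperty:
    """Illustration only: a property whose value is 'compiled' once and cached,
    optionally invalidated when the value changes."""

    def __init__(self, invalidate_on_change: bool):
        self.invalidate_on_change = invalidate_on_change
        self.value = None
        self.compiled = None

    def set_value(self, new_value: str) -> None:
        self.value = new_value
        if self.invalidate_on_change:
            self.compiled = None          # forget the stale compiled expression

    def effective_path(self) -> str:
        if self.compiled is None:
            self.compiled = self.value    # stands in for compiling the JsonPath
        return self.compiled


buggy = CachedPathProperty(invalidate_on_change=False)
buggy.set_value("$.items")
buggy.effective_path()                    # first use populates the cache
buggy.set_value("$.records")
print(buggy.effective_path())             # $.items  (stale value, like the bug)

fixed = CachedPathProperty(invalidate_on_change=True)
fixed.set_value("$.items")
fixed.effective_path()
fixed.set_value("$.records")
print(fixed.effective_path())             # $.records
```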

MattWho · 12-19-2022

@AlexLasecki I can reproduce this in 1.19.1 as well. Let me look more into this issue; I'll respond again once I determine the issue here. Thanks, Matt

MattWho · 12-19-2022

@anton123 It may be helpful if you described your use case in detail. Looking at your dataflow above, I am not clear on what you are trying to accomplish.
1. Your UpdateAttribute processor is changing the filename on every NiFi FlowFile that passes through it to "bu_service_template". Why are you doing this?
2. It makes no sense to me that you are looping the "original" relationship in a connection back to the MergeContent processor. All FlowFiles that go into a merged FlowFile get sent to this relationship. This means the loop would just keep growing in size, and each merged FlowFile sent to the "merged" relationship connection would just get larger and larger.
3. Your MergeContent processor configuration is not ideal. In the worst-case scenario, where it actually tries to merge the max configured number of entries, it is likely to cause your NiFi to run out of memory, since the FlowFile attributes/metadata for every FlowFile allocated to a merge bin are held in heap memory.
4. When trying to merge that many FlowFiles, it is important to handle this in a series of MergeContent processors (one after another). Configure the first to produce merged FlowFiles of maybe 10,000. Then add another that merges yet another 10,000, and finally one last MergeContent that merges 10. Each final merged FlowFile would then contain 1 billion (10,000 × 10,000 × 10) of the original FlowFiles.
5. I see you are trying to use a correlation attribute in your MergeContent with attribute name "bu_service_template". Where in your dataflow is this attribute getting added to the inbound FlowFiles?
6. Keep in mind that MergeContent will execute as fast as possible, and you have min entries set to 1. So it is very possible that at execution time it sees only one new unbinned FlowFile in the inbound connection and adds that to a bin. That bin has now satisfied the min and thus would be merged with only a single FlowFile in it. So 1 FlowFile goes to the "merged" relationship, and the 1 FlowFile that made up the merged FlowFile goes to the "original" relationship to get merged again. A better configuration would be to set min to 10000 and max to 10000. Then you can also set a "max bin age". The max bin age is used to force a merge after x configured amount of time, even if the mins have not been satisfied.
7. I am not sure what role you are trying to accomplish with the Wait processor after the MergeContent.
I hope some of this configuration guidance helps you with your use case. Thank you, Matt
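The arithmetic behind point 4 can be checked with a few lines of Python. This sketch only multiplies the suggested per-stage entry counts; it assumes each stage merges exactly its configured number of entries and ignores FlowFile sizes and timing. The function name is hypothetical.

```python
def cascade_totals(stage_sizes: list) -> list:
    """Number of original FlowFiles inside one merged FlowFile after each stage."""
    totals, running = [], 1
    for size in stage_sizes:
        running *= size
        totals.append(running)
    return totals


stages = [10_000, 10_000, 10]             # entries merged per MergeContent stage
print(cascade_totals(stages))             # [10000, 100000000, 1000000000]
# Largest bin any single stage holds in heap, versus a single-stage merge:
print(max(stages), cascade_totals(stages)[-1])   # 10000 1000000000
```

The end result matches one giant merge, but no single MergeContent bin ever has to hold the attributes of more than 10,000 FlowFiles in heap.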

Last Visited	05-22-2026 02:33 AM

Member Since	07-30-2019 10:41 AM
Posts	3,470
Kudos received	1638

Cloudera Community

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: FetchSMB not fetching all files

Re: Nifi: How to revoke the import and export Temp...

Re: Errors Encountered When Installing NiFi Cluste...

Re: Nifi Processor group level logging-Issue

Re: Errors Encountered When Installing NiFi Cluste...

Re: How to apply wait processor for capture comple...

Re: How to apply wait processor for capture comple...

Re: Apache NIFI - How to wait for SQL Insert full ...

Re: InvokeHTTP - HTTPS API calls

Re: Bug when copying a SplitJson processor

Re: Bug when copying a SplitJson processor

Re: Apache NIFI - How to wait for SQL Insert full ...