Member since: 07-30-2019
Posts: 2455
Kudos Received: 1285
Solutions: 692
01-03-2023
07:13 AM
@davehkd The exception you have shared points at the following property being set to false in the nifi.properties file: nifi.cluster.protocol.is.secure=false

NiFi nodes communicate with one another over HTTP when this is set to false. When set to true, NiFi nodes communicate with one another over HTTPS. Since you have this set to false, it is complaining that you do not have your NiFi configured with an HTTP port in the following property in the nifi.properties file: nifi.web.http.port

Out of the box, Apache NiFi is configured to start securely over HTTPS as a standalone NiFi instance using the single-user authentication and single-user-authorizer providers: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#single_user_identity_provider The intent of this provider is to provide a means for easily and quickly starting up a secure NiFi to become familiar with or evaluate NiFi. It gives that single user full access to everything and provides no mechanism for setting up and authorizing any additional users. When switching to a NiFi cluster, you'll need to set up proper authentication and authorization providers that support secure NiFi clusters.

In a secured NiFi cluster setup, the NiFi nodes need to authenticate via their certificates over a mutual TLS handshake (unless set to be unsecure as you have done, which I strongly do not recommend). This in turn means that the NiFi cluster nodes will need to have authorizations set up for proxy, data access, and controller access, which the single-user-authorizer does not support. Additionally, the single-user identity provider by default creates a random username and password on NiFi startup, which will be unique per node. This will not work in a cluster setup, since actions performed on node 1 will be replicated to nodes 2 - x as the authenticated user of node 1; however, nodes 2 - x will not know anything about that user and will thus fail authorization. The single-user authentication provider does provide a mechanism for you to set a specific username and password, which you could make the same on all instances of NiFi:

./bin/nifi.sh set-single-user-credentials <username> <password>

My suggestion is to first set up a standalone NiFi securely using your own configuration for user authentication and user authorization.

For user authentication, follow this section of the admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication The most commonly used method of user authentication is the ldap-provider: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldap_login_identity_provider

For NiFi authorizations, follow this section of the NiFi admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization The most basic managed setup utilizes all of the following authorization providers, in the specific order below, in the authorizers.xml file:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileusergroupprovider
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileaccesspolicyprovider
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#standardmanagedauthorizer

These are actually present in template format in the default authorizers.xml included with NiFi; they are likely commented out. Once you have a secured standalone NiFi instance working, I would then move on to setting up your NiFi cluster.
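As a rough reference only, the provider ordering described above typically looks something like the following in authorizers.xml (the identities, DNs, and file paths shown here are placeholders to adjust for your environment):

<authorizers>
  <userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial User Identity 1">CN=admin, OU=NiFi</property>
  </userGroupProvider>
  <accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">file-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">CN=admin, OU=NiFi</property>
  </accessPolicyProvider>
  <authorizer>
    <identifier>managed-authorizer</identifier>
    <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
    <property name="Access Policy Provider">file-access-policy-provider</property>
  </authorizer>
</authorizers>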
You'll need to add your NiFi cluster nodes to the authorizers file-user-group-provider and file-access-policy-provider as part of that process, which would require you to remove the users.xml and authorizations.xml files generated by those providers so they get recreated to support your initial cluster authorizations. These files are only generated by those providers if they do NOT already exist; config changes in the providers will not trigger new or modified files. I know there is a lot to take in here, but this will set you up in the best possible way for success. If you found that this response helped with your query, please take a moment to login and select "Accept as Solution" below each response that helped you. Matt
01-03-2023
06:03 AM
@Mostafa12345678 I'd recommend providing more detail in your query here, including things like the version of NiFi being used, the NiFi processor component being used and its configuration, and the full stack trace error from the nifi-app.log. Thanks, Matt
12-21-2022
12:25 PM
@samrathal
1. What is the purpose of the SplitJson in your dataflow?
2. If you have 1 FlowFile with 1000 records in it, why use SplitJson to split that into 1000 FlowFiles having 1 record each? Why not just merge the larger FlowFiles with multiple records in them? Or am I missing part of the use case here?

Can you share a template or flow definition of your dataflow?
1. It is not clear to me how you get "X-Total-Count" and how you are adding this FlowFile attribute to every FlowFile.
2. You have configured the "Release Signal Identifier" with a boolean NiFi Expression Language (NEL) statement that, using your example, will return "false" until the "fragment.count" FlowFile attribute value equals the "X-Total-Count" FlowFile attribute value.
2a. I assume you are writing "X-Total-Count" to every FlowFile coming out of the SplitJson? How are you incrementing "fragment.count" across all FlowFiles in the complete 5600 record batch? Each FlowFile that splits into 1000 FlowFiles via SplitJson will have fragment.count set to 1 - 1000. So fragment.count would never reach 5600 unless you are handling this count somewhere else in your dataflow.
2b. If a FlowFile's "fragment.count" value actually equals its "X-Total-Count" value, your "Release Signal Identifier" will resolve to "true". The "Release Signal Identifier" value (true or false) in your configuration is looked up in the configured distributed map cache server. So where in your dataflow do you write the release signal to the distributed map cache? (This is usually handled by a Notify processor.)

I am in no way implying that what you are trying to accomplish can't be done. However, coming up with an end-to-end workable solution requires knowing all the steps in the use case along the way. I would recommend going through the example Wait/Notify flow linked in my original response to get a better understanding of how the Wait and Notify processors work together. Then maybe you can make some changes to your existing dataflow implementation. With more use case details (detailed process steps) I could suggest further changes if needed.

I really hope this helps you get some traction on your use case here. If you have a contract with Cloudera, you can reach out to your account owner who could help arrange for professional services that can work with you to turn your use cases into workable NiFi dataflows. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
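For illustration only, the boolean expression described in point 2 above (attribute names taken from this thread; yours may differ) would look something like this in NiFi Expression Language:

${fragment.count:equals(${X-Total-Count})}

This returns "true" only when the two attribute values are equal, which is why the release signal would never resolve if fragment.count never reaches the total count.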
12-21-2022
06:33 AM
@samrathal The "Wait" processor works in conjunction with the "Notify" processor in NiFi. See below example use case: https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/ And simply waiting until you have received all 1000 record record batches will not ensure a downstream MergeContent or MergeRecord processor will merge them all together. 1. Is this a one time execution flow? 2. if not, how do you differentiate between different complete batches (when does new one merge bundle end and another begin?)? 3. Are all 1000 records from each rest-api call going into a single NiFi FlowFile or 1 FlowFile per record? 4. Is there some correlation identifier as a rest of rest-api call that identifies all 1000 Record batch pulls as part of same complete bundle? The details of yoru use case would make it easier for the community to provide suggestions. Assuming You have some Correlation Attribute and you know that max number of records would never exceed some upper limit, you may be able to simply use a well configured MergeRecord processor using min records set higher then you would ever expect, a correlation attribute, and a max bin age (forced bin to merge after x amount of time even if min has not been satisfied) to accomplish the merging of all your records. But keep in mind the answers to questions asked play a role in whether this is possible or needs some additional consideration put in place. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-21-2022
06:18 AM
@anton123 I am still not completely clear on your use case, but correct me if the below is not accurate:
1. You fetch a single large file.
2. That file is unpacked into many smaller files.
3. Each of these smaller files is converted into SQL and inserted via the PutSQL processor.
4. You then have unrelated downstream processing you don't want to start until all files produced by the UnpackContent processor have been successfully processed by the PutSQL processor.

Correct? If so, the following example use case for the NiFi Wait and Notify processors is probably what you are looking to implement: https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/ If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-21-2022
06:00 AM
@zIfo The TLS exception "unable to find valid certification path to requested target" is telling you that there is a lack of trust in the handshake. This means that the complete trust chain needed to establish trust is missing from the truststore. This is not an issue with the NiFi InvokeHTTP processor. From the command line you could try using openssl to get the public certificates for the trust chain from the target URL (note that not all endpoints will return the complete trust chain):

openssl s_client -connect <FQDN>:<port> -showcerts

The server hello in response to this command will have one to many public certs. Each cert will have the format of the example below:
-----BEGIN CERTIFICATE-----
MIIFYjCCBEqgAwIBAgIQd70NbNs2+RrqIQ/E8FjTDTANBgkqhkiG9w0BAQsFADBX
MQswCQYDVQQGEwJCRTEZMBcGA1UEChMQR2xvYmFsU2lnbiBudi1zYTEQMA4GA1UE
CxMHUm9vdCBDQTEbMBkGA1UEAxMSR2xvYmFsU2lnbiBSb290IENBMB4XDTIwMDYx
OTAwMDA0MloXDTI4MDEyODAwMDA0MlowRzELMAkGA1UEBhMCVVMxIjAgBgNVBAoT
GUdvb2dsZSBUcnVzdCBTZXJ2aWNlcyBMTEMxFDASBgNVBAMTC0dUUyBSb290IFIx
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAthECix7joXebO9y/lD63
ladAPKH9gvl9MgaCcfb2jH/76Nu8ai6Xl6OMS/kr9rH5zoQdsfnFl97vufKj6bwS
iV6nqlKr+CMny6SxnGPb15l+8Ape62im9MZaRw1NEDPjTrETo8gYbEvs/AmQ351k
KSUjB6G00j0uYODP0gmHu81I8E3CwnqIiru6z1kZ1q+PsAewnjHxgsHA3y6mbWwZ
DrXYfiYaRQM9sHmklCitD38m5agI/pboPGiUU+6DOogrFZYJsuB6jC511pzrp1Zk
j5ZPaK49l8KEj8C8QMALXL32h7M1bKwYUH+E4EzNktMg6TO8UpmvMrUpsyUqtEj5
cuHKZPfmghCN6J3Cioj6OGaK/GP5Afl4/Xtcd/p2h/rs37EOeZVXtL0m79YB0esW
CruOC7XFxYpVq9Os6pFLKcwZpDIlTirxZUTQAs6qzkm06p98g7BAe+dDq6dso499
iYH6TKX/1Y7DzkvgtdizjkXPdsDtQCv9Uw+wp9U7DbGKogPeMa3Md+pvez7W35Ei
Eua++tgy/BBjFFFy3l3WFpO9KWgz7zpm7AeKJt8T11dleCfeXkkUAKIAf5qoIbap
sZWwpbkNFhHax2xIPEDgfg1azVY80ZcFuctL7TlLnMQ/0lUTbiSw1nH69MG6zO0b
9f6BQdgAmD06yK56mDcYBZUCAwEAAaOCATgwggE0MA4GA1UdDwEB/wQEAwIBhjAP
BgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBTkrysmcRorSCeFL1JmLO/wiRNxPjAf
BgNVHSMEGDAWgBRge2YaRQ2XyolQL30EzTSo//z9SzBgBggrBgEFBQcBAQRUMFIw
JQYIKwYBBQUHMAGGGWh0dHA6Ly9vY3NwLnBraS5nb29nL2dzcjEwKQYIKwYBBQUH
MAKGHWh0dHA6Ly9wa2kuZ29vZy9nc3IxL2dzcjEuY3J0MDIGA1UdHwQrMCkwJ6Al
oCOGIWh0dHA6Ly9jcmwucGtpLmdvb2cvZ3NyMS9nc3IxLmNybDA7BgNVHSAENDAy
MAgGBmeBDAECATAIBgZngQwBAgIwDQYLKwYBBAHWeQIFAwIwDQYLKwYBBAHWeQIF
AwMwDQYJKoZIhvcNAQELBQADggEBADSkHrEoo9C0dhemMXoh6dFSPsjbdBZBiLg9
NR3t5P+T4Vxfq7vqfM/b5A3Ri1fyJm9bvhdGaJQ3b2t6yMAYN/olUazsaL+yyEn9
WprKASOshIArAoyZl+tJaox118fessmXn1hIVw41oeQa1v1vg4Fv74zPl6/AhSrw
9U5pCZEt4Wi4wStz6dTZ/CLANx8LZh1J7QJVj2fhMtfTJr9w4z30Z209fOU0iOMy
+qduBmpvvYuR7hZL6Dupszfnw0Skfths18dG9ZKb59UhvmaSGZRVbNQpsg3BZlvi
d0lIKO2d1xozclOzgjXPYovJJIultzkMu34qQb9Sz/yilrbCgj8=
-----END CERTIFICATE-----

You can copy each cert (including the BEGIN and END CERTIFICATE lines) into a separate <name>.pem file, and then import each <name>.pem into your existing truststore. A complete trust chain consists of all the public certs from the signer of the host's cert up to the self-signed root CA public cert. If that signer cert is self-signed (meaning owner and issuer have the same DN), then it is considered the root CA. If they are not the same, then another public cert exists in the chain. A complete trust chain means you have all the public certs from the one that signed the target FQDN all the way to the root CA (owner and issuer with the same DN). If the output of the openssl command does not contain all the public certs in the trust chain, you'll need to get the missing public certs from the source. That source could be the company hosting the server or it could be a public certificate authority (DigiCert, for example). You would need to go to those sources to obtain the CA certs, which are often published on their websites (example: https://www.digicert.com/kb/digicert-root-certificates.htm). Another option that may work for root CAs and some intermediate CAs is using Java's cacerts file, bundled with every Java release, which contains the public certs for many public authorities. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
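As a rough example of that import step (the alias, pem file name, truststore path, and password are placeholders), the standard Java keytool command would look like:

keytool -importcert -alias <name> -file <name>.pem -keystore /path/to/truststore.jks -storepass <truststorePassword>

Repeat the import once per <name>.pem file in the chain, using a unique alias for each.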
12-20-2022
12:06 PM
@AlexLasecki The issue here is unrelated to the copy and paste action taken. There is a bug in the code where the jsonPath cache is not cleared when the property value is changed after it has been initially set. So the same issue happens even if you do not copy and paste a SplitJson processor configured with a JsonPath property value: all you need to do is change the JsonPath value after already having a value set, and the original cached JsonPath property value still gets used. The following bug jira has been created and work is already in progress to address the issue: https://issues.apache.org/jira/browse/NIFI-10998 For now, as a workaround, you'll need to create a new SplitJson processor anytime you want to change the JsonPath property value. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-19-2022
11:19 AM
@AlexLasecki I can reproduce this in 1.19.1 as well. Let me look more into this issue. I'll respond again once I determine the issue here. Thanks, Matt
12-19-2022
10:20 AM
@anton123 It may be helpful if you described your use case in detail. Looking at your dataflow above, I am not clear on what you are trying to accomplish.
1. Your UpdateAttribute processor is changing the filename on every NiFi FlowFile that passes through it to "bu_service_template". Why are you doing this?
2. It makes no sense to me that you are looping the "original" relationship in a connection back to the MergeContent processor. All FlowFiles that go into a merged FlowFile get sent to this relationship. This means this loop would just grow and grow in size, and each merged FlowFile sent to the "merged" relationship connection would just get larger and larger.
3. Your MergeContent processor configuration is not ideal, and in the worst case scenario where it actually tries to merge the max configured number of entries it is likely to cause your NiFi to run out of memory, since the FlowFile attributes/metadata for every FlowFile allocated to a merge bin are held in heap memory.
4. When trying to merge that many FlowFiles, it is important to handle this in a series of MergeContent processors (one after another). Configure the first to produce merged FlowFiles of maybe 10,000. Then another that merges yet another 10,000, and finally one last MergeContent that merges 10. The final merged FlowFiles would each contain 1 billion of the original FlowFiles. A rough configuration sketch follows this reply.
5. I see you are trying to use a correlation attribute in your MergeContent with the attribute name "bu_service_template". Where in your dataflow is this attribute getting added to the inbound FlowFiles?
6. Keep in mind that MergeContent will execute as fast as possible and you have min entries set to 1. So it is very possible that at time of execution it sees only one new unbinned FlowFile in the inbound connection and adds that to a bin. Now that bin has satisfied the min and would be merged with only a single FlowFile in it. So 1 FlowFile goes to the "merged" relationship and the 1 FlowFile that made up the merged FlowFile goes to the "original" relationship to get merged again. A better configuration would be to set min to 10000 and max to 10000. Then you can also set a "max bin age", which is used to force a merge after the configured amount of time even if the mins have not been satisfied.
7. I am not sure what you are trying to accomplish with the Wait processor after the MergeContent.

I hope some of this configuration guidance helps you with your use case. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
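For illustration of point 4 only (the entry counts and bin age are placeholders), a series of three MergeContent processors could be configured roughly as:

MergeContent #1: Minimum Number of Entries = 10000, Maximum Number of Entries = 10000, Max Bin Age = 5 mins
MergeContent #2: Minimum Number of Entries = 10000, Maximum Number of Entries = 10000, Max Bin Age = 5 mins
MergeContent #3: Minimum Number of Entries = 10,    Maximum Number of Entries = 10,    Max Bin Age = 5 mins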
12-16-2022
11:07 AM
@PradNiFi1236 NiFi is designed to be data agnostic, so content that NiFi ingests is preserved in binary format wrapped in a NiFi FlowFile. It then becomes the responsibility of an individual processor that needs to operate against the content to understand the content type. The MergeContent processor does not care about the content type. This processor offers numerous merge strategies:
- Binary Concatenation simply writes the binary data from one FlowFile to the end of the binary data from another. There is no specific handling based on the content type. So combining two PDF files in this manner is going to leave you with unreadable data, which explains why, even with the larger content size of the merged FlowFile, you still only see the first PDF in the merged binary.
- Tar and Zip combine multiple pieces of content into a single tar or zip file. You can then later untar or unzip this to extract the multiple separate pieces of content it contains, so both independent PDF files would be preserved.
- FlowFile Stream is unique to NiFi and merges multiple NiFi FlowFiles (a FlowFile consists of content and FlowFile metadata). This strategy is only used to preserve that NiFi metadata with the content for future access by another NiFi.
- Avro expects the content being merged to already be in Avro format. This will properly merge Avro data into single new Avro content.

So the question here is first, how would you accomplish the merging of two PDFs outside of NiFi? Then investigate how to accomplish the same within NiFi, if possible. TAR and ZIP will work to get you one output file; however, if your use case is to produce 1 new PDF from 2 original PDFs, MergeContent is not going to do that. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-16-2022
08:53 AM
@Mosoa Before upgrading Apache NiFi you should read through the migration guide, including all releases between your current release and the release you are upgrading to: https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance The Hive nar was removed as of Apache NiFi 1.17.0, so I am guessing your previous version was 1.16 or older. As far as downloading and adding additional nars to NiFi, it is very easy (a command-line equivalent is sketched after this reply):
1. Go to https://search.maven.org/ in your browser.
2. Search for Apache "NiFi Hive nar".
3. A list of artifacts will be shown. You'll need to click on all those you need, but I would start with "nifi-hive-nar" and "nifi-hive-services-api-nar" by clicking on the version number below each.
4. From the nar-specific page you will see a "Downloads" option in the upper right corner of the page.
5. When you click on it, three options appear. Select "nar".
6. Place the downloaded nar files into the "lib" directory of your NiFi 1.19.1 installation. You'll notice that this directory already contains nars for the other component classes included with the base download.
7. Make sure ownership and permissions on these new nar files match the other nars in the "lib" directory.
8. Start your NiFi 1.19.1.

Now you will see the Hive components available in your NiFi UI. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
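As a command-line equivalent (this assumes the standard Maven Central directory layout, and 1.19.1 is only an example version - confirm the exact artifact and version on search.maven.org first):

cd /path/to/nifi-1.19.1/lib
wget https://repo1.maven.org/maven2/org/apache/nifi/nifi-hive-nar/1.19.1/nifi-hive-nar-1.19.1.nar
wget https://repo1.maven.org/maven2/org/apache/nifi/nifi-hive-services-api-nar/1.19.1/nifi-hive-services-api-nar-1.19.1.nar
chown nifi:nifi nifi-hive-*.nar   # match the ownership of the other nars in lib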
12-13-2022
07:07 AM
@sathish3389 Routing based on a sensitive value is an unusual use case; I'd love to hear more about it. Ultimately the RouteOnAttribute processor expects a boolean NiFi Expression Language statement. So you want to have a sensitive parameter value that is evaluated against something else (another attribute on the inbound FlowFile) and, if true, routed to a new relationship. Is what you are comparing this sensitive parameter value against also sensitive? If so, how are you protecting it, since attributes on FlowFiles are not sensitive and are stored in plaintext? The ability to use sensitive parameters in dynamic properties (non-password-specific component properties) was added via https://issues.apache.org/jira/browse/NIFI-9957 in Apache NiFi 1.17.0. While this change created the foundation for dynamic property support for sensitive parameters, individual components need to be updated to utilize this new capability. As you can imagine, with well over 300 components available to NiFi, this is a huge undertaking. So what I see in the Apache community are changes based on specific use case requests. I'd recommend creating an Apache NiFi Jira detailing your use case and working with the Apache community to adopt that change to the RouteOnAttribute processor so it supports sensitive parameters in dynamic properties. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-13-2022
06:35 AM
@MaarufB You must have a lot of logging enabled if you expect multiple 10MB app.log files per minute. Was NiFi ever rolling files? Check your NiFi app.log for any Out of Memory (OOM) exceptions. It does not matter what class is throwing the OOM(s); once the NiFi process is having memory issues, it impacts everything within that service. If this is the case, you'll need to make changes to your dataflow(s) or increase the NiFi heap memory. Secondly, check to make sure you have sufficient file handles for your NiFi process user. For example, if your NiFi service is owned by the "nifi" user, make sure the open file limit is set to a very large value for this user (999999). A restart of the NiFi service is required before the change to file handles will be applied; an example is sketched after this reply. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
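For example, on most Linux systems the open file limit for a hypothetical "nifi" service user can be raised by adding lines like the following to /etc/security/limits.conf (values illustrative):

nifi  soft  nofile  999999
nifi  hard  nofile  999999

You can verify the limit actually seen by the running NiFi process with: cat /proc/<nifi-pid>/limits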
12-12-2022
06:01 AM
@Onkar_Gagre The Max Timer Driven Thread pool setting is applied to each node individually. NiFi nodes configured as a cluster are expected to be running on the same hardware configuration. The guidance of 2 to 4 times the number of cores as a starting point is based on the cores of a single node in the cluster and not on the cumulative cores across all NiFi cluster nodes. You can only reduce wait time by reducing load on the CPU. In most cases, threads given out to NiFi processors execute for only milliseconds, but some processors operating against the content can take several seconds or much longer depending on the function of the processor and/or the size of the content. When the CPU is saturated these threads will take even longer to complete, as the CPU is giving time to each active thread. Since only 8 threads at a time per node can actually execute concurrently, a thread only gets a short duration of time before yielding to another; the pauses in between are the CPU wait time as queued threads wait for their turn to execute. So reducing the Max Timer Driven Thread count (a restart is required for a reduction to be applied) would reduce the maximum number of threads sent to the CPU concurrently, which would reduce CPU wait time. Of course, this means less concurrency in NiFi. Sometimes you can reduce CPU usage through different flow designs, which is a much bigger discussion than can be handled efficiently via the community forums. Other times, your dataflow simply needs more CPU to handle the volumes and rates you are looking to achieve. CPU and disk I/O are the biggest causes of slowed data processing. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-09-2022
02:00 PM
@F_Amini @Green_ is absolutely correct here. You should be careful when increasing concurrent tasks, as blindly increasing it everywhere can have the opposite effect on throughput. I recommend setting the concurrent tasks back to 1, or maybe 2, on all the processors where you have adjusted away from the default of 1 concurrent task. Then take a look at the processor further downstream in your dataflow that has a red input connection but black (no backpressure) outbound connections. This processor, as @Green_ mentioned, is the processor causing all your upstream backlog. You'll want to monitor your CPU usage as you make small incremental adjustments to this processor's concurrent tasks until you see the upstream backlog start to come down. If while monitoring CPU you see it spike pretty consistently at 100% usage across all your cores, then your dataflow has pretty much reached the max throughput it can handle for your specific dataflow design. At this point you need to look at other options, like setting up a NiFi cluster where this workload can be spread across multiple servers, or designing your dataflow differently with different processors to accomplish the same use case with a lesser impact on CPU (not always a possibility). Thanks, Matt
12-09-2022
01:28 PM
@Onkar_Gagre Let's take a look at concurrent tasks here. You have an 8 core machine. You have a ConsumeKafka configured with 8 concurrent tasks and 4 nodes. I hope this means your Kafka topic has 32 partitions, because that processor creates a consumer group with the 8 consumers from each node as part of that consumer group. Kafka will only assign one consumer from a consumer group to a partition, so having more consumers than partitions gains you nothing but can cause performance issues due to rebalancing. Then you have a QueryRecord with 40 concurrent tasks per node. Each allocated thread across your entire dataflow needs time on the CPU, so just between these two processors alone you are scheduling up to 48 concurrent threads that must be handled by only 8 cores. Based on your description of data volume, it sounds like a lot of CPU wait when you enable this processor, as each thread is only getting a fraction of time on the CPU and thus taking longer to complete its task. It sounds like you need more cores to handle your dataflow, and this is not necessarily an issue specific to the use of the QueryRecord processor. While you may be scheduling concurrent tasks too high for your system on the QueryRecord processor, the scheduled threads come from the Max Timer Driven Thread pool set in your NiFi. The default is 10, and I assume you increased this to accommodate the concurrent tasks you have been assigning to your individual processors. The general starting recommendation for the Max Timer Driven Thread pool setting is 2 to 4 times the number of cores on your node, so with an 8 core machine that recommendation would be 16 - 32. The decision/ability to set that even higher is all about your dataflow behavior along with your data volumes. It requires you to monitor CPU usage and adjust the pool size in small increments. Once CPU is maxed, there is not much you can do short of adding more CPU. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
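As a quick way to confirm the partition count (the broker address and topic name are placeholders, and older Kafka releases use --zookeeper instead of --bootstrap-server), the Kafka CLI that ships with Kafka can describe the topic:

kafka-topics.sh --bootstrap-server <broker>:9092 --describe --topic <your-topic>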
12-09-2022
01:09 PM
@Techie123 Can you provide more detail around your requirement that "the FFs order is also important"? My initial thought here would be a two-phase merge. In the first merge you utilize a correlation FlowFile attribute that you create on each FlowFile based on the employee ID extracted from the record, setting min number of entries to 7 and max to 10. Then you take these employee-merged records and merge them together into larger FlowFiles using MergeRecord. The question is whether 100 records per FlowFile is a hard limit or not, because the MergeRecord processor's max number of records is a soft limit. Let's assume we set it to 100. Say one of your merged employee records comes to the MergeRecord with 7 records in it for that employee ID, yet the bin already has 98 records in it. Since the bin min has not been met yet, this merged FlowFile still gets added and results in a merged FlowFile with 105 records. If you must keep it at or under 100 records per FlowFile, set the max records to 94. If after adding a set of merged employee records the bin is still below 94, another merged employee record would be added, and since you stated each set of merged employee records could have up to 7 records, this keeps you at or below 100 in that single merged FlowFile. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-06-2022
12:48 PM
@Onkar_Gagre
1. What is the CPU and memory usage of your NiFi instances when the QueryRecord processor is stopped?
2. How is your QueryRecord processor configured, including scheduling and concurrent task configuration? What other processors were introduced as part of this new dataflow?
3. What does disk I/O look like while this processor is running?

The NiFi documentation does not mention any CPU or memory specific resource considerations when using this processor. Thanks, Matt
12-05-2022
11:33 AM
1 Kudo
@Ghilani NiFi stores templates in the flow.xml.gz file. The flow.xml.gz is just a compressed copy of the dataflow(s) that reside inside NiFi's heap memory while NiFi is running, so it is not recommended to keep templates in your NiFi. NiFi templates are also deprecated and will go away in the next major release. It is recommended to use NiFi Registry to store version-controlled flows. If not using NiFi Registry, flow definitions should be downloaded instead of creating templates and stored safely somewhere outside of NiFi itself. A flow definition can be downloaded by right-clicking on a process group in NiFi and selecting "Download flow definition"; a JSON file of that flow will be generated and downloaded. Flow definitions can be uploaded to NiFi by dragging the create Process Group icon onto the canvas and selecting the option to upload a flow definition. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
11:19 AM
1 Kudo
@dreaminz You can create variables on a process group; those variables are then only available to the process group (scope) on which they were created. NiFi documentation on variables: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Variables Variables have been deprecated in favor of Parameter Contexts: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#parameter-contexts You can create a single parameter context that you add parameters to and then associate that parameter context with multiple process groups. This will allow you to update a parameter in one parameter context and effectively update your flows in multiple process groups. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
11:09 AM
@grb Your QueryDatabaseTable processor is failing because the dependent controller service is not yet enabled. It appears that the controller service is still trying to enable (enabling) because the SQLServerDriver you have configured in that controller service is not compatible with the Java JDK version you are using to run NiFi. What version of NiFi are you using? What version of Java is your NiFi using? I recommend updating your Java version to the most recent version of Java JDK 8 or Java JDK 11 (version 11 is only supported in NiFi versions 1.10+). Otherwise, you'll need to find an older version of your SQL driver. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
10:56 AM
@ajignacio That was a big jump, from version 1.9.x to 1.16.x of NiFi. NiFi's data provenance stores, for a configurable amount of time, information about NiFi FlowFiles as they traverse the various processors in your dataflow(s). Over the releases of NiFi, both improvements and new implementations of provenance have been introduced. The original provenance implementation was org.apache.nifi.provenance.PersistentProvenanceRepository, which has since been deprecated in favor of the better performing provider class org.apache.nifi.provenance.WriteAheadProvenanceRepository, which is the new default. The following properties from the nifi.properties file are used to configure the provenance repository:
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB   (used to be 1 GB)
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
nifi.provenance.repository.indexed.attributes=
nifi.provenance.repository.index.shard.size=100 MB
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2
nifi.provenance.repository.warm.cache.frequency=

For details on these properties, see this section of the Apache NiFi documentation: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository

The good news is that data provenance retention has no direct relationship to the active FlowFiles currently traversing your dataflow(s). This means that you can shut down your NiFi, purge the contents of the current <path to>/provenance_repository directory, adjust the configuration properties as you want, and then restart your NiFi; NiFi will build a new provenance repository on startup (a rough sketch of these steps follows this reply). Considering that NiFi only provides limited configurable space (1GB original default, 10GB current default) and age (30 days) as the defaults, you would not be losing much if you were to reset. I am also concerned that the path in the error suggests you created your original provenance_repository within a subdirectory of the flowfile_repository, which I would not recommend. I would strongly suggest not writing the contents of any one of the four NiFi repositories within another. Considering the flowfile_repository and content_repository are the two most important repositories for tracking the FlowFiles actively being processed in your dataflow(s), I suggest these each be on their own path and reside on dedicated disks backed by RAID to avoid data loss in the event of a disk failure. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
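A rough sketch of that reset, assuming the default relative paths shown above (adjust to match your nifi.properties):

./bin/nifi.sh stop
rm -rf ./provenance_repository/*
# edit the nifi.provenance.repository.* properties in conf/nifi.properties as desired
./bin/nifi.sh start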
12-05-2022
10:25 AM
@Sinchan You'll want to inspect the configuration of the security-related properties in the nifi.properties configuration file; when you configure a secure NiFi, these properties must be set (a sketch of the typical properties follows this reply). If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
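For reference, the security-related properties I am referring to typically look like the following in nifi.properties (all values shown are placeholders):

nifi.web.https.host=<hostname>
nifi.web.https.port=8443
nifi.security.keystore=./conf/keystore.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=<keystore password>
nifi.security.keyPasswd=<key password>
nifi.security.truststore=./conf/truststore.p12
nifi.security.truststoreType=PKCS12
nifi.security.truststorePasswd=<truststore password>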
12-05-2022
09:16 AM
@Vinylal You can download the Cloudera Manager installer from the following Cloudera page: https://www.cloudera.com/downloads/cdp-private-cloud.html You'll need a Cloudera username and password in order to access downloads from Cloudera. If you have an account with Cloudera and don't know your credentials, you can reach out to your Cloudera account representative. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
09:09 AM
@Jacccs Was the nifi-app.log created? If so, what was observed in that log? Please share the exact versions of NiFi you used. Have you tried using a Java JDK instead of Java JRE? Thanks, Matt
12-05-2022
08:46 AM
@cihankara I recommend starting a new community question, as this sounds like a different use case from this community post. Provide details of your use case, such as what the other output is, how those outputs are being ingested into NiFi, what info is available in that output, etc. Feel free to @MattWho in that new community post to trigger a notification to me in the community. Thanks, Matt
12-05-2022
08:41 AM
@hargav Please create a new community question for your queries around the MergeRecord processor. This is the best way to get attention, and it is best for the community to have a separate thread for each specific query. I am not clear on your use case for using "cron driven" scheduling with the MergeRecord; this would not be a common thing to do. It is best to explain your use case in a new community thread along with sharing your MergeRecord processor configuration. Feel free to @MattWho in the new community post to notify me. Thanks, Matt
11-28-2022
12:25 PM
@hargav
NiFi processor scheduled every day at 11:40 AM: 0 40 11 * * ? *
NiFi processor scheduled every day at 12:00 PM: 0 0 12 * * ? *
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
11-28-2022
12:01 PM
@Mohamed_Shaaban I recommend starting a new community question with the details specific to your setup. This allows the community to address and assist with your specific setup rather than comparing your issue to what was shared in this post. Thanks, Matt
11-22-2022
01:28 PM
2 Kudos
@drewski7 New processors are created within the community all the time, and the documentation for processors should include resource considerations for CPU and memory usage. Even when a processor lists memory as a resource consideration, that impact is often a byproduct of how the processor has been configured. You can refer to the embedded documentation in your installed NiFi instance, or you can right-click on a processor added to the canvas and select "view usage" from the displayed context menu to go directly to that component's embedded documentation page. Processors like ReplaceText, SplitText, SplitContent, SplitJson, etc. would be examples. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt