Member since: 07-30-2019
Posts: 2906
Kudos Received: 1442
Solutions: 844
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 53 | 04-17-2024 11:30 AM
 | 58 | 04-16-2024 05:36 AM
 | 36 | 04-15-2024 05:31 AM
 | 119 | 04-03-2024 05:59 AM
 | 132 | 04-02-2024 01:22 PM
03-27-2024
07:17 AM
@2ir I doubt it is related to the Java version being used, though we would always recommend using the latest update release of a NiFi-supported Java version.

As far as using G1GC: it is commented out because Java 8 has many issues when using G1GC, and the Java community decided to address those bugs and improvements in Java 9+ versions. Since you are using Java 11, G1GC would be a better option. With that line commented out, NiFi does not specify any GC for your Java, and whatever default GC is defined within your Java release is used. That line allows you to override your Java default and specify the GC you want to use.

Memory issues are often attributable to custom components added to the NiFi deployment or to dataflow design choices, hence all the dataflow-related input I provided previously. You never mentioned whether you are seeing any OutOfMemory (OOM) errors in your NiFi logs. If not, do you see any OOMs if you decrease your heap memory setting, which you have already set rather high? I also recommend setting both Xms and Xmx to the same value.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
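For reference, both the GC override and the heap settings live in NiFi's conf/bootstrap.conf. A sketch of the relevant lines (the java.arg numbers and the 8g size are examples only; match them to your existing file and sizing):

```properties
# conf/bootstrap.conf
# Set Xms and Xmx to the same value (example size only)
java.arg.2=-Xms8g
java.arg.3=-Xmx8g

# Uncomment to override the JVM's default collector with G1GC
# (a better option on Java 11; leave commented out on Java 8)
java.arg.13=-XX:+UseG1GC
```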
03-27-2024
06:58 AM
1 Kudo
@jpalmer We'll need some more details to help here:

1. Is this a standalone single NiFi instance or a multi-node NiFi cluster setup?
2. How many partitions on your source Kafka topic?
3. How do you have your MergeContent processor configured?
4. When you say the connection quickly fills up, what are the settings on that connection?
5. With your flow running and processing FlowFiles through the dataflow connections, what is the CPU load average? You can find these details within NiFi's UI, from either the Cluster UI (global menu, upper right corner) or the System Diagnostics UI (also under the global menu).
6. Do you have a lot of other dataflows running within this same NiFi?

MergeContent can be CPU and heap memory intensive depending on its configuration. There are likely ways to improve your dataflow once we know the above details.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
03-27-2024
06:47 AM
@s198 The FetchHDFS processor is by default designed to be used in conjunction with the ListHDFS processor. ListHDFS connects to HDFS and generates a NiFi FlowFile for each file listed, without retrieving the content of that HDFS file. The produced 0-byte FlowFiles contain FlowFile attributes that FetchHDFS then uses to obtain the actual content and insert it into the FlowFile's content.

NiFi has numerous list/fetch pairs of processors. They were designed for sources that are not NiFi-cluster friendly (meaning the client does not support a distributed fetch capability that would avoid data duplication). So in a NiFi cluster, the List<abc> processor is configured to run on the primary node only, so that only one node in the cluster obtains the metadata about all the source files to be ingested. The List<abc> processor is then connected via a NiFi connection to the Fetch<abc> processor, and that connection is configured to load-balance the 0-byte FlowFiles across all nodes in the cluster. The Fetch<abc> processor can then run on all nodes. Since each node in the cluster has a subset of the listed files, there is no duplication, and the load/work is distributed across the NiFi cluster.

If you are not using a NiFi cluster but only a standalone single NiFi instance, you could use the GetHDFS processor instead. But if you plan to ever expand to a NiFi cluster, it is best to build your dataflows now with that in mind to avoid extra work later.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
03-25-2024
12:55 PM
@saquibsk Additional settings? With a secured NiFi (which you should always be using), there is authentication and authorization involved with any rest-api request.

The simplest approach is to generate a clientAuth certificate that is trusted via the truststore your secured NiFi is configured to use in the nifi.properties file. That certificate is then added to a keystore. The InvokeHTTP processor can be configured to use a StandardRestrictedSSLContextService that you configure with the keystore you created and the truststore NiFi already uses, which can trust that certificate.

On the NiFi side, you would need to add that client as a user identity so you can assign authorization policies to it. You can then authorize that client/user identity for the policies needed to start and stop specific processor components. That would be the "operate the component" policy, which you can set on just the QueryDatabaseTable processor or any other specific processor you want to automate: component-level-access-policies

Yes, there are some initial steps to set up the keystores and truststores needed, but those can then be reused for all the automation within NiFi you want to achieve.

NiFi processors execute based on each individual processor's configured scheduling. There is no other option to stop or start individual processors except manually through the UI by an authorized user, or via the rest-api. NiFi was designed with an always-running architecture in mind; stepping out of that architecture requires extra steps or redesigning your dataflows to operate within that architecture style. If your executions always happen at set times, you could use cron scheduling, but that is not going to be an optimal design for performance.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
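As a sketch of what that automation can look like (the host, processor id, and certificate paths below are placeholders, not values from this thread), starting or stopping a processor over the rest-api is a PUT to the processor's run-status endpoint, carrying the processor's current revision version:

```python
import json

def run_status_payload(revision_version: int, state: str) -> str:
    """Build the JSON body for PUT /nifi-api/processors/{id}/run-status.

    state is "RUNNING" or "STOPPED". The revision version must match the
    processor's current revision, obtained via GET /nifi-api/processors/{id}.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be RUNNING or STOPPED")
    return json.dumps({"revision": {"version": revision_version}, "state": state})

# Sending the request with the clientAuth certificate (paths are placeholders):
# import requests
# requests.put(
#     "https://nifi.example.com:8443/nifi-api/processors/<processor-id>/run-status",
#     data=run_status_payload(3, "STOPPED"),
#     headers={"Content-Type": "application/json"},
#     cert=("/path/client.crt", "/path/client.key"),  # client cert + key pair
#     verify="/path/nifi-ca-chain.pem",               # NiFi's trust chain
# )
```

The same payload shape works for both start and stop; only the state value changes.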
03-25-2024
07:08 AM
@2ir Heap usage continues to grow until utilization exceeds ~80%, at which time heap Garbage Collection (GC) executes to release unused heap memory allocations. Even if there are no FlowFiles or running flows in NiFi, if the GC threshold has not been reached, the heap will not be cleaned out.

First question: why is your NiFi heap set so high already? Were you encountering OutOfMemory (OOM) issues with smaller heap settings? You'll get the best performance using the smallest workable heap values. The typical NiFi heap recommendation in most scenarios is an Xms and Xmx of 16 GB or less.

You stated that consumption was stable for 3 weeks and then changed:

- What changed about the content being ingested and created (size, number, etc.)?
- Were any new dataflows added or recently enabled?
- What processor types are being used in your dataflows?
- Are you leaving any FlowFiles queued up in dataflow connections?
- Are you writing large attributes to FlowFiles? Are you extracting content from FlowFiles into attributes?

Aside from that, you would need to look at heap dumps to see what is consuming the majority of the space and perhaps identify the components responsible.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
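To make the ~80% behavior concrete, here is a small illustrative helper (my own naming, not a NiFi API) that mirrors the threshold logic described above:

```python
def heap_utilization(used_bytes: int, max_bytes: int) -> float:
    """Return heap utilization as a fraction of the configured -Xmx."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes / max_bytes

def gc_likely_soon(used_bytes: int, max_bytes: int, threshold: float = 0.80) -> bool:
    """True once utilization crosses the rough ~80% mark where a major
    garbage collection is expected to run."""
    return heap_utilization(used_bytes, max_bytes) >= threshold
```

For example, 40 GB used of a 47 GB heap is well past the threshold, while 8 GB used of a 16 GB heap is not; a smaller Xmx simply means GC runs at a lower absolute usage.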
03-25-2024
06:47 AM
@saquibsk A couple of thoughts come to mind here... Have you looked at using a GenerateTableFetch processor in "Dim 2", which can be triggered by an incoming FlowFile? This processor will take an optional inbound connection as a trigger.

Another option might be to use an InvokeHTTP processor after your PutDatabaseRecord processor to start the "Dim 2" QueryDatabaseTable processor via the NiFi rest-api. You could then do similar after the "Dim 2" QueryDatabaseTable processor to stop it again.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
03-22-2024
06:21 AM
1 Kudo
@Chetankc From a NiFi perspective, there is not much guidance that can be given with so little information.

What does "10 Billion Load" mean? Is it the number of unique files being ingested into NiFi? What is the average size? What is the rate of ingest?

What is "15,000 process"? Is this the number of NiFi processors added to the NiFi canvas? What types of processors are being used? Do your dataflow(s) do a lot of content modification? Have you done testing on throughput performance and done any performance tuning? 15,000 processors is a lot of execution scheduling against your CPU cores. In your load testing, what was your CPU load average? What was your memory impact?

You also have custom NiFi components. Are you referring to these custom components as using many threads, or to the totality of the 15,000 components using a lot of threads? What does "a lot of threads" mean here? Are any of these long-running threads, or are they all millisecond thread executions?

What kind of performance and throughput are you achieving now, and on what type of setup (how many nodes in your NiFi cluster, number of CPU cores, JVM heap settings, type of disk, etc.)?

Thank you, Matt
03-22-2024
06:08 AM
1 Kudo
@hidden Welcome to the world of Apache NiFi. The first recommendation I'd make is to download the latest version of the Apache NiFi 1.x branch. The 1.12 branch is more than 5 years old now, and there have been many improvements, bug fixes, and security updates since its release. The new Apache NiFi 2.x branch has also been released recently. Since you are new to NiFi, you might also consider using the 2.x version instead, to avoid the hassle down the road of migrating to this new major release branch; the 1.x branch will cease to release new versions soon.

When sharing exceptions for help, it is best to make sure you have also inspected the NiFi Registry logs produced in the log directory configured in the logback.xml file. They may provide more detailed stack traces and/or logging to help fully understand the issue you encountered.

Thank you, Matt
03-21-2024
12:57 PM
1 Kudo
@2ir NiFi can consume both heap and non-heap (native) memory. This commonly happens with processors that create Jetty servers to listen for inbound requests, scripting processors that execute child scripts, processors that execute OS commands, the NiFi bootstrap parent process, etc. So Xmx heap memory is only part of the memory that can be used by a Java application. From within the NiFi UI --> global menu --> Cluster or Summary, you can see the actual amount of heap utilized (the JVM tab in the Cluster UI, or System Diagnostics in the Summary UI).

I would advise against setting your heap so high when you have 47 GB of total memory. It is likely that your OS, if Linux based, will invoke the OOM killer to kill the NiFi process to protect the OS. I'd advise reducing your heap to an Xms and Xmx of 24 GB. Re-evaluating your dataflows for high-memory-use processors and making sure they are optimally configured is the next step. The embedded documentation for each processor component has a "System Resource Considerations" section that tells you if the processor has the potential to use high memory or high CPU. For processors with potential for high heap usage, be careful with the concurrent tasks configuration. The default for concurrent tasks is always 1; increasing it is like adding a second copy of the processor, allowing multiple concurrent executions and thus increasing heap usage. (Example: ReplaceText 1.25.0)

Some general guidance:

- Be careful using templates (deprecated), as any templates generated and held in NiFi consume heap.
- FlowFile metadata is held in heap, so avoid creating FlowFiles with large attributes (like extracting content into attributes).
- Use record-based processors whenever possible to reduce the number of individual FlowFiles.
- Use a NiFi cluster instead of a standalone NiFi to spread the FlowFile load across multiple NiFi instances.
- Monitor heap usage and collect heap dumps to analyze what is consuming the heap.

Hope this helps you with your investigative journey.
If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
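For monitoring, the same numbers shown in the UI are also exposed over the rest-api at GET /nifi-api/system-diagnostics. Below is a hedged sketch of pulling the aggregate heap utilization out of that response; the field names reflect the system-diagnostics response shape as I recall it, so verify them against your NiFi version before relying on this:

```python
def aggregate_heap_utilization(diagnostics: dict) -> str:
    """Extract the formatted heap utilization (e.g. "81.0%") from a
    /nifi-api/system-diagnostics response body.

    Assumed response shape (verify against your NiFi version):
    {"systemDiagnostics": {"aggregateSnapshot": {"heapUtilization": "81.0%", ...}}}
    """
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    return snapshot["heapUtilization"]

# Hypothetical sample response, trimmed to just the fields used above:
sample = {"systemDiagnostics": {"aggregateSnapshot": {"heapUtilization": "81.0%"}}}
```

Polling this endpoint on a schedule is one way to spot the steady heap growth described above before the OOM killer gets involved.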
03-20-2024
12:52 PM
1 Kudo
@darkcoffeelake NiFi's out-of-the-box setup generates a simple keystore and truststore automatically and sets the login provider to single-user-provider and the authorizer to single-user-authorizer. This out-of-the-box setup simplifies secured access for evaluating NiFi. It is not a production-ready setup, in that it does not support multi-user authentication, granular access controls, or NiFi cluster setups.

There are a bunch of steps that go into securing Apache NiFi for production-ready environments. Securing NiFi not only sets up NiFi over an HTTPS connection, but also requires that user authentication and authorization are set up. NiFi will require a keystore and truststore, which you can create yourself or via a publicly available service (tinycert, for example). The keystore created for your NiFi must meet the following requirements:

- Contains only 1 PrivateKey entry.
- Does not use wildcards in the DN of the PrivateKey certificate.
- Has both clientAuth and serverAuth Extended Key Usage (EKU).
- Has SubjectAlternativeName (SAN) entries matching the NiFi hostname and any other name that may be used to access the NiFi.

The truststore needs to contain the complete trust chain for your NiFi keystore certificate. A certificate might be self-signed (meaning both issuer and signer are the same DN), or it may be signed by an intermediate CA or root CA. If signed by an intermediate CA, your truststore needs the trustedCertEntry (public key) for that intermediate CA (an intermediate CA is any CA where signer and issuer are different DNs), then the trustedCertEntry for that CA's signer, and so on until you reach the root CA in the chain (the root CA has the same signer and issuer DN).

Once you have your certificates, you'll need to decide how your users are going to authenticate with NiFi. NiFi does not have an embedded provider that supports multi-user authentication. Here is what is available to choose from: User Authentication. LDAP and Kerberos are probably the most commonly used.

Once you have decided how you are going to authenticate your users, you'll need to set up authorization for those users. Here are your options: Multi-Tenant Authorization. The simplest authorizers.xml setup would utilize the StandardManagedAuthorizer, FileAccessPolicyProvider, and FileUserGroupProvider. A sample configuration can be seen here: https://nifi.apache.org/documentation/nifi-2.0.0-M1/html/administration-guide.html#file-based-ldap-authentication

If set up correctly, on first startup the above authorizers.xml will generate and seed the users.xml and authorizations.xml files so that your initial admin user (an LDAP or Kerberos user, for example) has the necessary authorization policies to access the NiFi UI. From the NiFi UI, that initial admin user can set up additional user identity authorizations.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt