Member since: 07-30-2019
Posts: 3421
Kudos Received: 1624
Solutions: 1010
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 35 | 01-13-2026 11:14 AM |
| | 171 | 01-09-2026 06:58 AM |
| | 498 | 12-17-2025 05:55 AM |
| | 559 | 12-15-2025 01:29 PM |
| | 558 | 12-15-2025 06:50 AM |
03-11-2024
08:56 AM
@whoknows Providing an actual CVE for the suspected vulnerability is always going to get you the best response. I am assuming you may be referring to this CVE? https://www.cvedetails.com/cve/CVE-2024-22233/ Apache NiFi is not vulnerable to this CVE because NiFi does not use Spring MVC; it uses JAX-RS and Jersey for its REST resources. The vulnerability is only exposed when all of the following are true:
* The application uses Spring MVC
* Spring Security 6.1.6+ or 6.2.1+ is on the classpath
-----------------
As far as upgrading directly from Apache NiFi 1.19.1 to 1.25 goes, you should have no issues there provided you have reviewed the release notes below for all versions from 1.20 to 1.25 to see if any changes may impact your specific dataflows: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.25.0 I saw no red flags to worry about.
-----------------
Apache NiFi also upgraded its Spring Framework version in https://issues.apache.org/jira/browse/NIFI-12811 in Apache NiFi 2.0. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
03-11-2024
08:39 AM
@Ghilani
1. Are you getting the same exact Invalid SNI exception?
2. Are you using the keystore and truststore built by Apache NiFi out-of-the-box?
3. Have you tried using "localhost" if NiFi is on the same host as the browser being used to access it?
4. If the browser is on a different host than NiFi, did you use the hostname instead of the IP address for the target host where NiFi is running?
5. Did you list the keystore used by your running NiFi to inspect the SAN entries it has set up? (See the command sketch after this reply.)
Thanks, Matt
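For item 5, a minimal sketch of inspecting the SAN entries, assuming the auto-generated PKCS12 keystore at ./conf/keystore.p12 (your path, store type, and password may differ; the generated keystore password is in nifi.properties under nifi.security.keystore.passwd):

# List the keystore verbosely; the SubjectAlternativeName extension in the
# output shows the DNSName/IPAddress SAN entries the certificate covers.
keytool -list -v -keystore ./conf/keystore.p12 -storetype PKCS12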
03-11-2024
07:41 AM
1 Kudo
@TreantProtector There is a lot of ask in this one post.

1. NiFi Registry is used to store version-controlled NiFi process groups. (This takes manual user action to both initiate version control and push new versions to NiFi Registry.) It does not store the flow.xml.gz or flow.json.gz files that contain all the flow information NiFi loads on startup, so it is not a substitute for protecting those files on NiFi. All nodes in a NiFi cluster use the same flow.xml.gz/flow.json.gz, so it is not necessary to preserve the files from every node for recovery.

2a. (NiFi) Apache NiFi stores the complete dataflow(s) on your canvas in the flow.xml.gz (legacy format) and flow.json.gz (current format). Preserving this file will preserve all your dataflows on the canvas. (NOTE: all sensitive properties like passwords are encrypted in these files using the configured sensitive.props.key in NiFi, so make sure you save that password or you will need to scrub these files of all enc{...} values to load them. Removing values would require you to re-enter all encrypted values in the NiFi components.)
- Apache NiFi has a local state directory configured. This is unique to each node and stores state information for processors that store local state. It should be preserved to avoid data duplication.
- Apache NiFi content_repository(s) - Hold active content claims (claims still used by actively queued FlowFiles within your dataflows) and archived content claims (archive subdirectories holding claims that are no longer referenced by any active FlowFiles in the UI). This repository is tightly coupled to the flowfile_repository. Content_repository(s) hold claims unique to each node and need to be protected on all nodes to avoid data loss.
- Apache NiFi flowfile_repository - Contains metadata/attributes (including the reference to a content claim in the content_repository(s), along with byte offset and length). Tightly coupled to the content_repository(s) on the same node, so make sure the same flowfile_repository is restored with the corresponding content_repository(s) from the same node. This must be protected to avoid data loss.
- Apache NiFi provenance_repository - Holds event data about FlowFile transactions and is unique per node. Loss of these is a loss of provenance history, but would not cause loss of any queued FlowFiles. These are typically also placed on protected storage.
- Apache NiFi metadata_repository - Metadata about users who authenticated to NiFi, and flow configuration history when using the embedded H2 DB. Not necessary to retain unless you want to preserve that historical information.
- The NiFi extension directory contains any custom NiFi nars you have added to your NiFi. Copies of your custom nars should be preserved somewhere so they can be restored easily should that be needed.
- Apache NiFi local authorization files like users.xml and authorizations.xml, which contain the users and their associated authorizations granted over time through the NiFi UI, should be preserved or you'll need to set those back up again in recovery (same on all nodes).
- Node-specific local directories configured in your dataflows (dataflows built on the canvas). Some components may allow you to configure local directories for persistent storage. If you are using these, they should be persisted. Example: DistributedMapCacheServer 1.25.0

2b. NiFi-Registry
- The NiFi-Registry database, which contains all information about version-controlled flows and buckets, should be protected unless you are using an external DB which you are protecting by other means. The default uses an embedded H2 DB.
- The NiFi-Registry extensions directory, if being used to store version-controlled extensions (jars).
- The NiFi-Registry persistence provider stores the actual version-controlled NiFi process groups and is tightly coupled to the NiFi-Registry database. If using the external GitFlowPersistence provider, refer to git for persistence requirements. NiFi-Registry bundle persistence has local and S3 options; protected storage should be used if using local.
- NiFi-Registry local authorization files like users.xml and authorizations.xml, which contain the users and their associated authorizations granted over time through the NiFi-Registry UI, should be preserved or you'll need to set those back up again in recovery.
Reference material: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#backup-recovery

3. Covered above - refer to the Apache NiFi nifi.properties file for your configured local storage paths. (A sketch of the relevant properties follows this reply.)

4. Yes - covered above.

5a. Not sure I follow the question. On restoration, NiFi-Registry will read the persistence provider (whether local, git, or S3); preserving the NiFi and NiFi-Registry conf directory configuration files would make restoration easier. While the NiFi content_repository(s) and flowfile_repository are tightly coupled to one another on the same node and tie back to the flow.xml.gz/flow.json.gz (same on all nodes) content, which node they get restored to does not matter (node-specific information is not present in any of those). NOTE: content_repositories are directly correlated to the content repository property name in the nifi.properties file:
nifi.content.repository.directory.default=/dir1/node1
nifi.content.repository.directory.repo2=/dir2/node1
Upon restoration, the content_repository contents persisted for /dir1/node1 must still be set in "default" and not set to a different property name. This is because the FlowFile metadata in the corresponding flowfile_repository does not contain directory details. It simply says you can find the content for FlowFile xyz in nifi.content.repository.directory.default at sub-directory (num), content claim, byte offset, and num bytes. So if you put dir2 in the default content_repository, you'll mess up finding your content.

6. ZooKeeper is used to store cluster state used by a good number of NiFi processors (refer to the individual processor documentation for state information; every processor's documentation has a "State management" section that tells you whether the specific processor component stores state and whether that state is local or cluster). State is stored per component. Cluster state stored in ZooKeeper is not node-specific, as all components that use cluster state utilize the same state information. Failing to protect against loss of state info typically leads to data duplication, but it all depends on how a given processor is using that state information. Example: ListSFTP 1.25.0.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
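For item 3, a hedged sketch of the nifi.properties entries that point at the locations discussed above, shown with common default values (your install may override any of these, and exact property names can vary by NiFi version):

# Dataflow definition loaded at startup (flow.json.gz in newer versions)
nifi.flow.configuration.file=./conf/flow.xml.gz
# Repositories (unique per node; protect on every node)
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
# Points at state-management.xml, where the local state directory itself
# is configured under the local-provider
nifi.state.management.configuration.file=./conf/state-management.xml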
03-07-2024
08:25 AM
@sukanta This depends on what version of Apache NiFi is being used. In Apache NiFi 1.12 or newer, there exists the following property in the nifi.properties file for excluding the server version in HTTP responses: nifi.web.should.send.server.version=<true or false> The default is true when not configured. This capability was added as part of https://issues.apache.org/jira/browse/NIFI-7321 (see the sketch after this reply). It is best to ask unrelated questions in different community questions. Asking multiple questions makes it hard for others in the community to understand which question was addressed by the "accepted" solution. Keep in mind that Apache NiFi is an open source product, so anyone can look at the source code to see what Jetty version(s) are being used. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
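A minimal sketch of applying and verifying the setting, assuming NiFi's HTTPS UI is on port 8443 (adjust host and port for your install):

# nifi.properties: stop sending the server version in HTTP responses
nifi.web.should.send.server.version=false

# After restarting NiFi, inspect the response headers; the Server header
# should no longer advertise the Jetty version:
curl -k -I https://localhost:8443/nifi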
03-06-2024
09:40 AM
1 Kudo
@hegdemahendra The Service Unavailable response to a request received by the HandleHTTPRequest processor is most commonly the result of the FlowFile produced by the HandleHTTPRequest processor not being processed by a downstream HandleHTTPResponse processor before the Response Expiration configured in the StandardHTTPContextMap controller service. This aligns with the exception you shared from the NiFi logs. If you are seeing this exception prior to the 3 minute expiration you set, it is possible your client is closing the connection due to some client-side timeout. You would need to look at the client sending the requests to get details and options.

You mentioned you have 4200+ processors that are scheduled based on their individual configurations. When a processor is scheduled, it requests a thread from the configured Maximum Timer Driven Thread Count pool of threads. So you can see that not all processors can execute concurrently, which is expected. You also have only 8 cores, so assuming hyper-threading you are looking at the ability to actually service only 16 threads concurrently. So what you have happening is time slicing, where all of your up to 200 concurrently scheduled threads get bits of time on the CPU cores. Good to see you looked at your core load average, which is very important as it helps you determine a workable size for your thread pool. If you have a lot of CPU-intensive processors executing often, your CPU load average is going to be high. For you I see well-managed CPU usage with some occasional spikes. I brought this up because it directly relates to your processor scheduling.

The HandleHTTPRequest processor creates a web server that accepts inbound requests. These requests will stack up within that web server as the processor's threads read them and produce a FlowFile for each request. How fast this can happen depends on available threads and the Concurrent Tasks configuration on the HandleHTTPRequest processor's scheduling tab. By default, an added processor has only 1 concurrent task configured. If you set this to, say, 5, then the processor could potentially be allocated up to 5 threads to process requests received by the HandleHTTPRequest processor. The thought here is that, as another possibility, you might also be seeing Service Unavailable because the container queue is filling faster than the processor is producing FlowFiles.

Hope this information helps you in your investigation and solution for your issue. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
03-05-2024
08:46 AM
Correct. A FlowFile might, over its dataflow lifetime, point at different content claims for its content. That all depends on the processors used in the dataflow. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
03-04-2024
08:34 AM
1 Kudo
@Chetan_mn NiFi's DistributedMapCacheServer controller service has existed in Apache NiFi since the 0.x releases, when Apache NiFi offered no HA at all. In the Apache NiFi 0.x releases, NiFi had a dedicated NiFi Cluster Manager (NCM) and NiFi cluster nodes that all reported to the NCM. The only way to access the NiFi UI was via the NCM server. The DistributedMapCacheServer controller service at that time only ran on the NCM and not on any of the cluster nodes. The DistributedMapCacheClient controller service ran on all the nodes so that all cluster nodes could read and write to the same cache server.

Fast forward to the Apache NiFi 1.x+ releases, where Apache NiFi eliminated the NCM with a new zero-master clustering ability. This provides HA at the NiFi control layer so that users can access the NiFi cluster via ANY connected node's URL. Since there is no dedicated NCM anymore, controller services like DistributedMapCacheServer, when added, now start a DistributedMapCache server on each node independently of one another. The multiple cache servers do NOT communicate or share cache entries with one another. So you effectively have a single point of failure with this cache provider. If the node your DistributedMapCacheClient is configured for goes down, you have an outage until it is recovered. Apache NiFi now offers better options that provide a true distributed map cache capability, like HBase, Redis, Hazelcast, etc. These utilize an externally installed map cache service that offers better fault tolerance through HA, but adds an additional service dependency.

Now if you still choose to use the DistributedMapCacheServer, keep in mind that all cached entries will be held in NiFi's heap memory. So the larger each cache entry is and the larger the number of cache entries held, the more NiFi heap will be consumed. (A sketch of raising NiFi's heap follows this reply.) The DistributedMapCacheServer has an optional configuration for "Persistence Directory". When configured, the cache entries will be persisted to disk at the configured location. The amount of space required again depends on cache entry size and the number of possible entries to retain. Keep in mind that configuring this persistence directory does NOT remove cache entries from NiFi's heap memory. It simply persists a copy of the cache entries to disk so that should NiFi be restarted, NiFi can reload the cache from disk into heap memory. If no persistence directory is configured, NiFi going down would result in a loss of all cache entries.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
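Since the cache entries live in NiFi's heap, a minimal sketch of raising the heap in conf/bootstrap.conf, assuming the stock argument numbering (NiFi ships with 512 MB for both values; your bootstrap.conf may number the java.arg entries differently):

# Raise initial and maximum heap together if large caches push usage up
java.arg.2=-Xms2g
java.arg.3=-Xmx2g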
03-04-2024
07:30 AM
@MvZ The "file-login-provider" login identity provider has never existed in any out-of-the-box release of Apache NiFi. If you have created or downloaded some custom implementation of this provider, you would need to consult with its author to get it working. Where did you obtain this provider, and what process did you follow to add it to your NiFi installation? The exception you have shared simply tells you that during startup NiFi loaded the nifi.properties file and the property "nifi.security.user.login.identity.provider" is configured with "file-login-provider"; however, when NiFi parsed the login-identity-providers.xml configuration file, no provider with: <identifier>file-login-provider</identifier> was found in that configuration file (see the example provider entry after this reply for what a valid entry looks like). I can't provide any guidance on this provider, as I was unable to find anything online about what I expect is a custom add-on provider. The out-of-the-box authentication providers are found in the NiFi documentation here:
Apache NiFi 1.2x versions: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
Apache NiFi 2.x versions: https://nifi.apache.org/documentation/nifi-2.0.0-M1/html/administration-guide.html#user_authentication
NiFi authentication and authorization are two different and independent configurations. Once you have chosen how you want to handle user authentication, you then move on to setting up user authorization: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization. For file-based authorization, NiFi offers two providers:
1. The older, deprecated FileAuthorizer
2. The current StandardManagedAuthorizer
These providers are configured in the NiFi authorizers.xml file. No direct user policies get defined in the authorizers.xml file. The FileAuthorizer, or the FileAccessPolicyProvider referenced by the StandardManagedAuthorizer, will generate the initial authorizations.xml file with the initial admin user configured in the chosen provider. You would not typically manually generate or manipulate this file. Instead you would access your NiFi's UI using that initial admin and define additional user authorizations directly via the NiFi UI. Here is an example of what you would have in your authorizers.xml if using the StandardManagedAuthorizer:
<authorizers>
<userGroupProvider>
<identifier>file-user-group-provider</identifier>
<class>org.apache.nifi.authorization.FileUserGroupProvider</class>
<property name="Users File">./conf/users.xml</property>
<property name="Legacy Authorized Users File"></property>
<property name="Initial User Identity 1">ronald</property>
</userGroupProvider>
<accessPolicyProvider>
<identifier>file-access-policy-provider</identifier>
<class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
<property name="User Group Provider">file-user-group-provider</property>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Initial Admin Identity">ronald</property>
<property name="Legacy Authorized Users File"></property>
<property name="Node Identity 1"></property>
</accessPolicyProvider>
<authorizer>
<identifier>managed-authorizer</identifier>
<class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
<property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>
</authorizers>
If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
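For contrast, here is what a valid out-of-the-box provider entry looks like in login-identity-providers.xml. This is the single-user-provider that ships with Apache NiFi 1.14+; the value of nifi.security.user.login.identity.provider must match an <identifier> defined in this file, which is exactly the lookup that is failing for "file-login-provider":
<loginIdentityProviders>
    <provider>
        <identifier>single-user-provider</identifier>
        <class>org.apache.nifi.authentication.single.user.SingleUserLoginIdentityProvider</class>
        <property name="Username"/>
        <property name="Password"/>
    </provider>
</loginIdentityProviders>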
03-04-2024
05:49 AM
1 Kudo
@saquibsk Unfortunately, the exception "java.lang.ClassCastException: null" is not very helpful here, making it very difficult to make any suggestions on where the issue within the data resides. You might want to try setting the PutDatabaseRecord processor logging to DEBUG within NiFi's logback.xml to see if it happens to produce more output that might be useful: org.apache.nifi.processors.standard.PutDatabaseRecord (see the logback.xml sketch after this reply). It is also a good idea to provide the exact version of Apache NiFi or CFM you are using, as it is also useful when asking about issues in the community. It allows those assisting to narrow down the scope of where to look for known issues. Thanks, Matt
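A minimal sketch of the logback.xml change, to be placed inside the top-level <configuration> element of conf/logback.xml (NiFi's stock logback.xml is typically configured to rescan every 30 seconds, so a restart is usually not required):
<!-- Raise PutDatabaseRecord logging to DEBUG for troubleshooting -->
<logger name="org.apache.nifi.processors.standard.PutDatabaseRecord" level="DEBUG"/>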
03-04-2024
05:34 AM
@manishg A NiFi FlowFile consists of two parts:
1. FlowFile Content - FlowFile content is stored within a content claim inside the NiFi content_repository. Once content is written to a claim it cannot be modified.
2. FlowFile Attributes/Metadata - FlowFile attributes/metadata are stored within the flowfile_repository. They contain various key/value pairs about the FlowFile (these may be attributes/metadata created by NiFi on all FlowFiles, like filename, date, and the location of the content within the content_repository, or attributes added later via NiFi processors). The FlowFile attributes can be modified.
When a FlowFile is created, its content is written to a content claim. Within a NiFi dataflow you may have processors that modify the content of a FlowFile. Depending on the processor, the modification of the content can result in two outcomes (both of which result in new content being written to a new content claim):
1. The processor writes the new content to a new content claim, and the FlowFile attributes/metadata are updated to reference that new claim going forward.
2. If the processor has an "original" relationship, the original FlowFile is sent to this relationship while any new FlowFiles derived from that original are created and routed to another outbound relationship.
It is also possible within your dataflow that you duplicate a single FlowFile, such as by routing the same success outbound relationship twice. Any time a FlowFile is duplicated, NiFi creates a clone of the FlowFile. Both FlowFiles are unique; however, both point at the same content claim in the content_repository. NiFi tracks claimant counts on all content_repository content. For every FlowFile pointing at a content claim, the claimant count is incremented. As each FlowFile pointing at a content claim reaches a point of termination in the dataflow, the claimant count is decremented. Only content claims for which the claimant count is zero can be archived and eventually purged from the content_repository. (A sketch of the relevant archive settings follows this reply.)
Hope this helps clarify the lifecycle of a NiFi FlowFile. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
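For reference, archiving and purging of zero-claimant content claims is governed by these nifi.properties settings, shown here with common 1.x defaults (verify the values against your NiFi version's documentation):

# Keep zero-claimant claims in archive until a limit below is reached
nifi.content.repository.archive.enabled=true
# Maximum age of an archived content claim
nifi.content.repository.archive.max.retention.period=12 hours
# Purge archive once repository disk usage exceeds this percentage
nifi.content.repository.archive.max.usage.percentage=50%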