Member since: 07-30-2019
Posts: 2922
Kudos Received: 1450
Solutions: 850
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 61 | 05-06-2024 10:40 AM
 | 58 | 05-03-2024 08:41 AM
 | 163 | 04-26-2024 06:40 AM
 | 218 | 04-25-2024 06:16 AM
 | 516 | 04-23-2024 05:56 AM
08-01-2022
06:06 AM
@hegdemahendra This could possibly be an IOPS issue, but it could also be a concurrency issue with threads.

How large is your Timer Driven thread pool? This is the pool of threads that scheduled components draw from. If it is set to 10 and all 10 are currently in use by components, the HandleHTTPRequest processor, while scheduled, will be waiting for a free thread from that pool before it can execute. Adjusting the "Max Timer Driven Thread Count" requires careful consideration of the CPU load average on every node in your NiFi cluster, since the same value is applied to each node separately. A general starting pool size is 2 to 4 times the number of cores on a single node. From there, monitor CPU load average across all nodes and use the one with the highest load average to determine whether you can add more threads to the pool. If a single node always has a much higher CPU load average, take a closer look at that server. Does it have other services running on it that are not running on the other nodes? Does it consistently have disproportionately more FlowFiles than any other node? (This is typically a result of dataflow design not handling FlowFile load-balancing redistribution optimally.)

How many concurrent tasks are set on your HandleHttpRequest processor? The concurrent tasks are responsible for obtaining threads (1 per concurrent task, if available) to read data from the container queue and create the FlowFiles. Perhaps the requests come in so fast that there are not enough available threads to keep the container queue from filling, thus blocking new requests.

Assuming your CPU load average is not too high, increase your Max Timer Driven Thread Count and the number of concurrent tasks on your HandleHttpRequest processor to see if that resolves your issue. But keep in mind that even if this helps the processor get more threads, if the disk I/O can't keep up you will still have the same issue.
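As a rough aid, the 2x-4x starting guideline above can be computed per node. A minimal sketch (the function name is mine, and `nproc` assumes a Linux node):

```shell
# Hedged sketch: derive a starting range for "Max Timer Driven Thread Count"
# from one node's core count, per the 2x-4x guideline. Tune the final value
# against the observed CPU load average across all nodes.
suggest_pool_range() {
  cores="$1"
  echo "$((cores * 2))-$((cores * 4))"
}

# On the node itself:
suggest_pool_range "$(nproc)"
```

For example, an 8-core node would start in the 16-32 thread range, then be adjusted based on the busiest node's load average.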
As far as having all your NiFi repos on the same disk, this is not a recommended practice. A typical setup would be:
- content_repository on its own disk (the content repo can fill its disk to 100%, which causes no issue other than not being able to write new content until disk usage drops).
- provenance_repository on its own disk (the size of this disk depends on the amount of provenance history you want to retain, the size of your dataflows, and the volume of FlowFiles, but its disk usage is controllable; a separate disk is recommended due to disk I/O).
- database_repository (very small in terms of disk usage) and flowfile_repository together on a third disk (the flowfile_repository is relatively small unless you allow a very large number of FlowFiles to queue in your dataflows; it only holds metadata/attributes about your queued FlowFiles, but it can also be I/O intensive on disk).

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
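A hypothetical nifi.properties layout matching the disk split described above (the /mnt/... mount points are placeholders for your own disks; the property names are the standard repository settings):

```properties
# One disk per heavy-I/O repository; placeholders for your actual mounts.
nifi.content.repository.directory.default=/mnt/content_repo/content_repository
nifi.provenance.repository.directory.default=/mnt/prov_repo/provenance_repository
# Small repos can share a third disk.
nifi.flowfile.repository.directory=/mnt/flowfile_repo/flowfile_repository
nifi.database.directory=/mnt/flowfile_repo/database_repository
```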
08-01-2022
05:49 AM
@AbhishekSingh 1. @araujo's response is 100% correct. 2. Just to add to @araujo's response here... NiFi-Registry has nothing to do with controlling what a user can and can't do on the NiFi canvas. If installed, it simply allows users to version control process groups. Even once a NiFi process group has been version controlled, authorized users in NiFi can still make changes to dataflows (even those that are version controlled). Once they make a change to a version controlled Process Group, that process group will indicate that a local change has been made, and the authorized user will have the option to commit that local change as a new version of the dataflow. Controlling what users can do with dataflows is handled via authorization policies, which NiFi handles very granularly. Authenticated users can be restricted to only specific Process Groups. Your NiFi admin user can set up NiFi authorization for other users per Process Group by selecting the Process Group and clicking on the "key" icon in the "operate panel" on the left side of the NiFi canvas. If you found any of the responses provided assisted with your query, please take a moment to login and click on "Accept as Solution" below each of those posts. Thank you, Matt
08-01-2022
05:41 AM
@hegdemahendra
1. Do you see any logging related to the content_repository? Perhaps something related to NiFi not allowing writes to the content repository while waiting on archive clean-up?
2. Is any outbound connection from the HandleHTTPRequest processor red at the time of the pause? Red indicates backpressure is being applied, which stops the source processor from being scheduled until the backpressure ends.
3. How large is your Timer Driven thread pool? This is the pool of threads that scheduled components draw from. If it is set to 10 and all 10 are currently in use by components, the HandleHTTPRequest processor, while scheduled, will be waiting for a free thread from that pool before it can execute. Adjusting the "Max Timer Driven Thread Count" requires careful consideration of the CPU load average on every node in your NiFi cluster, since the same value is applied to each node separately. A general starting pool size is 2 to 4 times the number of cores on a single node. From there, monitor CPU load average across all nodes and use the one with the highest load average to determine whether you can add more threads to the pool. If a single node always has a much higher CPU load average, take a closer look at that server. Does it have other services running on it that are not running on the other nodes? Does it consistently have disproportionately more FlowFiles than any other node? (This is typically a result of dataflow design not handling FlowFile load-balancing redistribution optimally.)
4. How many concurrent tasks are set on your HandleHttpRequest processor? The concurrent tasks are responsible for obtaining threads (1 per concurrent task, if available) to read data from the container queue and create the FlowFiles. Perhaps the requests come in so fast that there are not enough available threads to keep the container queue from filling, thus blocking new requests.
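For point 1, a quick way to check is to grep the app log for archive-related messages. This is a hedged sketch: sample.log stands in for your actual logs/nifi-app.log, and the message text below is an illustrative stand-in rather than NiFi's exact wording.

```shell
# Build a small stand-in log file; with a real install you would point grep
# at logs/nifi-app.log instead.
cat > sample.log <<'EOF'
2022-08-01 05:40:00 WARN Unable to write to content repository; waiting for archive cleanup
2022-08-01 05:40:01 INFO unrelated line
EOF

# Count archive-wait messages (case-insensitive).
grep -ci 'waiting for archive' sample.log
```

A non-zero count at the time of the pause would point at content-repository archive cleanup as the blocker.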
Hope the above helps you get to the root of your issue. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-29-2022
02:53 PM
1 Kudo
@hegdemahendra How many FlowFiles are queued on the outbound connection(s) from your HandleHttpRequest processor? Is backpressure being applied on the HandleHTTPRequest processor? What version of NiFi are you using? Any logging in the app.log about not being allowed to write to the content repository while waiting on archive cleanup? If NiFi is blocking on creating new content claims in the content_repository, the HandleHTTPRequest processor will not be able to take data from the container and generate the outbound FlowFile. This would explain why cleaning up those repos would reduce the disk usage below the blocking threshold. There are known issues around NiFi blocking even if archive sub-directories in the content_repository are empty, which were addressed in the latest Apache NiFi 1.16 release and Cloudera's CFM 2.1.4.1000 release. You may also want to look at your content repository archive settings and compare them to the disk usage where your content_repo is located. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
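The archive settings referenced above are most likely these nifi.properties entries (defaults shown; verify against the linked admin guide for your version):

```properties
# Content claims are archived after use and cleaned up by retention period
# and/or disk-usage percentage, whichever limit is hit first.
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
```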
07-29-2022
02:30 PM
@PradNiFi1236 There are numerous steps in this process, so let's start with some basics on Site-To-Site (S2S). The S2SBulletinReportingTask works much like the NiFi Remote Process Group (RPG). When configured with a Destination URL and enabled, a background thread runs independently on an interval to fetch S2S details from the destination URL. If the destination URL is a node in a NiFi cluster, the returned S2S details will include the hostnames of all the nodes in the cluster, whether the cluster nodes are configured to support RAW and/or HTTP transport protocols, the configured RAW port for each node, node load average, etc. Configuring just one destination URL from the target cluster does not change this behavior. Configuring a comma-separated list of nodes from the same destination cluster affords you HA: if S2S details can't be retrieved from node URL 1, it then tries the second URL, and so forth.

Also keep in mind that it does not matter which node URL of a NiFi cluster you are accessing; any component (processor, reporting task, controller service, etc.) added to the canvas is replicated to all nodes in the NiFi cluster. So when you enable this S2SBulletinReportingTask, all nodes are going to try to fetch S2S details. Each node in a NiFi cluster has all the same components and executes all the same components (with the exception of processors that can be scheduled to execute on the primary node only). This means that all nodes will be trying to send generated bulletins to your cluster nodes.

From what you shared, it looks like the background thread that fetches those S2S details is failing due to timeout. This could be for any number of reasons:
- The configured keystore in the SSLContextService does not contain a single PrivateKeyEntry that can be trusted by the truststore configured in the nifi.properties file on all 3 of your destination nodes.
- The PrivateKeyEntry presented by the 3 NiFi nodes to the controller service is not trusted by what exists in the truststore configured in your SSLContextService.
- The keystore used in the SSLContextService does not have a clientAuth PrivateKeyEntry in it.
- nifi.remote.input.secure is not set to true.
- nifi.remote.input.http.enabled is not set to true.

There are several authorization policies in play here as well, but I don't think you have even gotten that far yet:
- Retrieve site-to-site details <-- The PrivateKeyEntry from the keystore configured in the SSLContextService will need to be authorized to retrieve these S2S details. The keystore used in each of your 3 NiFi nodes' SSLContextService may have unique DNs for their PrivateKeyEntry, so all three of those unique identities would need to be authorized.
- Receive data via site-to-site <-- The same PrivateKeyEntry identities will also need to be authorized against this policy on the target input port. This allows the S2SBulletinReportingTask to see the bulletin-monitoring remote input port as an option to send bulletins to.

But if you were getting past authentication and failing on authorization, your exception would be different: instead of timeouts, you would be seeing not-authorized exceptions. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
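The two nifi.remote.input settings called out above live in nifi.properties on the destination nodes. A sketch of the relevant S2S block (the port value is illustrative; leave host blank to default to the node's hostname):

```properties
# Site-to-Site input settings on each destination node (values illustrative)
nifi.remote.input.host=
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
```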
07-18-2022
12:45 PM
@Alevc Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

The above exception you are encountering with TLS is caused by the lack of a complete trust chain in the mutual TLS handshake. On each side (server and client) of your TLS connection, you will have a keystore containing a PrivateKeyEntry (supporting an extended key usage (EKU) of clientAuth, serverAuth, or both) that your client or server will use to identify itself. That PrivateKeyEntry will have an owner DN and an issuer DN associated with it; the issuer is the signer of the owner. Each side will also have a truststore (just another keystore by a different name, containing a set of TrustedCertEntry(s)) that needs to contain the trustedCertEntry for the issuer/signer of the PrivateKeyEntry. It is also very common that the issuer/signer trustedCertEntry has an owner DN and issuer DN that do not match. This means the issuer was just an intermediate Certificate Authority (CA) that was itself issued/signed by another CA. As such, the truststore would also need to contain the trustedCertEntry for that next-level issuer CA. This continues until you reach the root CA trustedCertEntry, where the owner and issuer have the same DN. This is known as the root CA for your PrivateKeyEntry. Having all the intermediate CA(s) and the root CA means you have the complete trust chain in your truststore. This process applies in both directions in the mutual TLS handshake: the clientAuth certificate presented by your Kafka consumer must have its complete trust chain in the Kafka server's truststore, and the serverAuth certificate presented by your server must have its complete trust chain present in the truststore used by your Kafka consumer client.
Note: I am oversimplifying this mutual TLS handshake (private keys themselves are never shared, and there is more to the server and client hello exchanges of the TLS handshake), but the intent is to focus at a high level on what specifically causes your issue. So to get past it, you need to make sure the truststores used by your client and server sides contain the trustedCertEntries for the complete trust chain. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-14-2022
09:12 AM
1 Kudo
@sayak17 So this looks doable as I described above. Not sure what program you are using in your screenshots, as it does not show the full rest-api call structure, but your process would be as follows:
1. Get the bearer token.
2. Extract the bearer token from the response body and place it in a FlowFile attribute.
3. Format the FlowFile attribute as needed for use by the rest of the api calls.
4. Using the bearer token, make a GET request to the ../sites rest-api endpoint.
5. Use ExtractText to extract the site id to a new FlowFile attribute.
6. Use the bearer token again and use NiFi Expression Language to dynamically create the rest-api URL to include the site id.
7. From the response, extract the drive id to another new FlowFile attribute as above.
8. Use the bearer token again and use NiFi Expression Language to dynamically create the rest-api URL to include the drive id.
9. Finally, do whatever you need to do with the response JSON you get from that final rest-api call.
Matt
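Outside NiFi (where steps 5-6 would use ExtractText plus Expression Language), the id-extraction and URL-building can be sketched in shell. The JSON below is a hypothetical stand-in for a Microsoft Graph-style /sites response, assuming that is the API behind these calls:

```shell
# Hypothetical /sites response body (stand-in; a real one comes from step 4's GET).
sites_response='{"id":"contoso.sharepoint.com,abc123,def456","name":"TeamSite"}'

# Step 5 equivalent: pull the "id" value out of the response
# (ExtractText in NiFi would use a similar regex).
site_id=$(printf '%s' "$sites_response" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')

# Step 6 equivalent: build the next request URL from the extracted id
# (Expression Language in NiFi, e.g. .../sites/${site.id}/drives).
drives_url="https://graph.microsoft.com/v1.0/sites/${site_id}/drives"
echo "$drives_url"
```

Steps 7-8 repeat the same extract-then-build pattern with the drive id from the /drives response.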
07-14-2022
08:56 AM
@rafy Once you have created the certificates for your other two users as @DigitalPlumber suggested, you would need to connect to your NiFi as the admin user you set up during the initial securing of your NiFi and add these two new users via the NiFi global menu (upper right corner) --> Users. Then you would need to authorize those new user identities against any policies needed for them to perform the actions you want to allow. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies Hope this helps, Matt
07-14-2022
07:11 AM
@Meeran Sharing the full exception that was output when you tried to import the key may be helpful, but I am also not sure why you are trying to add a client private key to your AWS load-balancer. I am not well versed in AWS load balancer configuration, so helping you get past your LB setup/configuration issue may be difficult for me. I do know TLS very well and explained the meaning of the two TLS exceptions you encountered. The HTTPS LB exception could be resolved by making sure the trustedCertEntry(s) for the complete trust chain of the HTTPS LB's private key are present in the NiFi-Registry's truststore.jks. You should also add the trustedCertEntry for the NiFi CA you appear to be using to the truststore used by your HTTPS LB. Thank you, Matt
07-14-2022
06:59 AM
@Meeran The users.xml is created/managed by the file-user-group-provider in the authorizers.xml file. The authorizations.xml is created/managed by the file-access-policy-provider in the authorizers.xml file. Neither of these providers supports using a database for persisting the users, groups, and authorizations. For more information on the authorization providers, follow the link below: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#authorization NiFi-Registry supports using an embedded H2 DB (default) or an external DB (PostgreSQL or MySQL) for storing knowledge of which buckets exist, which versioned items belong to which buckets, and the version history for each item. The actual version controlled flows are not stored in the DB. https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#metadata-database The actual version controlled dataflow data is stored via the configured persistence provider. The default provider is the FileSystemFlowPersistenceProvider, which writes this data to a local directory on the NiFi-Registry host. The other available option is the GitFlowPersistenceProvider, which commits this data to a remote Git repository when configured correctly. For more detail, follow the link below: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#persistence-providers If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
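For example, switching NiFi-Registry to the GitFlowPersistenceProvider is done in providers.xml. A sketch with placeholder property values (class and property names per the linked admin guide):

```xml
<!-- providers.xml sketch: Git-backed flow persistence.
     Directory, remote name, and credentials below are placeholders. -->
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <property name="Flow Storage Directory">./flow_storage</property>
    <property name="Remote To Push">origin</property>
    <property name="Remote Access User">my-git-user</property>
    <property name="Remote Access Password">my-git-token</property>
</flowPersistenceProvider>
```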