Member since: 07-30-2019
Posts: 2922
Kudos Received: 1451
Solutions: 850
08-08-2022
05:16 AM
@VJ_0082 When a NiFi processor errors, it produces a bulletin. NiFi has a SiteToSiteBulletinReportingTask that can send these bulletins to a remote input port on either the same NiFi or a completely different NiFi. You can then construct a dataflow from that remote input port that routes these FlowFiles to your PutEmail processor.

Constructing a dataflow like this avoids needing to deal with non-error log output, or re-ingesting the same log lines over and over by re-reading the app.log. By default, processors have their bulletin level set to ERROR, but you can change that on each processor if so desired.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
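As a rough sketch of the setup described above (property names recalled from memory; verify them against the documentation for your NiFi version, and the host/port and port name are placeholders):

```
# Hypothetical SiteToSiteBulletinReportingTask configuration
Destination URL     = https://nifi-host:8443/nifi   # same URL used to reach the target NiFi UI
Input Port Name     = bulletins                     # remote input port added to the target canvas
SSL Context Service = StandardSSLContextService     # required when the target NiFi is secured
Transport Protocol  = RAW                           # or HTTP
```

The flow on the receiving side would then simply be: remote input port "bulletins" -> (optional routing/filtering) -> PutEmail.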
08-05-2022
08:52 AM
@nk20 If you are running a standalone NiFi, state is stored via the configured local state provider. If the node crashes, you don't lose that state; NiFi will load that local state when it is restarted. The only way you would lose state is if the server were unrecoverable (but then you have also lost your currently queued data, your entire flow, etc.). You can, and certainly should, have your NiFi's repositories, state directory, and conf directory located on RAID disks to protect against loss in the event of disk failure.

A better option is to set up a NiFi cluster. Processors like GenerateTableFetch will then use cluster state, which is stored in ZooKeeper (ZK) (I recommend setting up an external 3-node ZK cluster rather than using NiFi's embedded ZK). There are many advantages to using a NiFi cluster rather than a standalone single NiFi instance beyond just having state stored in ZK:
1. Distributed processing across multiple servers.
2. Externally stored cluster state.
3. No complete flow outage in the event of a node failure.
4. All nodes execute the exact same flow and thus each have a copy of it.

In a NiFi cluster you would start your dataflow with your GenerateTableFetch processor configured to execute on "Primary node" only. Within a NiFi cluster, one node is elected the "primary node". The success relationship connection would then be configured to load-balance the generated FlowFiles containing your SQL statements. This allows all nodes in your cluster to concurrently execute those SQL statements in your downstream processors, which are configured to execute on all nodes. If the currently elected primary node should crash, a new primary node will be elected. When that happens, the processor configured for "primary node" only execution will retrieve the last state written to ZK and pick up processing where the old node left off.

Off the top of my head, nothing comes to mind in terms of being able to solve your use case in a stateless manner. However, maybe others in the community have some thoughts here.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
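For reference, pointing a cluster at an external ZooKeeper ensemble involves nifi.properties and state-management.xml on every node. A minimal sketch (the zk1/zk2/zk3 hostnames are examples; check the NiFi System Administrator's Guide for your version):

```xml
<!-- state-management.xml: cluster-wide state provider backed by ZooKeeper -->
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">zk1:2181,zk2:2181,zk3:2181</property>
    <property name="Root Node">/nifi</property>
</cluster-provider>
```

In nifi.properties, `nifi.state.management.provider.cluster` would then be set to `zk-provider` so that primary-node processors such as GenerateTableFetch write their state to ZK rather than local disk.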
08-05-2022
08:15 AM
@chitrarthasur Can you share the complete error and any stack trace following it in the nifi-app.log and/or nifi-user.log when you try to delete the process group? Thanks, Matt
08-05-2022
08:12 AM
@VJ_0082 Your use case is not very clear. What exactly are you trying to accomplish via your existing dataflow and what issues are you having? Thanks, Matt
08-04-2022
01:10 PM
@code Have you considered using GenerateTableFetch, QueryDatabaseTable, or QueryDatabaseTableRecord to generate the SQL that you then feed to ExecuteSQL, so you avoid getting both old and new entries with each execution of your existing flow? Avoiding ingesting duplicate entries is better than trying to find duplicate entries across multiple FlowFiles.

You can detect duplicates within a single FlowFile using DeduplicateRecord; however, this requires that all records are merged into a single FlowFile. You can use DetectDuplicate; however, this requires that each FlowFile contains one entry to compare. These methods add a lot of additional processing to your dataflows, or hold records longer than you want in your flow, and are probably not the best/most ideal solution.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
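The reason those processors avoid duplicates is maximum-value column tracking: they remember the highest value seen for a column and only fetch rows above it on the next run. A minimal sketch of that pattern, not NiFi's actual implementation (the `id` column and the in-memory `state` dict are illustrative stand-ins for a real table column and NiFi's state provider):

```python
def fetch_new_rows(rows, state, max_value_column="id"):
    """Return only rows whose max-value column exceeds the stored state,
    then advance the state, mimicking incremental fetch so old entries
    are not re-ingested on every execution."""
    last_seen = state.get(max_value_column)
    new_rows = [r for r in rows if last_seen is None or r[max_value_column] > last_seen]
    if new_rows:
        # Remember the highest value we have emitted so far.
        state[max_value_column] = max(r[max_value_column] for r in new_rows)
    return new_rows

state = {}
batch1 = [{"id": 1}, {"id": 2}]
print(fetch_new_rows(batch1, state))  # → [{'id': 1}, {'id': 2}]
batch2 = [{"id": 1}, {"id": 2}, {"id": 3}]
print(fetch_new_rows(batch2, state))  # → [{'id': 3}]
```

This is also why the max-value column must be strictly increasing (an auto-increment key or insert timestamp); updates to already-seen rows are invisible to this scheme.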
08-04-2022
12:52 PM
@mhsyed The latest Cloudera Runtime version can be found here (latest at top of list): https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/release-guide/topics/cdpdc-runtime-download-information.html So the latest version is CDH-7.1.7-1.cdh7.1.7.p1000.24102687 (CDP 7.1.7 Service Pack 1). If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
08-03-2022
12:01 PM
1 Kudo
@uzi1990 Can you provide more detail around the type of testing you are referring to? Testing what specifically?

NiFi is a flow-based programming ETL tool. As a user, you add and configure components (processors, RPGs, remote ports, funnels, etc.) on the NiFi canvas, then interconnect those components via connections containing component relationships. Processor components (there are currently in excess of 300 unique processors available) can be started and stopped one by one or in groups. When a component executes, it generates or passes a FlowFile to a downstream relationship. Via the NiFi UI, users can list the contents of a downstream connection and view/download the content of a FlowFile for inspection, and also view any metadata/attributes NiFi has set for those FlowFiles. This is how you would validate that a processor's configuration produced the expected output. You can then start the next processor component in your dataflow and repeat the same process.

Assuming you have content repository archiving enabled, you can also execute an entire flow and examine the generated data provenance for any FlowFile(s) that traversed that dataflow. You can see the content and metadata/attributes as they existed at each generated provenance event. In the Data Provenance lineage view, you can right-click on any event dot and view its details.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
08-03-2022
11:43 AM
@Angelvillar I think you have numerous unrelated questions here.

The GitFlowPersistenceProvider allows you to configure a git repo to which your version-controlled process groups are pushed for persistent storage outside of the NiFi-Registry server's file system. What is most important here is that NiFi-Registry only reads from the git repo on service startup; while running, everything is local to the NiFi-Registry server. So if changes are made manually on the git repo, NiFi-Registry will not see them. Additionally, the metadata about those stored flow versions is stored in the NiFi-Registry metadata database and not in the git repo. Also keep in mind that if you created flows originally using the local file-based flow provider and then switched to the git repo provider, those flows will not get moved to git; only new flows get created in git, and the old flows are no longer reachable.

1. Which flow persistence provider is configured for use in NiFi-Registry has nothing to do with NiFi being able to connect and import flows. NiFi connects to the NiFi-Registry client URL configured in NiFi and gets a list of bucket flows to which the NiFi user has authorized access. That flow information comes from the NiFi-Registry metadata DB. So when you made a change to the git repo, that would have had no effect until a NiFi-Registry restart, and what is in the new repo also would have had no effect on what is in the NiFi-Registry metadata DB. My guess here is that NiFi was given a list of version-controlled flows known to NiFi-Registry via the metadata DB, and then when you tried to import one of them, NiFi-Registry could not find the actual flow locally. Review the "Switching from other Flow Persistence Provider" section under the metadata-database section in the NiFi-Registry docs. What changes did you make in the configs when you cloned the git repo to tell NiFi-Registry to start using the new cloned repo over the original repo? If your configured git repository has existing flows committed to it and you have nothing in the metadata database, NiFi-Registry will generate metadata for the flows imported from the flow persistence provider on NiFi-Registry startup. NiFi or NiFi-Registry being secured has nothing to do with the error you described: if NiFi was able to display a list of flows for selection to import, then connectivity to Registry seems fine. However, keep in mind that if you secure Registry, you must secure NiFi in order to write to any buckets. A secured NiFi can access a non-secured NiFi-Registry, and a non-secured NiFi can access a non-secured NiFi-Registry. It is also possible for a non-secured NiFi to import flows from "public" buckets in a secured NiFi-Registry.

2. It does not matter whether you run your NiFi-Registry on a VM or in Docker, as long as the configured ports are reachable by your NiFi. This is all a matter of your personal preference.

3. Any version-controlled process group in NiFi has a NiFi background thread that checks with NiFi-Registry to see if a newer version of the PG is available. If NiFi is unable to access the NiFi-Registry buckets, or the persisted flows no longer exist in NiFi-Registry, you can expect to see exceptions about not being able to synchronize the PG with NiFi-Registry. The same would happen if you deleted the configured Registry client in the NiFi configuration and created a new Registry client pointing to the same NiFi-Registry. When a NiFi-Registry client is configured, that client is assigned a UUID. When a process group is version controlled, what is written to the local flow.xml.gz or flow.json.gz file is that UUID along with the version-controlled flow ID and version. If you delete and re-create the NiFi-Registry client, it gets a new unique UUID; your flows will not update to that new UUID, so those version-controlled PGs will no longer be able to synchronize either.

It sounds like you have been making a lot of changes, and it is not clear what state everything was in before you started making changes. I'd suggest starting fresh: stop version control on all your currently version-controlled PGs, get your flow persistence provider working, version control your first PG, and restart both NiFi and NiFi-Registry to make sure everything is still functioning as expected. Then proceed to make one change at a time and repeat the restart to see what, if anything, breaks.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
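For orientation, the GitFlowPersistenceProvider is configured in the NiFi-Registry providers.xml. A sketch along these lines (property names are from the NiFi-Registry administration guide; the storage path, remote name, and credentials shown are placeholders to adapt):

```xml
<!-- providers.xml: persist version-controlled flows to a git repo -->
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <!-- Local clone of the repo; NiFi-Registry reads it only at startup -->
    <property name="Flow Storage Directory">./flow_storage</property>
    <!-- Remote to push committed flow snapshots to (omit to keep commits local) -->
    <property name="Remote To Push">origin</property>
    <property name="Remote Access User">git-user</property>
    <property name="Remote Access Password">********</property>
</flowPersistenceProvider>
```

Note that this only controls where flow content is persisted; the flow/bucket metadata still lives in the separately configured metadata database, which is why the two can get out of sync when the git repo is changed by hand.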
08-02-2022
07:28 AM
@PradNiFi1236 The remote input port will use the keystore and truststore configured in the nifi.properties file. The S2SBulletinReportingTask will use the keystore and truststore configured in the SSLContextService controller service. It would be difficult for me to help with a potential SSL handshake issue without the verbose output of the 4 files that are being used:

<path to>/keytool -v -list -keystore <keystore or truststore filename>

You need to verify that the complete trust chain exists in the truststore used in the nifi.properties file for the clientAuth PrivateKeyEntry from the keystore configured in the SSLContextService. You need to verify that the complete trust chain exists in the truststore used in the SSLContextService for the serverAuth PrivateKeyEntry found in the keystore from the nifi.properties file. You also need to make sure that your keystore does not contain more than 1 PrivateKeyEntry, and that the PrivateKeyEntry has the correct SAN entry(s).

You should tail the nifi-user.log on the host configured in the S2SBulletinReportingTask and then enable that reporting task. If the mutual TLS handshake was successful, you should see the request being made for the S2S details. This will help you understand the exact client identity string being checked for authorization against the /site-to-site (pretty name for the policy: "retrieve site-to-site details") NiFi resource identifier policy.

I also don't know the full destination URL you have configured, so I can't verify it is correct. It should just be: https://<nifihostname>:<nifiport>/ where <nifiport> is the same port you use to access the web UI canvas.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
08-02-2022
06:17 AM
@ZhouJun I'd recommend upgrading your NiFi to the latest release as you may be hitting these related bugs: https://issues.apache.org/jira/browse/NIFI-9835 https://issues.apache.org/jira/browse/NIFI-9433 https://issues.apache.org/jira/browse/NIFI-9761 Thank you, Matt