Member since: 07-30-2019
Posts: 3406
Kudos Received: 1623
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 315 | 12-17-2025 05:55 AM |
| | 376 | 12-15-2025 01:29 PM |
| | 355 | 12-15-2025 06:50 AM |
| | 348 | 12-05-2025 08:25 AM |
| | 594 | 12-03-2025 10:21 AM |
08-10-2022
11:02 AM
@Ray82 I am assuming you are using the ExecuteSQL processor to execute the SQL SELECT statement example you shared. The response would be written to the content of the FlowFile passed to the success relationship. You could use the ExtractText processor to extract content from the FlowFile and assign it to a new FlowFile attribute you name "model". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
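In case it helps to visualize the ExtractText step, here is a minimal Python sketch of the kind of regex you might paste into a dynamic property on ExtractText. The JSON-shaped content and the "model" field layout are assumptions (ExecuteSQL emits Avro by default, so you would typically convert to JSON first, e.g. with ExecuteSQLRecord and a JSON record writer, or ConvertAvroToJSON):

```python
import re

# Hypothetical FlowFile content after converting the ExecuteSQL result to JSON.
# The field name "model" is taken from the original question.
flowfile_content = '[{"id": 1, "model": "XC-90", "year": 2020}]'

# The same pattern, used as the value of a dynamic property named "model" on
# ExtractText, would place the captured value into a FlowFile attribute named
# after that property.
pattern = r'"model"\s*:\s*"([^"]+)"'

match = re.search(pattern, flowfile_content)
if match:
    print("model =", match.group(1))  # -> model = XC-90
```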
08-10-2022
06:03 AM
@VJ_0082 So your custom Python script executes and, when it errors, a log is written on some host other than the host where NiFi is running? The GetFile processor can only get files from the local filesystem. It cannot get files from a remote filesystem (unless that remote filesystem is locally mounted). What is the error and stack trace you are seeing in the nifi-app.log when your PutEmail processor executes against your source FlowFile? Do you have a sample of the FlowFile content being passed to the PutEmail processor? How has your PutEmail processor been configured? Matt
08-10-2022
05:56 AM
1 Kudo
@Nifi- You'll need to provide more detail around your use case in order to get more specific assistance. NiFi offers a number of processor components that can be used to ingest from a database:
- ExecuteSQL
- ExecuteSQLRecord
- CaptureChangeMySQL <-- probably what you are looking for

The ExecuteSQL processors utilize a DBCPConnectionPool controller service for connecting to your specific database of choice. SQL is what needs to be passed to these processors in order to fetch database table entries. The following processors are often used to generate that SQL in the different ways needed by your use case, and to do so in an incremental fashion (for example, generating new SQL for new entries only so you are not fetching the entire table over and over; see the sketch after this post):
- GenerateTableFetch
- QueryDatabaseTable
- ListDatabaseTables

The CaptureChangeMySQL processor will output a FlowFile for each individual event. You can then construct a dataflow to write these events to your choice of location; that might be some other database. Once you have your dataflow created for ingesting entries from your table into NiFi, you'll need to use other processors within your dataflow for any routing or manipulation of that ingested data you may want to do before sending it to a processor that writes to the desired destination, possibly the PutDatabaseRecord processor, for example. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
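To make the "incremental fashion" idea concrete, here is a minimal Python sketch (using sqlite3) of the maximum-value-column pattern that QueryDatabaseTable and GenerateTableFetch automate for you; the table name, column names, and sample rows are hypothetical:

```python
import sqlite3

# Minimal sketch of the "maximum value column" pattern. NiFi persists the
# last-seen value as component state between runs; here it is just a variable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("a",), ("b",), ("c",)])

last_seen_id = 0  # stored state from the previous run

# Fetch only rows newer than the stored maximum, then advance the stored value.
rows = conn.execute(
    "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_seen_id,)
).fetchall()
if rows:
    last_seen_id = rows[-1][0]

print(rows)          # -> [(1, 'a'), (2, 'b'), (3, 'c')]
print(last_seen_id)  # -> 3; the next run would return only rows with id > 3
```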
08-08-2022
05:43 AM
1 Kudo
@nk20 I am confused by your concern about in-memory state. Can you provide more detail around what you are being told or what you have read that has led to this concern? Perhaps those concerns are about something more than component state? Perhaps I can address those specific concerns. Not all NiFi components retain state. Those that do either persist that state to disk in a local state directory or write that state to ZooKeeper. As long as the local disk where the state directory is persisted is not lost and ZooKeeper has quorum (minimum of three nodes), the state for your NiFi components that write state is protected. Out of all the components (processors, controller services, reporting tasks, etc.), there are only about 25 that record state. The only thing that lives in memory only is component status (in, out, read, write, sent, received). These are 5-minute stats that live in memory, so any restart of the NiFi service would set these stats back to 0. They have nothing to do with the FlowFiles or the execution of the processor. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
08-08-2022
05:16 AM
@VJ_0082 When a NiFi processor errors it will produce a bulletin. NiFi has a SiteToSiteBulletinReportingTask. This reporting task can send the produced bulletins to a remote input port on either the same NiFi or a completely different NiFi. You can construct a dataflow from that remote input port to route these FlowFiles to your PutEmail processor. Constructing a dataflow like this avoids needing to deal with non-error-related log output or ingesting the same log lines over and over by re-reading the app.log. By default processors have their bulletin level set to ERROR, but you can change that on each processor if so desired. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
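For illustration, here is a minimal Python sketch of filtering bulletins by level before the PutEmail step. The JSON field names are assumptions, so inspect the actual FlowFile content produced by the reporting task in your environment; in NiFi itself the same filtering could be done with EvaluateJsonPath and RouteOnAttribute.

```python
import json

# Hypothetical bulletin payload as it might arrive at the remote input port.
# Field names here are assumptions -- verify them against the real content.
bulletins = json.loads("""
[
  {"bulletinLevel": "ERROR", "bulletinSourceName": "PutSQL",
   "bulletinMessage": "Failed to process FlowFile"},
  {"bulletinLevel": "WARNING", "bulletinSourceName": "InvokeHTTP",
   "bulletinMessage": "Request timed out"}
]
""")

# Keep only ERROR-level bulletins, the ones you would likely email out.
errors = [b for b in bulletins if b.get("bulletinLevel") == "ERROR"]
for b in errors:
    print(f"{b['bulletinSourceName']}: {b['bulletinMessage']}")
```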
08-05-2022
08:52 AM
@nk20 If you are running a standalone NiFi, state is stored via the configured local state provider. If the node crashes, you don't lose that state; NiFi will load that local state when it is restarted. The only way you would lose state is if the server were unrecoverable (but then you have also lost your currently queued data, your entire flow, etc.). You can, and certainly should, have your NiFi repositories, state directory, and conf directory located on RAID disks to protect against loss in the event of disk failure. A better option is to set up a NiFi cluster. Processors like GenerateTableFetch will then use cluster state, which is stored in ZooKeeper (ZK) (I recommend setting up an external 3-node ZK cluster rather than using NiFi's embedded ZK). There are many advantages to using a NiFi cluster rather than a standalone single NiFi instance beyond just having state stored in ZK:
1. Distributed processing across multiple servers.
2. Externally stored cluster state.
3. Avoiding a complete flow outage in the event of a node failure.
4. All nodes execute the exact same flow and thus each has a copy of it.

In a NiFi cluster you would start your dataflow with your GenerateTableFetch processor configured to execute on "Primary node" only. Within a NiFi cluster one node will be elected to be the primary node. The success relationship connection would then be configured to load-balance the generated FlowFiles containing your SQL statements. This allows all nodes in your cluster to concurrently execute those SQL statements in your downstream processors, which are configured to execute on all nodes. If the currently elected primary node should crash, a new primary node will be elected. When that happens, the processor configured for "Primary node" only execution will retrieve the last state written to ZK and pick up processing where the old node left off (see the sketch after this post for one way to inspect that cluster state). Off the top of my head, nothing comes to mind in terms of being able to solve your use case in a stateless manner. However, maybe others in the community have some thoughts here. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
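If you want to confirm where cluster state for a processor like GenerateTableFetch ends up, here is a minimal Python sketch using the kazoo ZooKeeper client. The connect string and the "/nifi/components" path are assumptions based on a default Root Node of "/nifi", so adjust them to match your state-management.xml:

```python
from kazoo.client import KazooClient  # pip install kazoo

# Connect to the external ZK ensemble used by the NiFi cluster (hosts assumed).
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()
try:
    # Each child znode here is assumed to correspond to a stateful component
    # (its UUID), e.g. the GenerateTableFetch processor storing cluster state.
    for component_id in zk.get_children("/nifi/components"):
        print(component_id)
finally:
    zk.stop()
```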
08-05-2022
08:15 AM
@chitrarthasur Can you share the complete error and any stack trace following it in the nifi-app.log and/or nifi-user.log when you try to delete the process group? Thanks, Matt
08-05-2022
08:12 AM
@VJ_0082 Your use case is not very clear. What exactly are you trying to accomplish via your existing dataflow and what issues are you having? Thanks, Matt
08-04-2022
01:10 PM
@code Have you considered using GenerateTableFetch, QueryDatabaseTable, or QueryDatabaseTableRecord to generate the SQL that you then feed to ExecuteSQL, so you avoid getting old and new entries with each execution of your existing flow? Avoiding ingesting duplicate entries is better than trying to find duplicate entries across multiple FlowFiles. You can detect duplicates within a single FlowFile using DeduplicateRecord; however, this requires all records to be merged into a single FlowFile. You can use DetectDuplicate; however, this requires that each FlowFile contains one entry to compare. Using these methods adds a lot of additional processing to your dataflows, or holds records longer than you want in your flow, so it is probably not the best or most ideal solution. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
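For reference, here is a conceptual Python sketch of record-level de-duplication, roughly the idea behind what DeduplicateRecord does within a single FlowFile; the record layout and the choice of "id" as the de-dup key are hypothetical:

```python
import hashlib

# Hypothetical records, as if all rows were merged into one FlowFile.
records = [
    {"id": 1, "item": "a"},
    {"id": 2, "item": "b"},
    {"id": 1, "item": "a"},  # duplicate
]

seen = set()
unique = []
for record in records:
    # Hash the chosen key field; keep only the first record seen for each key.
    key = hashlib.sha256(str(record["id"]).encode()).hexdigest()
    if key not in seen:
        seen.add(key)
        unique.append(record)

print(unique)  # -> [{'id': 1, 'item': 'a'}, {'id': 2, 'item': 'b'}]
```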
08-04-2022
12:52 PM
@mhsyed The latest Cloudera Runtime version can be found here (latest at top of list): https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/release-guide/topics/cdpdc-runtime-download-information.html So latest version is CDH-7.1.7-1.cdh7.1.7.p1000.24102687 (CDP 7.1.7 Service Pack 1). If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt