Member since: 07-30-2019
Posts: 3421
Kudos Received: 1624
Solutions: 1010
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 63 | 01-13-2026 11:14 AM |
| | 201 | 01-09-2026 06:58 AM |
| | 521 | 12-17-2025 05:55 AM |
| | 582 | 12-15-2025 01:29 PM |
| | 563 | 12-15-2025 06:50 AM |
05-29-2024
05:06 AM
@Dilipkumar I am not sure what you mean by backups. Backups of what? NiFi-Registry is used to version control Process Groups from one or more NiFi instances. Those version controlled flow definitions include all configuration (minus any sensitive property values). A version controlled flow definition can be imported into any NiFi instance or cluster that has authorized access to the NiFi-Registry bucket in which it is stored. NiFi-Registry can be configured to persist flow definitions either to local file storage or to a git repository. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
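For reference, the persistence choice lives in NiFi-Registry's conf/providers.xml. A minimal sketch of a git-backed FlowPersistenceProvider, with the storage directory and remote name as assumed example values:

<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <!-- Local clone of the git repository that will hold the versioned flow definitions (example path) -->
    <property name="Flow Storage Directory">./flow_storage</property>
    <!-- Name of the git remote to push to after each commit; leave blank to keep commits local only -->
    <property name="Remote To Push">origin</property>
</flowPersistenceProvider>

Swapping the class for the file-based provider (org.apache.nifi.registry.provider.flow.FileSystemFlowPersistenceProvider) keeps the same structure with only the storage directory property.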
05-28-2024
06:49 AM
1 Kudo
@Alexy 100% agree with @ckumar Why is your NiFi producing so much logging? Additional loggers? Increased log levels? Huge FlowFile volume? Why are you not compressing (gz) on rollover to save disk space? Keep in mind that compression will take longer the larger the log file is. The performance is not going to change whether you are writing/appending to 100 MB or larger log files, but you do have disk I/O related to the amount of logging you are producing. Matt
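As an illustration, compression on rollover is normally enabled in NiFi's conf/logback.xml by ending the rolling file name pattern in .gz; the appender below is a sketch based on the stock layout, so check your own logback.xml for the exact appender names, paths and limits:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <!-- The trailing .gz tells logback to gzip each file as it is rolled -->
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log.gz</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>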
05-28-2024
06:42 AM
@scoutjohn The Site-To-Site (S2S) configuration properties control how your NiFi instance handles both inbound and outbound S2S connections. It is the receiving NiFi instance that determines whether S2S communication is secure or not:

nifi.remote.input.secure=true
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=false

First you need to understand how S2S works. The instance of NiFi with a RemoteProcessGroup (RPG) or an S2S Reporting Task is the client side of the connection. When that client component (RPG or S2S reporting task) executes, it needs to communicate with the target NiFi. That initial communication is always over HTTP(S) to the target NiFi. So if the target NiFi is secured (nifi.web.https.port configured) and the URL provided to the RPG or S2S reporting task is "https", the initial connection is going to be secure. This initial connection is used to fetch S2S details from the target NiFi. Those S2S details include, among other things:
- Does the target support FlowFile http(s) transfer? (nifi.remote.input.http.enabled)
- Does the target NiFi support socket-based FlowFile transfer? (nifi.remote.input.socket.port)
- Does the target enforce secure communications? (nifi.remote.input.secure)
- The list of remote input and remote output ports the client is authorized to see.
- How many nodes are in the target NiFi cluster, the load on each of those nodes, etc.

With the setup you shared, your NiFi has only the nifi.web.https.port configured, meaning this NiFi can only support https communication for S2S connections. I am not sure why you would want to send your data unsecured over your network. Why not send it secured, since your NiFi is already secured over https? Now if you were to also configure nifi.web.http.port (which makes little sense, since you would be exposing your NiFi UI unsecured over http as well as secured over https), does it still force nifi.remote.input.secure back to true from false? I have not configured http and https at the same time for a very long time (it was only done rarely when there were different internal and external networks). I could not find any Apache Jiras stating this is no longer an option, but it is possible that this has changed. Even if it is possible, I still question using unsecured transfer when your NiFi is already secured. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
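To make the relationship concrete, a secured target NiFi that accepts socket-based S2S transfers would typically carry nifi.properties entries along these lines (host name and port numbers are example values only):

# HTTPS UI/API endpoint used for the initial S2S handshake (example values)
nifi.web.https.host=nifi-node1.example.com
nifi.web.https.port=8443

# Raw socket transport used for the actual FlowFile transfer
nifi.remote.input.host=nifi-node1.example.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=false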
05-28-2024
06:03 AM
@mohammed_najb Is the ExecuteSQL the first processor in your dataflow, or is it being fed by an inbound connection from some upstream processor such as the GenerateTableFetch processor? I only ask since the ExecuteSQL processor does not retain any state, so on its own it would not be the best choice for ingesting from an active table that may be having additional rows added regularly. As far as ExecuteSQL goes, it writes out attributes on the FlowFiles it produces. The "executesql.row.count" attribute records the number of rows returned by the query, or the number of rows in the specific produced FlowFile's content when the "Max Rows Per Flow File" property is configured with a non-zero value. When multiple FlowFiles are being produced, you could use an UpdateCounter processor to create a counter and use the NiFi Expression Language "${executesql.row.count}" as the delta. As far as your question about "process fails" is concerned: ExecuteSQL will execute the SQL query and, based on configuration, create one or more FlowFiles. Also based on configuration (the Output Batch Size property), it will either release FlowFiles incrementally to the downstream connection or release them all at once (the default). Assuming the default, no FlowFiles are output until the query is complete and all FlowFiles are ready for transfer to the outbound connection. If a failure happens prior to this transfer (system crash, etc.), no FlowFiles are output, and on the next execution of ExecuteSQL the query is executed again if there is no inbound connection. If ExecuteSQL is using an inbound FlowFile from an inbound connection to trigger execution, a processing failure would result in the FlowFile routing to the failure relationship, which you could set up to retry. If the system crashes, the FlowFile remains in the inbound connection and execution simply starts over on system restore. Hopefully this gives you some insight to experiment with. As is the case with many use cases, NiFi often has more than one way to build them and multiple processor options. The more detailed you are with your use case, the better feedback you may get in the community. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
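As a sketch of the counting idea, an UpdateCounter placed on the success path of ExecuteSQL could be configured as below (property names as shown in the UI; the counter name is just an example):

UpdateCounter
  Counter Name : executesql_rows_ingested    (example name)
  Delta        : ${executesql.row.count}     (row count attribute written by ExecuteSQL)

The running total is then visible in NiFi's Counters view from the global menu.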
05-23-2024
11:35 AM
@Jagapriyan The exception is telling you that the following properties were not configured in the bootstrap.conf file for your MiNiFi:

nifi.minifi.sensitive.props.key=
nifi.minifi.sensitive.props.algorithm=

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
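For illustration only, populated values might look like the lines below; the key is an arbitrary example passphrase, and NIFI_PBKDF2_AES_GCM_256 is one commonly supported algorithm value, so confirm the accepted values against your MiNiFi version's documentation:

nifi.minifi.sensitive.props.key=thisIsAnExamplePassphrase1234
nifi.minifi.sensitive.props.algorithm=NIFI_PBKDF2_AES_GCM_256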
05-23-2024
05:57 AM
1 Kudo
@alan18080 The Postgres DB only holds metadata and does not contain the actual dataflows in NiFi-Registry. The version controlled dataflows are stored by the configured FlowPersistenceProvider. So if you are not preserving the actual flow contents, then the metadata loaded from your PostgreSQL will not find them after redeployment. I would also recommend upgrading your NiFi-Registry and NiFi versions; the NiFi-Registry version you are running is going on 4 years old. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
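To illustrate the split between the two stores, nifi-registry.properties points at both; the JDBC URL below is an assumed example, and the providers.xml it references is where the FlowPersistenceProvider (file- or git-based) is configured:

# Metadata only (buckets, flow names, version comments) lives in the database
nifi.registry.db.url=jdbc:postgresql://db-host.example.com:5432/nifireg
nifi.registry.db.driver.class=org.postgresql.Driver

# The actual versioned flow content is written by the provider defined here
nifi.registry.providers.configuration.file=./conf/providers.xml

Both locations have to be preserved (or migrated) for version history to survive a redeployment.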
05-23-2024
05:48 AM
@donaldo71 You pasted the exact same question as @Sofia71 here: https://community.cloudera.com/t5/Support-Questions/Problem-with-merge-BIN-MANAGER-FULL/td-p/388211, to which I have already responded with the below: Sharing the specific processors used in your dataflow and the configuration of some of them, including your MergeContent processor configuration, may help clarify your specific setup. You are using a ConsumeKafka processor to consume messages (multi-record content) from a Kafka topic. I am assuming that each of these consumed messages contains only two records, so the SplitRecord processor produces only 2 FlowFiles for every FlowFile it splits? Immediately after the SplitRecord processor you have configured the "Round Robin" load-balancing strategy on the outbound connection. This is probably going to be your first issue. Each node in a NiFi cluster runs its own identical copy of the flow against only the FlowFiles present on that one specific node. A node has no access to, or ability to read, FlowFiles present on other nodes. So if one FlowFile produced by splitting a record is on node 1 and the other FlowFile is on node 2, the downstream MergeContent is not going to be able to merge them back together. So the first question is whether you even need to set up load balancing on the connection, since you are consuming your messages from a Kafka topic. 1. How many nodes are in your NiFi cluster? 2. How many partitions are on the Kafka topic from which you are consuming? The ConsumeKafka processor uses a "Group ID" to identify a consumer group, so every node in your NiFi cluster that is running this ConsumeKafka processor is a member of the same consumer group. So let's assume your source Kafka topic has 3 partitions and your NiFi cluster has 3 nodes. Each node's ConsumeKafka would be assigned one of those partitions, meaning each node is consuming a unique set of messages from the topic, so there is no need to load balance. Assuming the above is not what you are doing, then the proper load-balancing strategy to use would be "Partition by attribute", which uses an attribute on the FlowFile to make sure that FlowFiles with the same attribute value get sent to the same node. Now on to the MergeContent processor. MergeContent, upon execution, reads from the inbound connection queue and starts assigning FlowFiles to bins. It does not search the inbound connection for matches; it simply reads in the order listed and works its way down the list. The first FlowFile is allocated to a bin; if the next FlowFile can't be allocated to the same bin it is placed in a second bin, and so on. If a FlowFile has been allocated to every bin and the next FlowFile does not belong to any of those bins, MergeContent force-merges the oldest bin to free up a bin for the new FlowFile. There is no way to change how this works, as the processor is not designed to parse through all the FlowFiles in the connection looking for matches before allocating to bins; that would not exhibit very good performance characteristics. What is your concern with increasing the number of bins? This might be a use case for the Wait/Notify processors. So after you split the record into two FlowFiles, one FlowFile is currently routed to an InvokeHTTP for further attribute enrichment and the other FlowFile is routed directly to MergeContent? If so, this means that the FlowFiles that don't get additional processing will queue up much sooner at MergeContent.
But if you add a Notify processor after the InvokeHTTP processor and a Wait processor in the other FlowFile path before MergeContent, you could control the release of FlowFiles to the MergeContent processor. This is just one suggestion you could try, but I would start by making sure you are handling the distribution of your split FlowFiles correctly. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
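One concrete way to keep split siblings on the same node and merge them back together, assuming SplitRecord's standard fragment attributes (fragment.identifier, fragment.index, fragment.count) are present on the split FlowFiles, is a configuration along these lines (a sketch, not a verified setup):

Connection after SplitRecord
  Load Balance Strategy : Partition by attribute
  Attribute Name        : fragment.identifier

MergeContent
  Merge Strategy         : Defragment   (pairs FlowFiles that share fragment.identifier)
  Maximum Number of Bins : sized to the number of fragment groups expected to be in flight at once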
05-23-2024
05:44 AM
@Sofia71 Sharing the specific processors used in your dataflow and the configuration of some of them, including your MergeContent processor configuration, may help clarify your specific setup. You are using a ConsumeKafka processor to consume messages (multi-record content) from a Kafka topic. I am assuming that each of these consumed messages contains only two records, so the SplitRecord processor produces only 2 FlowFiles for every FlowFile it splits? Immediately after the SplitRecord processor you have configured the "Round Robin" load-balancing strategy on the outbound connection. This is probably going to be your first issue. Each node in a NiFi cluster runs its own identical copy of the flow against only the FlowFiles present on that one specific node. A node has no access to, or ability to read, FlowFiles present on other nodes. So if one FlowFile produced by splitting a record is on node 1 and the other FlowFile is on node 2, the downstream MergeContent is not going to be able to merge them back together. So the first question is whether you even need to set up load balancing on the connection, since you are consuming your messages from a Kafka topic. 1. How many nodes are in your NiFi cluster? 2. How many partitions are on the Kafka topic from which you are consuming? The ConsumeKafka processor uses a "Group ID" to identify a consumer group, so every node in your NiFi cluster that is running this ConsumeKafka processor is a member of the same consumer group. So let's assume your source Kafka topic has 3 partitions and your NiFi cluster has 3 nodes. Each node's ConsumeKafka would be assigned one of those partitions, meaning each node is consuming a unique set of messages from the topic, so there is no need to load balance. Assuming the above is not what you are doing, then the proper load-balancing strategy to use would be "Partition by attribute", which uses an attribute on the FlowFile to make sure that FlowFiles with the same attribute value get sent to the same node. Now on to the MergeContent processor. MergeContent, upon execution, reads from the inbound connection queue and starts assigning FlowFiles to bins. It does not search the inbound connection for matches; it simply reads in the order listed and works its way down the list. The first FlowFile is allocated to a bin; if the next FlowFile can't be allocated to the same bin it is placed in a second bin, and so on. If a FlowFile has been allocated to every bin and the next FlowFile does not belong to any of those bins, MergeContent force-merges the oldest bin to free up a bin for the new FlowFile. There is no way to change how this works, as the processor is not designed to parse through all the FlowFiles in the connection looking for matches before allocating to bins; that would not exhibit very good performance characteristics. What is your concern with increasing the number of bins? This might be a use case for the Wait/Notify processors. So after you split the record into two FlowFiles, one FlowFile is currently routed to an InvokeHTTP for further attribute enrichment and the other FlowFile is routed directly to MergeContent? If so, this means that the FlowFiles that don't get additional processing will queue up much sooner at MergeContent. But if you add a Notify processor after the InvokeHTTP processor and a Wait processor in the other FlowFile path before MergeContent, you could control the release of FlowFiles to the MergeContent processor.
This is just one suggestion you could try, but I would start by making sure you are handling the distribution of your split FlowFiles correctly. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
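A rough sketch of that Wait/Notify pairing, assuming a DistributedMapCacheClientService is available and that fragment.identifier is a suitable release signal (both are assumptions, not details taken from this thread):

Notify  (after InvokeHTTP on the enrichment path)
  Release Signal Identifier : ${fragment.identifier}
  Distributed Cache Service : DistributedMapCacheClientService

Wait  (before MergeContent on the direct path)
  Release Signal Identifier : ${fragment.identifier}
  Distributed Cache Service : DistributedMapCacheClientService
  Target Signal Count       : 1

The Wait processor holds its FlowFile until the Notify processor records the matching signal, so both siblings reach MergeContent at roughly the same time.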
05-23-2024
04:39 AM
@alan18080 Please provide more details around the steps you are performing and the exact versions of Apache NiFi and Apache NiFi-Registry you are using. So you have an existing NiFi-Registry configured to use Postgres as its metadata database. You have already version controlled some NiFi process group(s) to buckets within this registry. And now you are trying to export those version controlled process groups from this NiFi-Registry and import them into another NiFi-Registry? What versions are the two NiFi-Registries? What steps did you perform that eventually resulted in the exception encountered? I am not clear on what you are doing when you say "transfer". Sharing the complete error and stack trace would also be helpful. Thank you, Matt
05-21-2024
12:19 PM
@Racketmojster NiFi passes FlowFiles from processor to processor. A directory is nothing more than metadata, not actual content in itself; it is a path that hopefully leads to data. So there is no NiFi processor specifically for generating a FlowFile based off a discovered parent directory path. But like I said in my original post, a dataflow consisting of:
ListFile
--> UpdateAttribute (to extract the date from the absolute.path FlowFile attribute into a new attribute, for example "date.path")
--> DetectDuplicate (configured with "${date.path}" as the "Cache Entry Identifier" value)
--> ReplaceText (optional: only needed if the FlowFile's content needs to contain the directory path to pass to your script. Ideally this is not needed if you can just pass the FlowFile's ${date.path} attribute value to your script. No idea what input your script expects.)
--> ExecuteStreamCommand
So while the flow may list all files from the same date path, we de-duplicate so only one FlowFile from each unique date path is passed on to your ExecuteStreamCommand processor, which can then execute your script against that source directory. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
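A rough configuration sketch of that chain; the expression used to pull the date out of absolute.path is an assumption (it depends entirely on your directory layout), and the cache service and script path are placeholders:

UpdateAttribute
  date.path : ${absolute.path:getDelimitedField(4, '/')}   (assumes the 4th path segment holds the date)

DetectDuplicate
  Cache Entry Identifier    : ${date.path}
  Distributed Cache Service : DistributedMapCacheClientService

ExecuteStreamCommand
  Command Path      : /path/to/your/script.sh   (placeholder)
  Command Arguments : ${date.path}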