About MattWho

MattWho · ‎05-09-2023

@srv1009 Are you sure you are removing or replacing a flow.xml.gz? Apache NiFi 1.19 uses the flow.json.gz. The flow.json.gz is auto-generated from the flow.xml.gz if and only if the flow.json.gz does not already exist at startup.

As far as flow changes in your production environment go, all of the following are flow configuration changes:
1. Moving a component
2. Changing the state of a component (start, stop, enable, disable)
3. Importing a flow from the nifi-registry or changing the version of a version-controlled flow

So you are saying none of the above ever happen in prod (seems unlikely)?

You mentioned that you have "checkpointing" enabled. The only "checkpointing" that NiFi does is NOT related to the flow.xml.gz or the flow.json.gz. It has to do with checkpointing the flowfile_repository, which contains metadata about the NiFi FlowFiles queued within connections in your dataflows on the canvas. The flow.json.gz is the only file used on startup once it has been generated.

Are you having disk space issues?

Archiving of the flow.json.gz can be enabled through the flow archive properties in nifi.properties. An archive of the flow.json.gz is only created when a configuration change is made on the NiFi canvas, or when some NiFi rest-api interaction results in changes to the dataflow(s). When enabled, archiving does not randomly generate archive copies of the flow.json.gz; it only happens with each change. Are you seeing an archive of the flow.json.gz generated every few minutes in the configured archive.dir? If so, then changes to your dataflow are happening.

Are you sure that the SIGTERM is executing a clean shutdown of the NiFi process (./nifi.sh stop), or is it just killing the NiFi process id? If a graceful shutdown is happening, you would see that in the nifi-bootstrap.log. The graceful shutdown period is configurable.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
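One way to check how often archive copies are being written is to look at the gaps between the modification times of the files in the archive directory. The following is only a minimal Python sketch: the function name, the demo directory, and the archive file names are hypothetical, and the directory you pass in would be whatever your archive.dir is configured to.

```python
import os
import tempfile
from pathlib import Path


def archive_intervals_seconds(archive_dir):
    """Return the gaps, in seconds, between consecutive archive files.

    Files are ordered by modification time. Short, regular gaps suggest
    that the dataflow is being changed frequently.
    """
    mtimes = sorted(
        p.stat().st_mtime for p in Path(archive_dir).iterdir() if p.is_file()
    )
    return [later - earlier for earlier, later in zip(mtimes, mtimes[1:])]


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for the configured archive.dir.
    with tempfile.TemporaryDirectory() as demo_dir:
        for index, mtime in enumerate((1000, 1300, 1600)):
            # Hypothetical file names; real archive names will differ.
            archived = Path(demo_dir) / f"archived-{index}.json.gz"
            archived.touch()
            os.utime(archived, (mtime, mtime))
        print(archive_intervals_seconds(demo_dir))
```

If the gaps come out at a few minutes each, something (a user or a rest-api client) is changing the flow at that rate.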

MattWho · ‎05-08-2023

@RajiBM The NiFi FetchSFTP processor has several "Completion Strategy" options:
None (default) <- leaves the file as-is on the SFTP server after reading its content.
Move File <- moves the file, after a successful read, to the directory specified in the "Move Destination Directory" property.
Delete File <- deletes the original file from the SFTP server after a successful read of its content.

So as long as your FetchSFTP is configured with the default "None" Completion Strategy, the file will remain in its original source location on the target SFTP server.

Thank you,
Matt

MattWho · ‎05-08-2023

@srv1009 If NiFi is "restarting" itself, that means NiFi did not receive a graceful shutdown request. When NiFi is launched, the bootstrap process is started. That bootstrap process then starts the NiFi core as a child process and monitors the child process pid. If that pid disappears or the NiFi core is not responding, the bootstrap process will assume the core has died and will restart it. Typically in such scenarios, this is the result of the OS killing off the NiFi core process pid. In Linux, the OOM Killer will do this when system memory gets too low and it sees that child process as the largest consumer of system memory.

The latest versions of NiFi use a flow.json.gz to persist the dataflow. If you are still using a flow.xml.gz, what version of NiFi are you running? Regardless, the NiFi flow is unpacked on startup and loaded into NiFi heap memory. When a change occurs on the canvas, the current flow.json.gz/flow.xml.gz is archived and a new one is created. To have corruption in the newly written flow.json.gz/flow.xml.gz, the core NiFi process would have to have been killed while that file was being written out to disk. Or, as @cotopaul mentioned, you have some disk corruption going on.

With a graceful shutdown of the NiFi service, there is a 10 second grace period for current threads to complete before the core is killed. If you are having disk issues, high disk latency, or disk space issues, maybe the writing of your flow.xml.gz is taking a lot longer and the process is being killed before that completes. The chance of that seems slim, though, as multiple things would need to happen in succession (a change on the canvas at almost the exact time you initiate a graceful shutdown). You could increase the graceful shutdown period in bootstrap.conf, but I doubt that is what is impacting you here.

Thank you,
Matt
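To confirm whether a flow.json.gz was cut off mid-write, it may help to test that the file is a complete gzip stream holding parseable JSON. This is a minimal Python sketch, not a NiFi tool; the function name is hypothetical, and it only applies to the JSON flow (a flow.xml.gz would need an XML parser instead).

```python
import gzip
import json
import zlib


def flow_file_is_readable(path):
    """Return True if `path` is a complete gzip stream containing valid JSON."""
    try:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            json.load(handle)
        return True
    except (OSError, EOFError, zlib.error, ValueError):
        # OSError covers a bad gzip header or a missing file, EOFError a
        # truncated stream, zlib.error corrupt compressed data, and
        # ValueError malformed JSON or invalid UTF-8.
        return False
```

Running this against the current flow.json.gz and against the newest copy in the archive directory would show whether only the latest write is damaged or whether older copies are affected too, which points more toward a disk problem.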

MattWho · ‎05-08-2023

@zizo The ConsumeKafka processor acts as a Kafka consumer group. So it makes sense that you set 30 concurrent tasks (1 per partition), assuming this is a single instance of NiFi. If you had a 3 node NiFi cluster, you would set the concurrent tasks to 10 (10 x 3 nodes = 30 consumers). What version of Kafka Server are you consuming from?

You have MergeContent merging a minimum of 100,000 FlowFiles. You did not share any component configurations here. Is the MergeContent configured correctly to make sure each merge generated contains 100,000 FlowFiles? Generally speaking, I would recommend using two MergeContent processors in series to reduce NiFi heap memory usage. The more FlowFiles allocated to MergeContent bins, the more NiFi heap usage. So a first MergeContent merging, say, 20,000 min, followed by a second MergeContent merging 5 min, would achieve the same result but with lower heap usage.

UpdateAttribute does not touch the content of a FlowFile. It simply updates metadata/attributes about a FlowFile.

PutSFTP throughput is for the most part dependent on the target SFTP server and the network between NiFi and that SFTP server. Most SFTP servers only allow a maximum of 10 concurrent connections from the same client. Did you configure this processor with 10 concurrent tasks? Having a NiFi cluster would allow multiple NiFi nodes to send data concurrently to the SFTP server (10 concurrent tasks x 3).

Are you saying the FlowFiles start queuing up at the PutSFTP processor, eventually leading to backpressure being applied all the way back through your dataflow until you reach the ConsumeKafka processor? Have you looked at CPU, disk I/O, network bandwidth and speed, and NiFi heap usage?

Thank you,
Matt
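The sizing arithmetic above can be written down as a small Python sketch. Both function names are hypothetical helpers for illustration, not anything NiFi provides.

```python
import math


def concurrent_tasks_per_node(partitions, nodes):
    """Concurrent tasks per NiFi node so the cluster has one consumer per partition."""
    if partitions < 1 or nodes < 1:
        raise ValueError("partitions and nodes must be positive")
    # Round up so every partition is covered when the split is uneven.
    return math.ceil(partitions / nodes)


def merged_flowfiles(first_stage_min, second_stage_min):
    """FlowFiles combined into each final bundle by two MergeContent processors in series."""
    return first_stage_min * second_stage_min


print(concurrent_tasks_per_node(30, 1))  # single NiFi instance
print(concurrent_tasks_per_node(30, 3))  # 3 node cluster
print(merged_flowfiles(20_000, 5))       # two-stage merge
```

When the partition count does not divide evenly by the node count, rounding up leaves a few consumers in the group without a partition, which is generally harmless but wastes a thread.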

MattWho · ‎05-01-2023

@plummcrazy Not sure I understand the use case here. You are looking for a way to pass an IP address, get the returned hostname, and put it in a FlowFile attribute?

The QueryDNS processor does not require any additional "special permissions" to use it. I have never tried using this processor myself.

There are no other DNS-specific processors designed to perform such a task. You would need to utilize the ExecuteStreamCommand, ExecuteScript, or ExecuteGroovyScript processors to execute the system-level command that returns your reverse DNS lookup results. Execute-based processors require an additional layer of authorization, since the processor is executed, like every other processor, as the NiFi service user (this means that ANY access the NiFi service user has will be accessible by the NiFi dataflow using these processors).

Thank you,
Matt
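For reference, the reverse lookup itself is a few lines in Python's standard library, so a script along these lines could be what one of the Execute-based processors runs. This is only a sketch under that assumption: the function name is hypothetical, and how the IP reaches the script and how the result becomes a FlowFile attribute depends on the processor you choose.

```python
import ipaddress
import socket


def reverse_lookup(ip_text):
    """Return the hostname for an IP address, or None if the lookup fails."""
    # Validate first; ipaddress.ip_address raises ValueError on malformed input.
    address = ipaddress.ip_address(ip_text)
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(str(address))
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

The result depends entirely on the resolver configuration of the host running NiFi, so the same IP can return different names (or nothing) on different nodes.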

MattWho · ‎04-27-2023

@luv4diamonds NiFi will NOT form a cluster with a single-instance ZooKeeper. ZooKeeper must have quorum; otherwise a cluster coordinator and primary node will not get elected for NiFi. For quorum you should have an odd number of ZK nodes, with a minimum of 3.

Thank you,
Matt
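The reason for recommending an odd number is majority arithmetic: a ZooKeeper ensemble stays available only while more than half of its members are up. A minimal Python sketch of that calculation (the function names are hypothetical):

```python
def quorum_size(ensemble_size):
    """Smallest number of ZK nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many ZK nodes can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

A 4 node ensemble tolerates the same single failure as a 3 node ensemble, so the extra even-numbered node adds cost without adding resilience.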

MattWho · ‎04-25-2023

@kishan1 NiFi is a data-in-motion design. Processor components execute against the highest priority FlowFile in an inbound connection. An individual processor does not know how many FlowFiles to expect from any upstream processors. So it becomes a challenge for downstream processors to know when all FlowFiles have been processed. Do you know, or have a way to determine, the number of FlowFiles that will enter your process group?

Perhaps details of your use case may help a community member suggest something for you (how does data land in your process group, how much data, how often data enters the process group, etc.).

As far as alerting an external API goes, you could use the InvokeHTTP processor to do that. The challenge here is knowing when to make that notification.

Thank you,
Matt

MattWho · ‎04-25-2023

@anony You cannot use NiFi Expression Language (NEL) in component property names. NEL can only be used in component property values, and even then only when the component property supports Expression Language.

With ExtractText, any dynamic property added will create a new FlowFile attribute on the outbound FlowFile, with an attribute name matching the property name "${reasonId}" and the attribute value set to the capture group string extracted from the source FlowFile's content.

Without understanding your complete use case, it is difficult to offer alternative solutions. Why is it important to set a FlowFile property name to a reasonID? FlowFile attribute names only mean something to NiFi. If you were able to create dynamic property names, how would you programmatically use them later in your dataflow, given that FlowFiles would have a variety of attribute names?

Thank you,
Matt
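As a rough analogy in Python (this is not NiFi code and not how ExtractText is implemented; the function name and sample content are made up), the behavior described above looks like this: the property name is used literally as the attribute name, and only the regular expression in the value does any work.

```python
import re


def extract_attributes(content, dynamic_properties):
    """Build one attribute per dynamic property from the first regex match.

    `dynamic_properties` maps a literal property name to a regular
    expression containing one capture group.
    """
    attributes = {}
    for name, pattern in dynamic_properties.items():
        match = re.search(pattern, content)
        if match:
            # The name is never evaluated; it is copied as-is.
            attributes[name] = match.group(1)
    return attributes


print(extract_attributes("reasonId=42;status=late", {"${reasonId}": r"reasonId=(\d+)"}))
```

The resulting attribute is literally named `${reasonId}` with the value `42`, rather than an attribute named after whatever reasonId contains.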

MattWho · ‎04-25-2023

@Vas_R There is not a lot of detail in your use case here. Where are you reading the Avro from? Did you already ingest the Avro content from the source into a FlowFile within NiFi? I am guessing you are using ExecuteSQL as the first processor in your flow to read content from a source DB? Are you looking to encrypt the entire Avro, or only a portion of the Avro record content? The more detail you provide, the better the chances are that a community member will be able to help you.

Thanks,
Matt

MattWho · ‎04-24-2023

@databoi I see from your images that you are using Apache NiFi 1.11.4, which is around the time that the Load Balanced connection capability was introduced. Many bugs were subsequently identified in load balanced connections and addressed in later releases. I strongly encourage you to upgrade to the latest NiFi release and see if your issue persists.

Thank you,
Matt

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎11-18-2025 07:56 AM
Posts	3,406
Kudos received	1619

Cloudera Community

Re: Error importing NiFi workflow template from ve...

Re: How to elevate a default nifi user to admin - ...

Re: NiFi EnvokeHTTP - putting current date on HTTP...

Re: Invoking Nifi rest api in Data Flow

Re: NiFi Flow xml/json getting corrupted in multi ...

Re: Will Nifi delete a file from SFTP ?

Re: NiFi Flow xml/json getting corrupted in multi ...

Re: Read data from Kafka with NiFi

Re: NiFi Reverse DNS Lookup

Re: nifi with external zookeeper errors

Re: Notify external api/agent after Nifi is done p...

Re: How to do Dynamic property naming in NIfi?

Re: Looking to encrypt data while copying data int...

Re: NiFi Flowfiles Stucked in Round Robin Queues