
Log messages are missing with NIFI HA testing

Explorer

Hi,

I have two NiFi instances running and I am sending data to them. Both nodes receive the data in random order.

 

When I kill one of the NiFi instances (the first instance), all the data is processed by the other instance (the second instance). When I bring the killed instance back up, the data is again distributed between them in random order.

 

No data loss is observed, but some log messages are missing from nifi-app.log around this restart.

For example, if I send 100 messages during this test, I see log messages for only 97 of them.

 

Is this expected in NiFi, or is there a way to fix this issue? Is it related to shutting down NiFi gracefully?

 

Any help would be appreciated.

 

Thanks

6 Replies

Super Mentor

@srilakshmi 
NiFi only offers HA at the controller level, not at the data/FlowFile level.  HA at the controller level is possible thanks to NiFi's zero-master clustering, which relies on a ZooKeeper (ZK) quorum to elect an available NiFi node as the cluster coordinator. If the currently elected cluster coordinator goes down, ZK elects another active node to assume this role.  Zero-master clustering allows you to access your NiFi cluster from any of the active cluster nodes.

 

Each node in the NiFi cluster has its own identical copy of the flow and its own set of repositories.  NiFi nodes cannot share repositories, so any FlowFile queued on a node that goes down remains on that node until it is brought back online.  This is what you are observing, based on the description you provided.

When you execute the command to shut down NiFi, it does initiate a graceful shutdown.  The amount of time allowed for this graceful shutdown is controlled by this configuration property in the nifi.properties file:
nifi.flowcontroller.graceful.shutdown.period

The default is 10 seconds.  If an active thread does not complete within that graceful shutdown period, the thread is killed along with the JVM. This does not result in data loss, since a FlowFile is not removed from a processor's inbound connection unless the thread completed and the FlowFile was successfully transferred to an outbound connection.  On startup, the NiFi flow is loaded, FlowFiles are loaded back into their connections, and then components are enabled and started.
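For example, if your flow has long-running threads, you can raise this window in nifi.properties (the 20-second value below is purely illustrative):

```properties
# nifi.properties -- graceful shutdown window for the flow controller.
# If active threads do not finish within this period, they are killed
# along with the JVM. The value here is only an example.
nifi.flowcontroller.graceful.shutdown.period=20 sec
```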

I'd be interested in your test dataflow, what logging you are looking for, and from which processor component it comes. Have you checked NiFi's data provenance to search for the lineage of the 3 FlowFiles you were missing logging for?


If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt

Explorer

Hi @MattWho 

Thanks for the reply.

 

The log messages which are missing are custom logs we added using the "LogAttribute" processor in NiFi. We are using NiFi to publish to Kafka; when a Kafka publish succeeds or fails, we log a message, and this is done using the LogAttribute processor.

 

These log messages are important to us for identifying whether all the messages were successfully published to Kafka.

 

We are observing that a few of the custom logs added via the LogAttribute processor are missing.

Please let me know the possible reasons these log messages could go missing.

 

And please note that this issue is seen only during HA testing, i.e., when bringing one NiFi instance down and then bringing it back up.
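One quick way to sanity-check the counts is to grep nifi-app.log for the LogAttribute entries. This is only a sketch: the sample lines and the grep patterns below are assumptions standing in for real output, so adjust them to match your processor's actual log lines and any custom Log Prefix you configured.

```shell
# Hypothetical sketch: count LogAttribute entries in a nifi-app.log.
# The sample lines below stand in for real log output (an assumption);
# point LOG at your real nifi-app.log instead of the temp file.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2023-03-20 10:00:01,123 INFO [Timer-Driven Process Thread-1] o.a.n.processors.standard.LogAttribute LogAttribute[id=abc] kafka publish success
2023-03-20 10:00:02,456 INFO [Timer-Driven Process Thread-2] o.a.n.processors.standard.LogAttribute LogAttribute[id=abc] kafka publish success
2023-03-20 10:00:03,789 INFO [Timer-Driven Process Thread-3] o.a.n.processors.standard.LogAttribute LogAttribute[id=abc] kafka publish failure
EOF
TOTAL=$(grep -c 'LogAttribute\[id=' "$LOG")   # all custom entries
OK=$(grep -c 'publish success' "$LOG")        # successful publishes only
echo "total=$TOTAL success=$OK"
rm -f "$LOG"
```

Comparing the total against the number of messages sent (100 in the test described above) shows exactly how many entries are missing. Remember that each cluster node writes its own nifi-app.log, so the counts from both nodes need to be added together.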

 

Thanks.

Super Mentor

@srilakshmi 
Were you able to identify the three FlowFiles that did not produce the expected log output from your LogAttribute processors?

Do you auto-terminate the "success" relationship out of your LogAttribute processors?
If so, do you see DROP events in those FlowFiles' provenance lineage?
If you look at the event timestamps, do they correlate to a time between when the shutdown was initiated and when you restarted the NiFi instance?

What version of NiFi are you using?  Is it older than Apache NiFi 1.16?
If so, you may be hitting this bug, which was addressed in the NiFi 1.16 release:
https://issues.apache.org/jira/browse/NIFI-9688


Thank you,

Matt



Explorer

Hi @MattWho 

I have not been able to find those 3 FlowFiles yet.

 

Yes, I am auto-terminating the success relationship out of the LogAttribute processor. I will check regarding DROP events and let you know.

 

Let me also check which NiFi version we are using.

 

Thanks for the reply

Explorer

Hi @MattWho 

I am not able to find those 3 FlowFiles in the provenance lineage, because the test was performed on March 20th and the oldest data I could find in data provenance is from March 22nd.
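If longer provenance history is needed for this kind of after-the-fact investigation, the retention limits can be raised in nifi.properties. The values below are illustrative only; events are aged off when either limit is reached, which would explain why the March 20th events were already gone:

```properties
# nifi.properties -- provenance repository retention (example values).
# Events are removed once either the time or size limit is reached.
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
```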

 

As per the NiFi UI, the version is "Apache NiFi Version 1.9.0". It seems we need to upgrade to 1.16 or later, since 1.9.0 is older.

Super Mentor

@srilakshmi 
Yes, Apache NiFi 1.9.0 was released over 4 years ago, on February 19, 2019.  Many bug fixes, improvements, and security fixes have made their way into the product since then.  The latest release as of this post is 1.20.

While I can't verify 100% from what exists in this thread that you are experiencing NIFI-9688, the odds are pretty strong.

You can find the release notes for Apache NiFi here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.20.0

 


Thank you,

Matt