Created 03-17-2019 03:23 PM
Hello NiFi community,
I have been investigating a problem where the most recent data provenance segment from a shut-down nifi application session is not available when the nifi application is next started.
My investigation has been to repeatedly (20+ times) stop nifi, start nifi (and let it generate a bit of data provenance), and then inspect the data provenance when the UI becomes available to confirm that all previous data provenance segments are accessible. Then repeat, and repeat, and repeat.
At first, I was able to reproduce the problem fairly reliably every second or third time that I stopped nifi and started it again. The data provenance generated during the previous nifi application session was not displayed in the current nifi application session. I could still see the .prov files in the provenance_repository on disk, so I concluded the most recent .prov file from the previous nifi application session had been corrupted.
I then discovered this recommendation (https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_user-guide/content/bootstrap-conf.html) that suggests that you should comment out UseG1GC in bootstrap.conf when using the write-ahead configuration for provenance.
I followed the instructions and commented out the line in bootstrap.conf. After that, the data provenance generated during previous nifi sessions was no longer missing from subsequent nifi sessions. The change appeared to stop whatever problem was occurring. Assuming that it was a corruption problem, disabling UseG1GC seemed to prevent the corruption that I had been observing.
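For anyone following along, the change amounts to something like the fragment below. This is only a sketch: the argument index (13) is an assumption based on a stock NiFi 1.x bootstrap.conf and may differ in your installation, so look for the line containing UseG1GC rather than relying on the number.

```properties
# conf/bootstrap.conf
# Commented out so the JVM falls back to its default collector.
# The index "13" is assumed; match on -XX:+UseG1GC in your own file.
#java.arg.13=-XX:+UseG1GC

# conf/nifi.properties
# The write-ahead provenance implementation the recommendation applies to.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```

NiFi needs to be restarted for a bootstrap.conf change to take effect.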
This behaviour (to me) seems to confirm that the stability of WriteAheadProvenanceRepository becomes questionable when UseG1GC is configured.
I have a few questions related to what I have seen:
Thank you, kindly, for your time and attention.
Sean
Created 03-18-2019 07:02 PM
1. The problem with G1GC in Java 8 is not unique to provenance. G1GC in Java 8 and earlier was still considered experimental. It was observed that G1GC had better performance with the larger heap sizes commonly used in NiFi setups, so early on it was the recommended GC for NiFi. Although bugs exist that can cause corruption in the Java heap space, we had not encountered that corruption prior to the introduction of the new WriteAheadProvenance implementation. The G1GC issues have been resolved as of Java 9, but those fixes were not backported to earlier versions of Java. NiFi currently supports only Java 8, so we decided to move away from recommending G1GC when using the new high-performance WriteAheadProvenance implementation.
https://wiki.apache.org/lucene-java/JavaBugs
2. The change of default provenance implementation from PersistentProvenance to WriteAheadProvenance only occurred recently and it appears no one changed the configuration in the bootstrap.conf at that time to comment out the G1GC line.
It is also very likely we will again recommend G1GC once NiFi supports newer versions of Java where these G1GC issues have been addressed.
3. I have no answer for number 3. Perhaps someone else can comment on that question in your query.
Thank you,
Matt
If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
Created 03-19-2019 07:36 PM
@Sean Dockery
A Jira has been filed to comment out the G1GC line in the NiFi bootstrap.conf in the next Apache NiFi release:
https://issues.apache.org/jira/browse/NIFI-6132
Created 03-20-2019 12:46 AM
Thanks, Matt. Hopefully this change will avoid headaches for users who don't visit every single configuration option when implementing NiFi.