NiFi Flow xml/json getting corrupted in multi node setup

Explorer

Hi All,

I have NiFi running on an EKS cluster with 3 nodes. Every now and then, flow.xml.gz gets corrupted on one of the nodes and that node fails to start back up. The same thing happens to the other nodes sooner or later, and ultimately NiFi crashes.

The error from the node's startup logs:

schema validation Error parsing Flow configuration at line <line_no>, col <col no > : cvc-complex-type:2.4.d : Invalid content was found starting with element 'property'. No child element is expected at this point.
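
One way to narrow this down is to check outside NiFi whether the file is damaged at the gzip or XML level, or whether it is well-formed and only fails schema validation. The following is a minimal sketch using only the Python standard library; the helper names `check_flow_file` and `lines_around` are hypothetical, and it checks well-formedness only, not the NiFi schema itself.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET


def check_flow_file(path):
    """Return (ok, message) after testing the gzip layer and XML well-formedness."""
    try:
        with gzip.open(path, "rb") as fh:
            data = fh.read()
    except (OSError, EOFError, zlib.error) as exc:
        # Not a gzip file, truncated, or corrupted compressed data
        return False, "gzip layer is damaged: %s" % exc
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # ParseError text includes the line and column of the first problem
        return False, "XML is not well-formed: %s" % exc
    return True, "gzip and XML are intact"


def lines_around(path, line_no, context=2):
    """Return the lines surrounding line_no (1-based) of the decompressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    start = max(line_no - 1 - context, 0)
    return lines[start:line_no + context]
```

If `check_flow_file` reports that both layers are intact, that would suggest the file is not corrupted byte for byte and the problem lies in what was serialized into it. `lines_around` can then be called with the line number from the log to see which element the validator is complaining about.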

It has become a pain to maintain NiFi because of this issue. Please suggest a solution.

Explorer

Sorry for the late response. I wanted to come back with more information.
Yes, I am sure that replacing flow.xml.gz is how I recover from this issue. As I mentioned, I have configured the default flow file as flow.xml.gz in nifi.properties, so NiFi uses XML. Since, as you said, newer versions support JSON, I tried JSON too.
I can also confirm that the shutdown is graceful. After much debugging and root cause analysis, I am now able to reproduce the issue and have the exact cause of the error. The logs don't corroborate my findings, but I would like your and the community's help in confirming the issue and finding a resolution.
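
For reference, the flow file location mentioned above is set in nifi.properties. A sketch of the relevant entries, assuming the default paths that ship with NiFi 1.16 and later; the actual values used on this cluster are not shown in this thread:

```
# Assumed defaults, not copied from the cluster described above
nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.json.file=./conf/flow.json.gz
```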

My findings:

1) NiFi was failing for me on the first restart itself.

2) From the logs, it is clear that it tries to load the flow, then runs what is probably schema validation on it, throws a warning, and then shuts down.

3) These warnings also appear with a newly created basic flow (with a couple of processors), yet that new flow loads after the warning and startup continues past the point where the warning is thrown.

4) Putting points 2 and 3 together, it was safe to assume that something in the flow itself causes NiFi to fail on restart.

5) Everything in the flow was plain NiFi processors doing a bunch of tasks. However, I had an ExecuteScript block with Python code that made some modifications to flowfile content. So I created a fresh flow with just 3 processors: GenerateFlowFile, ExecuteScript, and LogAttribute.
In GenerateFlowFile I configured some random content. In ExecuteScript I copied the code from https://community.cloudera.com/t5/Support-Questions/How-to-Iterate-the-Flow-file-in-Nifi-using-Pytho... after indenting it properly, and then terminated the flow at LogAttribute.
Basically, a very simple flow that does some modification on flowfile content. That's it. I kept the processors running and everything worked fine. So I went ahead and deleted the pod; the logs indicated a graceful shutdown. The pod restarted and hit the exact same error, with the schema warnings and the "server is shutting down" error.
I took fresh NiFi instances and repeated this several times, and it fails every time with an ExecuteScript block running Python code.
The logs are not very informative, but I can confirm that this is the root cause of the issue. Can you or anyone else confirm whether it is reproducible and, if so, what the resolution should be?
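
To see what NiFi actually wrote for the ExecuteScript processor, the saved flow can be inspected directly. This is a minimal sketch, assuming the NiFi 1.x flow.xml layout in which each `<processor>` element holds `<name>`, `<class>`, and repeated `<property>` children with `<name>` and `<value>`; the function name `script_processor_properties` is hypothetical, and the file must still be well-formed XML for this to work.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed element names for a NiFi 1.x flow.xml: <processor>, <class>, <name>, <property>
SCRIPT_CLASS = "org.apache.nifi.processors.script.ExecuteScript"


def script_processor_properties(path, class_name=SCRIPT_CLASS):
    """Map each matching processor's name to a {property name: value} dict."""
    with gzip.open(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    found = {}
    for proc in root.iter("processor"):
        if proc.findtext("class") != class_name:
            continue
        props = {}
        for prop in proc.findall("property"):
            # A property without a <value> child is reported as None
            props[prop.findtext("name")] = prop.findtext("value")
        found[proc.findtext("name")] = props
    return found
```

Comparing the output for a flow saved before the restart with the one that fails to load may show whether the script properties are what trips the validator.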

 

NiFi version used: 1.19.1
 

Explorer

@cotopaul Please go through my latest response before concluding that it's a hardware or setup issue.
FYI, NiFi has been working fine for more than a week since I removed the ExecuteScript processor.

That's the only change I made, and I reproduced the issue several times before posting here.
NiFi has restarted several times since then without any issues.
Could you please explain what sort of hardware issue would affect only an ExecuteScript processor running Python code?