Member since: 08-01-2021
Posts: 57
Kudos Received: 14
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3048 | 11-18-2022 09:06 AM |
| | 4321 | 11-15-2022 05:46 PM |
| | 3066 | 10-12-2022 03:18 AM |
| | 2228 | 10-11-2022 08:52 AM |
| | 5374 | 10-08-2022 08:23 AM |
04-27-2022
12:29 AM
Hi, I have a flow that receives JSON arrays as input. I would like to validate the schema of each of these JSONs, but the ValidateRecord processor doesn't quite do the job. I need to validate things such as certain fields being enum values, fields having a max/min length, and required fields being present (sometimes inside optional nested objects). An Avro schema does not support some of these constraints, so the Record processors can't validate my data the way I need. I would love to hear if anyone has had a similar use case and what they did to solve it. I am considering the ScriptedValidateRecord processor, but I would prefer to avoid that and might instead use EvaluateJsonPath to extract all the fields I want to validate and then RouteOnAttribute with the Expression Language to filter out bad records. If there is a more appropriate way to validate records like this, I'm all ears. Thanks in advance!
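For reference, here is a minimal sketch of the kind of validation described above, expressed as a JSON Schema check (e.g. something that could be driven from a scripted processor or an external command). The schema, field names, and the `jsonschema` library are illustrative assumptions, not part of the original question:

```python
# Minimal sketch: validate records against a JSON Schema that expresses
# enum values, min/max length, and required fields in nested objects.
# Assumes the third-party "jsonschema" package (pip install jsonschema);
# the field names and schema below are made up for illustration.
import json
from jsonschema import Draft7Validator

SCHEMA = {
    "type": "object",
    "required": ["id", "status"],
    "properties": {
        "id": {"type": "string", "minLength": 1, "maxLength": 36},
        "status": {"enum": ["NEW", "ACTIVE", "CLOSED"]},
        "details": {                       # optional nested object...
            "type": "object",
            "required": ["source"],        # ...with its own required field
            "properties": {"source": {"type": "string"}},
        },
    },
}

validator = Draft7Validator(SCHEMA)

def split_valid_invalid(records):
    """Return (valid, invalid) lists, mimicking a valid/invalid routing."""
    valid, invalid = [], []
    for rec in records:
        errors = list(validator.iter_errors(rec))
        (valid if not errors else invalid).append((rec, errors))
    return valid, invalid

if __name__ == "__main__":
    batch = json.loads('[{"id": "a1", "status": "NEW"}, {"id": "", "status": "???"}]')
    ok, bad = split_valid_invalid(batch)
    print(len(ok), "valid,", len(bad), "invalid")
```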
Labels:
- Apache NiFi
10-02-2021
03:56 PM
@yashratan Is it possible your NiFi is configured to run the embedded ZooKeeper even though you are trying to connect to your own ZooKeepers? Check whether the nifi.state.management.embedded.zookeeper.start property in your nifi.properties file is set to true. Also check that you can communicate with all your ZooKeepers from each of your nodes. This definitely looks like an issue communicating with your ZKs.
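As a rough illustration of those two checks, here is a small sketch. The file path and host names are placeholders, and it assumes ZooKeeper's 'ruok' four-letter command is reachable (on ZooKeeper 3.5+ it must be allowed via 4lw.commands.whitelist):

```python
# Sketch of the two checks above: (1) is the embedded ZooKeeper enabled in
# nifi.properties, and (2) can we reach each external ZooKeeper node?
# The properties path and host list are placeholders for illustration.
import socket

NIFI_PROPERTIES = "/opt/nifi/conf/nifi.properties"   # adjust to your install

def read_property(path, key):
    with open(path) as f:
        for line in f:
            if line.strip().startswith(key + "="):
                return line.split("=", 1)[1].strip()
    return None

def zookeeper_alive(host, port=2181, timeout=3):
    """Send ZooKeeper's 'ruok' command; a healthy node answers 'imok'.
    (On ZK 3.5+ the command must be whitelisted via 4lw.commands.whitelist.)"""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            return s.recv(4) == b"imok"
    except OSError:
        return False

if __name__ == "__main__":
    embedded = read_property(NIFI_PROPERTIES,
                             "nifi.state.management.embedded.zookeeper.start")
    print("embedded zookeeper enabled:", embedded)
    for host in ["zk1.example.com", "zk2.example.com", "zk3.example.com"]:
        print(host, "reachable:", zookeeper_alive(host))
```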
09-08-2021
07:54 AM
1 Kudo
EDIT: @MattWho 's answer made it clear I slightly misunderstood the question. His suggestion of managing the entire process with a script is definitely the way to go and fits your use case of someone creating a new instance of an existing process group. If I may add, it sounds like the NiFi Registry might benefit you: you could upload your base process group to the Registry and put it under version control. Then, when creating new copies, you would 'pull' the process group from the Registry instead of copying a process group that lives in your canvas and might unintentionally get changed or deleted.

----------------

I do not believe there is a built-in way to run an event directly when a process group is started/stopped, but a reasonable workaround is to monitor NiFi's app log and trigger your own event when you see a log line about your PG starting. If you only want to use tools offered by NiFi, you could use the TailFile processor configured to read the app log, then use another processor (such as RouteOnContent) to match the line about starting the PG. From there you can kick off whatever starts the administrative tasks (such as sending an HTTP request with the InvokeHTTP processor).

If this answer helped, please mark it as 'solved' and/or apply 'kudos' 🙂
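For illustration only, here is roughly what that TailFile + RouteOnContent + InvokeHTTP chain does, written as a small script. The log path, the regex, and the webhook URL are assumptions, and the exact wording of NiFi's process-group start message may differ between versions:

```python
# Rough equivalent of TailFile -> RouteOnContent -> InvokeHTTP:
# follow nifi-app.log and fire an HTTP call when a line suggests that the
# process group of interest was started. Log path, regex, and URL are
# placeholders; NiFi's actual log wording may vary by version.
import re
import time
import urllib.request

LOG_PATH = "/opt/nifi/logs/nifi-app.log"               # assumed location
PATTERN = re.compile(r"Starting .*My Process Group")   # assumed message text
WEBHOOK = "http://example.com/hooks/pg-started"        # placeholder endpoint

def follow(path):
    """Yield new lines appended to the file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)                                   # jump to end of file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(1)

if __name__ == "__main__":
    for line in follow(LOG_PATH):
        if PATTERN.search(line):
            urllib.request.urlopen(WEBHOOK, data=line.encode())  # simple POST
```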
09-06-2021
08:56 AM
It's a bit hard to picture your flow just from the description, but I think I understood it. What other questions do you have about it?

In my opinion it doesn't sound great to add an attribute to every flowfile after it is written to the DB, only to then write it to a cache that Control M will query (if I understood correctly). If your only requirement is to know whether all the files were successfully written to your DB, you should simply ignore files that were inserted successfully and only apply logic when an insert fails. Perhaps when a file fails you can write it somewhere else so you can investigate why it failed (somewhere more persistent than a cache). If you just want to be alerted when an insert fails / want to return a response to Control M, just add an InvokeHTTP processor after the failure relationship of your putDB processor (if I correctly understood that Control M expects HTTP calls).

Because NiFi is stream oriented, it's hard to tell exactly when a batch of files has finished writing to your DB unless you know exactly how many records should be written (in which case counting the flowfiles routed to success is actually reasonable).
09-04-2021
03:15 PM
In general, NiFi is not very well suited for event-based processing (e.g. an external scheduling tool pinging NiFi to start a process group run). I do not know how Control M works, but what you're describing sounds like it could be achieved with NiFi's REST API (you can directly start/stop a specific process group by its ID). The requirement to check that everything got inserted into your database is also quite hard to meet accurately. You could use the REST API once more to check that your process group has no queued flowfiles (which would mean everything passed through the flow), though you'll also have to think about what should happen if writing to the DB fails. I don't believe there is any great way to check that your scheduled run 'completed', but you could definitely use another processor to 'notify' yourself if something failed. If this answer helped, please mark it as 'solved' and/or apply 'kudos' 🙂.
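As a rough sketch of that REST API approach (the host, port, process group ID, and the absence of TLS/authentication are all assumptions here, so treat it as illustrative rather than a drop-in script):

```python
# Sketch: start a process group via NiFi's REST API, then poll its status
# until no flowfiles remain queued. Host, PG id, and lack of TLS/auth are
# assumptions for illustration.
import json
import time
import urllib.request

NIFI = "http://localhost:8080/nifi-api"       # assumed, unsecured instance
PG_ID = "<process-group-id>"                  # fill in your PG's id

def set_pg_state(pg_id, state):
    """PUT /flow/process-groups/{id} with state RUNNING or STOPPED."""
    body = json.dumps({"id": pg_id, "state": state}).encode()
    req = urllib.request.Request(
        f"{NIFI}/flow/process-groups/{pg_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)

def queued_flowfiles(pg_id):
    """GET /flow/process-groups/{id}/status and read the queued count."""
    with urllib.request.urlopen(f"{NIFI}/flow/process-groups/{pg_id}/status") as resp:
        status = json.load(resp)
    return status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"]

if __name__ == "__main__":
    set_pg_state(PG_ID, "RUNNING")
    while queued_flowfiles(PG_ID) > 0:        # wait until the PG drains
        time.sleep(10)
    set_pg_state(PG_ID, "STOPPED")
    print("run finished (no flowfiles queued)")
```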
09-04-2021
02:54 PM
A bit late to the party, but do you (or anyone else who might have encountered this problem) have any extra info to share about it? I am currently experiencing a similar issue.
08-10-2021
02:11 PM
1 Kudo
@hegdemahendra I have found this article by Pierre V., where he goes into deeper detail about the logback.xml file. He mentions something that might be relevant to what you're looking for; the following two passages are what caught my eye:

```
"We can also define new appenders in the log configuration file and change it according to our needs. In particular, we could be interested by the SMTP Appender that can send logs via emails based on quite a large set of conditions. Full documentation here."

"Obviously you can also configure this configuration file so that NiFi log files integrate with your existing systems. An idea could be to configure a Syslog appender to also redirect the logs to an external system."
```

I myself have never done something like this, but it sounds like a step in the right direction for writing logs from NiFi directly to Mongo.
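For context, a Syslog appender of the kind mentioned in that second passage looks roughly like this in logback.xml. The host name, facility, and pattern are placeholders, and this is only a sketch of the idea rather than something I have run against NiFi myself:

```xml
<!-- Sketch only: forward NiFi's logs to an external syslog endpoint.
     Host, facility, and pattern are placeholders. -->
<appender name="SYSLOG" class="ch.qos.logback.classic.net.SyslogAppender">
    <syslogHost>logs.example.com</syslogHost>
    <facility>USER</facility>
    <suffixPattern>[%thread] %logger %msg</suffixPattern>
</appender>

<root level="INFO">
    <appender-ref ref="SYSLOG"/>
</root>
```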
08-04-2021
11:08 PM
@hegdemahendra Filtering like that in NiFi could work, though it might be a bit resource intensive depending on the logs. It might be worth checking out a different tool specifically designed for handling logs. I've written logs to Elasticsearch using Logstash in the past; perhaps it could also work for writing to a MongoDB.
08-04-2021
12:44 PM
Your approach sounds perfectly reasonable if you only plan to use native NiFi tools. I didn't fully understand what kind of filtering you meant, but simply reading the logs with the TailFile processor and sending them to Mongo with the PutMongo processor sounds like it would work for your use case.
08-03-2021
12:08 PM
@Josiah_Johnston Based on your last comment, my new hunch is that something is going on with the volume you use for the content repository. Still, it's hard to say without more testing. Here are a few tests/checks I would run if this happened in one of our NiFi clusters (covering both the problem as you describe it and what I could spot from the screenshot you sent):

- While the content repo is empty, are there any other flowfiles being processed on the node? It would make no sense for any flow/ingestion to work if the content repository is completely empty. Perhaps the content claims are being written elsewhere, or perhaps they are deleted immediately after being created in the content repo.
- What happens to the flowfiles that were already in the flow once the problem started? If all their content was deleted, they shouldn't be able to proceed in the flow even after a restart. What happens if you try to view their content in the UI?
- Are there any helpful logs in the NiFi app log once the problem starts (just before the errors about missing content claims start flooding in)?
- What would happen if you created a file of your own in the content repo and then reproduced the problem? Would it only delete the NiFi-generated content repo files/directories, or would it also delete your file?
- Is there a way to tell which node the issue will happen on? Is it perhaps happening repeatedly on the same node (for the same storage volume)?
- What is the flowfile repository's status while the issue is happening? Does it still have all its regular files even when the content repo has been wiped?

If you Google something along the lines of 'nifi content repository empty / deleting', no relevant results come up, and my team and I have never experienced anything like this either. This is why I suspect it is perhaps not a NiFi issue but rather something to do with your infrastructure / something else on your end.