Member since: 08-01-2021
Posts: 48
Kudos Received: 10
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1737 | 11-18-2022 09:06 AM |
| | 2126 | 11-15-2022 05:46 PM |
| | 1666 | 10-12-2022 03:18 AM |
| | 1129 | 10-11-2022 08:52 AM |
| | 2931 | 10-08-2022 08:23 AM |
10-06-2022
04:12 AM
I've run into similar issues and haven't reached a clear conclusion either. It seems you have very high heap usage, which might be relevant.
10-06-2022
04:04 AM
Hello, I would like to monitor status data from many clusters. I have gone with the approach of adding a SiteToSiteStatusReportingTask to each of my clusters and sending all the statuses to a single remote input port on a dedicated cluster.

The issue I've encountered is that I couldn't find a reasonable way to aggregate statuses from different nodes to get a clearer picture of a component's status. For example, in a 4-node cluster I have a process group with 250 files queued on each node, 1000 in total. The reporting task on this cluster sends 4 separate flowfiles to my dedicated cluster, each reporting the 250 files queued on its node. All of these flowfiles carry attributes describing the reporting task they originated from, however (perhaps for valid reasons) there is no unique identifier indicating that all 4 of them came from the same run of the reporting task. As such, I cannot aggregate these 4 files with a sum and get the accurate total I see in the UI when looking at that process group. Since my clusters have varying numbers of nodes, I cannot simply merge exactly 4 records, and I do not want to maintain duplicate flows for different node counts.

In comparison, when using the REST API to GET a process group, there is a field named "aggregateSnapshot" which accurately sums across all nodes - this is what I am interested in. Overall I would like to stick with the reporting tasks so I remain in a data-push model rather than data-pull by polling all my clusters' REST APIs.

If anyone has any recommendations for how to do this they would be very much appreciated. Thanks in advance, Eyal
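To make the aggregation I'm after concrete, here is a rough sketch in plain Python, outside NiFi, just to show the grouping logic I would like to reproduce. The field names componentId, timestamp and queuedCount are assumptions for illustration only - the actual schema emitted by the reporting task may differ - and the one-minute bucketing is merely a stand-in for "records that came from the same run":

```
from collections import defaultdict
from datetime import datetime

def aggregate_statuses(records):
    """Sum per-node queued counts into one total per component per one-minute window.

    Assumes each record carries 'componentId', 'timestamp' (ISO-8601) and
    'queuedCount' keys -- adjust to whatever your reporting task actually emits.
    """
    totals = defaultdict(int)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        # Group records that arrived within the same minute (a proxy for "same run").
        bucket = ts.replace(second=0, microsecond=0)
        totals[(rec["componentId"], bucket)] += rec["queuedCount"]
    return dict(totals)

# Four nodes each reporting 250 queued files for the same process group:
records = [
    {"componentId": "pg-1234", "timestamp": "2022-10-06T04:00:05", "queuedCount": 250},
    {"componentId": "pg-1234", "timestamp": "2022-10-06T04:00:06", "queuedCount": 250},
    {"componentId": "pg-1234", "timestamp": "2022-10-06T04:00:06", "queuedCount": 250},
    {"componentId": "pg-1234", "timestamp": "2022-10-06T04:00:07", "queuedCount": 250},
]
print(aggregate_statuses(records))  # -> {('pg-1234', <04:00 bucket>): 1000}
```

In NiFi terms I imagine this would be a windowed merge keyed on the component id, which is exactly the part I haven't found a clean way to express.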
Labels:
- Apache NiFi
07-21-2022
02:50 AM
Hello, I have a MergeRecord processor that is not merging despite my conditions being met (with the exception of bin age). I have configured:
- 1 minimum record
- 2000 maximum records
- 1 MB minimum size
- no maximum size
- 10 bins
- 30 min max bin age
- bin-packing strategy, no correlation attribute

I have a 3-node cluster, and the queue before the MergeRecord processor holds millions of files totaling a couple of gigabytes. I can see the processor keeps opening/closing tasks (from the thread count icon on it), but no files are getting merged and output, except when the bin age is reached. I believe all the minimum merge requirements are met, and even the maximum record limit should be reached, yet the processor isn't behaving as I understand it should. I would appreciate any help debugging why it is not merging as expected. If it is relevant, I use a JSON reader and a Parquet writer. Thanks, Eyal.
Labels:
- Apache NiFi
04-27-2022
12:29 AM
Hi, I have a flow that receives JSON arrays as input. I would like to validate each of these JSONs against a schema, however the ValidateRecord processor doesn't quite seem to do the job. I need to validate things such as certain fields being enum values, fields having a max/min length, and required fields being present (sometimes inside optional nested objects). It seems an Avro schema does not support some of these constraints, and as such the record processors can't validate my data the way I need. I would love to hear if anyone has had a similar use case and how they solved it. I am considering the ScriptedValidateRecord processor, however I would prefer to avoid that and might instead opt for EvaluateJsonPath to extract all the fields I want to validate and then RouteOnAttribute with the expression language to filter out bad records. If there is a more appropriate way to validate records like this then I'm all ears. Thanks in advance!
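For reference, JSON Schema can express exactly the constraints I mean. This small Python sketch (using the jsonschema package purely as an illustration - the schema and field names are made up) shows the kind of rules an Avro schema can't give me:

```
from jsonschema import Draft7Validator

# Hypothetical schema illustrating constraints Avro cannot express directly:
# enum values, min/max string length, and required fields inside an optional nested object.
schema = {
    "type": "object",
    "required": ["status", "name"],
    "properties": {
        "status": {"enum": ["NEW", "ACTIVE", "CLOSED"]},
        "name": {"type": "string", "minLength": 1, "maxLength": 64},
        "details": {                      # nested object is optional...
            "type": "object",
            "required": ["code"],         # ...but if present, 'code' must exist
            "properties": {"code": {"type": "string"}},
        },
    },
}

validator = Draft7Validator(schema)
record = {"status": "STALE", "name": "", "details": {}}
for error in validator.iter_errors(record):
    print(error.message)
# 'STALE' is not one of ['NEW', 'ACTIVE', 'CLOSED']
# '' is too short
# 'code' is a required property
```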
Labels:
- Apache NiFi
10-02-2021
03:56 PM
@yashratan Is it possible your NiFi is configured to run the embedded ZooKeeper even though you are trying to connect to your own ZooKeepers? Check whether the nifi.state.management.embedded.zookeeper.start property in your nifi.properties file is set to true. Also check that you can communicate with all your ZooKeepers from each of your nodes. This definitely looks like an issue communicating with your ZKs.
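As a quick way to eyeball it, something along these lines will print the relevant properties (the path is just an example - point it at your actual conf directory):

```
# Quick sketch: print the ZooKeeper-related properties from nifi.properties.
PROPS = "/opt/nifi/conf/nifi.properties"  # example path; adjust to your install

with open(PROPS) as f:
    for line in f:
        line = line.strip()
        if line.startswith(("nifi.state.management.embedded.zookeeper.start",
                            "nifi.zookeeper.connect.string")):
            print(line)

# If the embedded ZooKeeper property prints 'true' while
# nifi.zookeeper.connect.string points at your external ZooKeepers,
# that mismatch is worth fixing first.
```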
09-08-2021
07:54 AM
1 Kudo
EDIT: @MattWho's answer made it clear to me that I slightly misunderstood the question. His suggestion of managing the entire process with a script is definitely the way to go and would perfectly fit your use case of someone creating a new instance of an existing process group. If I may add, it sounds like the NiFi Registry might benefit you. You could upload your base process group to the registry and put it under version control. Then, when creating new copies, you would 'pull' the same process group from the registry instead of copying a process group that lives in your canvas and might unintentionally get changed or deleted.

----------------

I do not believe there is a built-in way to directly trigger an event when a process group is started/stopped, but a reasonable workaround would be to monitor NiFi's app logs and trigger your own event when you see a log entry about your PG starting. If you only want to use tools offered by NiFi, you could use the TailFile processor configured to follow your app-log file, then use another processor (such as RouteOnContent) to match the log entry for starting the PG. From there you can kick off whatever starts the administrative tasks (such as sending an HTTP request with the InvokeHTTP processor).

If this answer helped, please mark it as 'solved' and/or apply 'kudos' 🙂
09-06-2021
08:56 AM
It's a bit hard to picture your flow just from the description, but I think I understood it. What other questions do you have about it? In my opinion it doesn't sound great to add an attribute to every flowfile after it is written to the DB, only to then write it to a cache that Control M will query (if I understood correctly). If your only requirement is to know whether all the files were successfully written to your DB, you could simply ignore files that were inserted successfully and only apply logic when an insert fails. Perhaps when a file fails you can write it somewhere else so you can investigate why it failed (somewhere more persistent than a cache). If you just want to be alerted when an insert fails / want to return a response to Control M, just add an InvokeHTTP processor after the failure relationship of your put-DB processor (if I correctly understood that Control M expects HTTP calls). Because NiFi is stream oriented, it's hard to tell exactly when a batch of files has finished writing to your DB unless you know exactly how many records should be written (and then counting the flowfiles routed to success is actually reasonable).
09-04-2021
03:15 PM
In general, NiFi is not well suited for event-based orchestration (e.g. an external scheduling tool pinging NiFi to start a process group run). I do not know how Control M works, but what you're describing sounds like it could be achieved with NiFi's REST API (you can directly start/stop a specific process group by its ID). The requirement of checking whether everything got inserted into your database is also quite hard to satisfy accurately. You could use the REST API once more to check that your process group has no queued files (which would mean all your flowfiles successfully passed through the flow), though you'll also have to think about what should happen if writing to the DB fails. I don't believe there is any great way to check whether your scheduled run 'completed', but you could definitely use another processor to 'notify' yourself if something failed. If this answer helped, please mark it as 'solved' and/or apply 'kudos' 🙂.
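To give a rough idea, a minimal sketch with Python and requests against an unsecured instance could look like this (the host, port and process group ID are placeholders, and a secured cluster would additionally need TLS and token handling):

```
import requests

NIFI = "http://localhost:8080/nifi-api"   # placeholder; point at your instance
PG_ID = "your-process-group-id"           # placeholder process group ID

# Start every component in the process group.
requests.put(
    f"{NIFI}/flow/process-groups/{PG_ID}",
    json={"id": PG_ID, "state": "RUNNING"},
).raise_for_status()

# Later: check whether anything is still queued inside the group.
status = requests.get(f"{NIFI}/flow/process-groups/{PG_ID}/status").json()
queued = status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"]
print(f"FlowFiles still queued: {queued}")

# Stop the group again once the run is considered done.
requests.put(
    f"{NIFI}/flow/process-groups/{PG_ID}",
    json={"id": PG_ID, "state": "STOPPED"},
).raise_for_status()
```

Control M (or any scheduler) could drive a script like this and treat "no queued flowfiles" as its completion signal, with the caveats above about failed inserts.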
09-04-2021
02:54 PM
A bit late to the party, but do you (or anyone else who might have encountered this problem) have any extra info to share about it? I am currently experiencing a similar issue.
08-10-2021
02:11 PM
1 Kudo
@hegdemahendra I found this article by Pierre V., where he goes into deeper detail about the logback.xml file. He mentions something that might be relevant to what you're looking for - the following two passages are what caught my eye:

```
"We can also define new appenders in the log configuration file and change it according to our needs. In particular, we could be interested by the SMTP Appender that can send logs via emails based on quite a large set of conditions. Full documentation here."

"Obviously you can also configure this configuration file so that NiFi log files integrate with your existing systems. An idea could be to configure a Syslog appender to also redirect the logs to an external system."
```

I myself have never done something like this, but it sounds like a step in the right direction for writing logs directly from NiFi to Mongo.