Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 163 | 06-03-2026 06:06 PM |
| | 461 | 05-06-2026 09:16 AM |
| | 832 | 05-04-2026 05:20 AM |
| | 499 | 05-01-2026 10:15 AM |
| | 626 | 03-23-2026 05:44 AM |
01-19-2023
07:55 AM
I was not able to successfully launch a basic cluster. I closed this ticket because I did not see any follow-up.
01-19-2023
07:35 AM
Hello, Thank you Matt for your quick response (and sorry for my late response). It works perfectly. Best Regards
01-17-2023
01:23 PM
@srilakshmi Logging does not happen at the process group level. Processor logging is based on the processor class, so nothing in the log output produced by a processor will tell you which process group that processor belongs to.

That being said, you may be able to prefix the name of every processor within the same process group with a string that identifies the process group. The processor name is generally included in the log output produced by the processor. You may then be able to use logback filters (I have not tried this myself) to filter log output based on these unique strings. https://logback.qos.ch/manual/filters.html

NiFi bulletins (log output surfaced in the NiFi UI, with a rolling 5 minute life there), however, do include details about the parent process group in which the component generating the bulletin resides. You could build a dataflow in your NiFi to handle bulletin notification using the SiteToSiteBulletinReportingTask, which sends bulletins to a remote input port on a target NiFi. A dataflow on the target NiFi could then parse the received bulletin records by the bulletinGroupName JSON path property so that all records from the same PG are kept together. These 'like' records could then be written out to the local filesystem based on the PG name, sent to a remote system, used to send email notifications, etc.

Example of what a bulletin sent using the SiteToSiteBulletinReportingTask looks like:
{
"objectId" : "541dbd22-aa4b-4a1a-ad58-5d9a0b730e42",
"platform" : "nifi",
"bulletinId" : 2200,
"bulletinCategory" : "Log Message",
"bulletinGroupId" : "7e7ad459-0185-1000-ffff-ffff9e0b1503",
"bulletinGroupName" : "PG2-Bulletin",
"bulletinGroupPath" : "NiFi Flow / Matt's PG / PG2-Bulletin",
"bulletinLevel" : "DEBUG",
"bulletinMessage" : "UpdateAttribute[id=8c5b3806-9c3a-155b-ba15-260075ce9a6f] Updated attributes for StandardFlowFileRecord[uuid=1b0cb23a-75d8-4493-ba82-c6ea5c7d1ce3,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1672661850924-5, container=default, section=5], offset=969194, length=1024],offset=0,name=bulletin-${nextInt()).txt,size=1024]; transferring to 'success'",
"bulletinNodeId" : "e75bf99f-095c-4672-be53-bb5510b3eb5c",
"bulletinSourceId" : "8c5b3806-9c3a-155b-ba15-260075ce9a6f",
"bulletinSourceName" : "PG1-UpdateAttribute",
"bulletinSourceType" : "PROCESSOR",
"bulletinTimestamp" : "2023-01-04T20:38:27.776Z"
}
In the bulletin JSON above you can see the bulletinGroupName and the bulletinMessage (the actual log output).

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
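To illustrate the grouping step described above, here is a minimal sketch in plain Python, run outside NiFi. It only shows the idea of keeping records from the same PG together by the bulletinGroupName field; inside NiFi this would be done with processors rather than a script. The function name `group_bulletins_by_pg` and the second sample record are hypothetical, and the records are trimmed to a few fields.

```python
import json
from collections import defaultdict


def group_bulletins_by_pg(bulletins):
    """Group bulletin records by the name of their parent process group."""
    grouped = defaultdict(list)
    for record in bulletins:
        # Records without a group name are collected under a placeholder key.
        grouped[record.get("bulletinGroupName", "unknown")].append(record)
    return dict(grouped)


# Trimmed-down records. The first mirrors the example bulletin above;
# the second is invented purely for illustration.
raw = """[
  {"bulletinGroupName": "PG2-Bulletin", "bulletinLevel": "DEBUG",
   "bulletinSourceName": "PG1-UpdateAttribute",
   "bulletinMessage": "Updated attributes"},
  {"bulletinGroupName": "PG3-Hypothetical", "bulletinLevel": "ERROR",
   "bulletinSourceName": "Hypothetical-Processor",
   "bulletinMessage": "Something failed"}
]"""

grouped = group_bulletins_by_pg(json.loads(raw))
for pg_name, records in grouped.items():
    print(pg_name, len(records))
```

Each key of `grouped` could then drive a per-PG output, such as a file name or an email subject.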
01-12-2023
01:30 PM
@SachinMehndirat There is NO replication of data from the four NiFi repositories across the nodes in a NiFi cluster. Each NiFi node is only aware of, and only executes against, the FlowFiles on that specific node. As such, NiFi nodes cannot share a common set of repositories. Each node must have its own repositories, and it is important to protect those repositories from data loss (flowfile_repository and content_repository being the most important).

- flowfile_repository - contains metadata/attributes about FlowFiles actively processing through your NiFi dataflow(s). This includes metadata on the location of the content of queued FlowFiles.
- content_repository - contains content claims, each of which can hold the content for one to many FlowFiles that are actively being processed or temporarily archived after termination at the end of a dataflow.
- provenance_repository - contains historical lineage information about FlowFiles currently or previously processed through your NiFi dataflows.
- database_repository - contains flow configuration history, which is a record of changes made via the NiFi UI (adding, modifying, deleting, stopping, starting, etc.). It also contains info about users currently authenticated in to the NiFi node.

Processors that record cluster-wide state use ZooKeeper to store and retrieve the state needed by all nodes. Processors that use local state write that state to NiFi's locally configured state directory. So in addition to protecting the repositories mentioned above from data loss, you'll also want to make sure the local state directory (unique to each node in the NiFi cluster) is protected. The embedded documentation in NiFi for each component has a "State management:" section that tells you whether that component uses local and/or cluster state.
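As a rough aid for finding which directories need protecting on a node, the sketch below reads the text of a nifi.properties file and reports where the four repositories live. The property keys are the ones I would expect in a default nifi.properties, so treat them as assumptions and verify them against your NiFi version. The helper names and sample values are hypothetical, and the local state directory is not covered because it is configured in a separate state management file.

```python
# Assumed property keys; check them against your own nifi.properties.
REPO_KEYS = {
    "flowfile_repository": "nifi.flowfile.repository.directory",
    "content_repository": "nifi.content.repository.directory.default",
    "provenance_repository": "nifi.provenance.repository.directory.default",
    "database_repository": "nifi.database.directory",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def repository_dirs(text):
    """Return the configured directory for each repository (None if unset)."""
    props = parse_properties(text)
    return {name: props.get(key) for name, key in REPO_KEYS.items()}


# Sample values for illustration only, not taken from a real node.
sample = """
# repositories
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.database.directory=./database_repository
"""

dirs = repository_dirs(sample)
for name, path in dirs.items():
    print(name, "->", path)
```

Running this against each node's own file would show that every node points at its own local directories.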
You may find some of the info in the following articles useful:
https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

Thank you,
Matt
01-12-2023
01:08 PM
1 Kudo
@AndreyN By default, each processor uses the Timer Driven scheduling strategy and a Run Schedule of 0 secs. This means that each processor is constantly requesting threads from the Max Timer Driven Thread pool and checking for work (work being any FlowFiles on inbound connections or, in the case of an ingest processor, connecting to the ingest point, whether a local directory or a remote service, to check for data). These checks for work generally take microseconds or longer, depending on the processor. NiFi does have a global setting for yielding processors when the previous run resulted in no FlowFiles processed/produced. To prevent excessive latency, this yield duration is very short by default (10 ms).

To adjust this setting, change the following property in the nifi.properties file: nifi.bored.yield.duration, found in the Core Properties section of the admin guide. Keep in mind that this setting impacts all processors, so the higher you set it, the more latency could exist before new work is found after a run that resulted in no work.

You can also selectively adjust the run schedule on select processors. A 0 sec run schedule means run as fast as possible: as soon as one thread completes, request another (unless the thread resulted in no work, in which case wait the bored yield duration before requesting the next thread). So if you have flows that are always very light and latency is not a concern, you could set those processors to be scheduled to execute only every 1 sec or longer.

Thank you,
Matt
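To put rough numbers on the trade-off above, here is a back-of-the-envelope sketch in Python. It is a simplified model, not NiFi's actual scheduler: it assumes an idle processor waits for whichever is longer, its run schedule or the bored yield duration, and it ignores the time each check itself takes. The function name is hypothetical.

```python
def idle_checks_per_second(run_schedule_ms, bored_yield_ms=10):
    """Estimate how often an idle processor checks for work each second.

    Simplified assumption: with no work available, the wait between
    checks is the larger of the run schedule and the bored yield duration.
    """
    wait_ms = max(run_schedule_ms, bored_yield_ms)
    return 1000 / wait_ms


# Default settings: 0 sec run schedule, 10 ms bored yield.
print(idle_checks_per_second(0))        # 100.0
# Bored yield raised to 100 ms for all processors.
print(idle_checks_per_second(0, 100))   # 10.0
# One processor set to a 1 sec run schedule.
print(idle_checks_per_second(1000))     # 1.0
```

Under this model, raising the run schedule on a lightly used processor cuts its idle checks without adding latency to every other processor.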
01-10-2023
09:19 PM
@RodolfoE, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
12-30-2022
02:16 AM
Hi @MattWho, After I enabled Retrieve site-to-site details for the input port BulleInfo, I was able to receive the BulletinInfo. Thank you so much for your suggestions!! Regards, Pradeep
12-21-2022
12:25 PM
@samrathal
1. What is the purpose of the SplitJson in your dataflow?
2. If you have 1 FlowFile with 1000 records in it, why use SplitJson to split it into 1000 FlowFiles having 1 record each? Why not just merge the larger FlowFiles with multiple records in them? Or am I missing part of the use case here?
---
Can you share a template or flow definition of your dataflow?
1. It is not clear to me how you get "X-Total-Count" and how you are adding this FlowFile attribute to every FlowFile.
2. You have configured the "Release Signal Identifier" with a boolean NiFi Expression Language (NEL) statement that, using your example, will return "false" until the "fragment.count" FlowFile attribute value equals the "X-Total-Count" FlowFile attribute value.
2a. I assume you are writing "X-Total-Count" to every FlowFile coming out of the SplitJson? How are you incrementing "fragment.count" across all FlowFiles in the complete 5600 record batch? For each FlowFile that SplitJson splits into 1000 FlowFiles, fragment.count will be no higher than 1000. So fragment.count would never reach 5600 unless you are handling this count somewhere else in your dataflow.
2b. If a FlowFile's "fragment.count" value actually equals its "X-Total-Count" value, your "Release Signal Identifier" will resolve to "true". The "Release Signal Identifier" value (true or false) in your configuration is looked up in the configured distributed map cache server. So where in your dataflow do you write the release signal to the distributed map cache? (This is usually handled by a Notify processor.)

I am in no way implying that what you are trying to accomplish can't be done. However, coming up with an end-to-end workable solution requires knowing all the steps in the use case along the way. I would recommend going through the example Wait/Notify linked in my original response to get a better understanding of how the Wait and Notify processors work together. Then maybe you can make some changes to your existing dataflow implementation. With more use case details (detailed process steps) I could suggest further changes if needed. I really hope this helps you get some traction on your use case here.

If you have a contract with Cloudera, you can reach out to your account owner, who could help arrange for professional services that can work with you to turn your use cases into workable NiFi dataflows.

Thank you,
Matt
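The behaviour questioned in points 2, 2a and 2b can be modelled in a few lines of plain Python. This is only a sketch of the logic as described above, not NiFi code: a dict stands in for the distributed map cache, the function names are hypothetical, and the attribute values are invented to match the 1000 and 5600 figures in the discussion.

```python
def release_signal_identifier(attributes):
    """Model a boolean expression comparing fragment.count to X-Total-Count."""
    same = attributes.get("fragment.count") == attributes.get("X-Total-Count")
    return "true" if same else "false"


def wait_releases(attributes, cache):
    """Release only if the resolved identifier exists as a key in the cache."""
    return release_signal_identifier(attributes) in cache


# Stand-in for the distributed map cache. It starts empty because no
# Notify-like step has written a release signal yet.
cache = {}

# A split FlowFile as described in 2a: the count never reaches the total.
split = {"fragment.count": "1000", "X-Total-Count": "5600"}
print(release_signal_identifier(split))   # false
print(wait_releases(split, cache))        # False

# Even when the two attributes match, nothing is released until some
# step writes the signal under the resolved identifier.
matching = {"fragment.count": "5600", "X-Total-Count": "5600"}
print(wait_releases(matching, cache))     # False
cache["true"] = 1                         # what a Notify-like step would do
print(wait_releases(matching, cache))     # True
```

The model shows both gaps at once: the identifier rarely resolves to "true", and even when it does, something still has to write that signal to the cache.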