Member since
07-30-2019
3209
Posts
1589
Kudos Received
934
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 47 | 03-11-2025 05:58 AM |
 | 185 | 03-06-2025 06:05 AM |
 | 158 | 03-04-2025 06:28 AM |
 | 177 | 03-03-2025 10:59 AM |
 | 155 | 02-28-2025 10:21 AM |
03-12-2025
11:03 AM
@NaveenSagar I think I know your issue. You can encounter unexpected results when you do not use the correct SimpleDateFormat pattern. You are using this:

${now():format('YYYYMMddHHmmss')}

It should be:

${now():format('yyyyMMddHHmmss')}

NiFi uses Java date patterns: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html "y" is the pattern letter for year, while "Y" is the pattern letter for week year. Fixing your date format should prevent you from seeing this issue in the future. A small Java illustration of the week-year behavior is included below.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
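For illustration, here is a minimal Java sketch (not from the original thread, and the exact cutover date depends on the JVM locale's week rules) showing how the week-year pattern "YYYY" diverges from the calendar-year pattern "yyyy" near the end of December:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class WeekYearDemo {
    public static void main(String[] args) {
        // December 30, 2024 falls in a week that many locales already count as week 1 of 2025.
        Calendar cal = new GregorianCalendar(2024, Calendar.DECEMBER, 30);

        // Calendar year ("yyyy"): prints 20241230
        System.out.println(new SimpleDateFormat("yyyyMMdd").format(cal.getTime()));

        // Week year ("YYYY"): typically prints 20251230 in locales whose week rules
        // assign the last days of December to the new week year.
        System.out.println(new SimpleDateFormat("YYYYMMdd").format(cal.getTime()));
    }
}
```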
... View more
03-11-2025
06:21 AM
@NaveenSagar I was not able to reproduce this in my NiFi (1.26 based). All I can figure is that some change occurred that impacted the dates during those three days and not afterwards, or perhaps the timestamps produced by ${now():format('YYYYMMddHHmmss')} were manipulated downstream in your dataflow by another processor? Thanks, Matt
... View more
03-11-2025
06:16 AM
@Darryl So the downstream system that PutUDP is sending to is complaining that the size of the datagram is too large when the batch size is set to 50. When using batching in the ListenUDP processor, each datagram is delimited by a newline character. You could add a processor like SplitText between ListenUDP and PutUDP to split these FlowFiles into smaller FlowFiles before sending to PutUDP. Since a batch size of 30 seemed to work well for you, I would try increasing the "Max Batch Size" setting in ListenUDP to 60 and setting the "Line Split Count" in SplitText to 30 (see the flow sketch below).

As far as "When i tried bumping up the concurrent threads from 1 to 2, it caused the video to be extremely blurry", I am guessing the multi-threading is delivering the packets out of order, resulting in your extremely blurry video. If that is the case, you'll need to keep your flow single threaded. And if the order in which FlowFiles are processed is important, downstream connections should be configured with the FirstInFirstOutPrioritizer. This does not mean you can then use multiple threads, but it makes sure downstream processors take FlowFiles from their upstream connections in that order.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
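As a rough sketch of that arrangement (the property values are only suggested starting points, not taken from the original thread):

```
ListenUDP   (Max Batch Size = 60)
    |
    |  success connection (add FirstInFirstOutPrioritizer if ordering matters)
    v
SplitText   (Line Split Count = 30, Header Line Count = 0)
    |
    |  splits connection
    v
PutUDP
```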
... View more
03-11-2025
05:58 AM
@vg27 You plan on retaining a lot of provenance data? I don't know your expected daily volumes/sizes, but 100GB for the NiFi content repository seems a bit small. Since each node runs its own copy of the flow.json.gz and has its own repositories, you can't replicate the repositories between nodes.

In your scenario the primary node change happens when a restart of your NiFi cluster occurs, but in reality a primary node change can happen at other times as well. The cluster coordinator role has nothing to do with which node runs the processor components scheduled for "primary node only" execution. I am also trying to understand why you would have a NiFi cluster setup if you only ever intend to have the primary node do all the work; I really don't follow your use case here. Your plan is to ingest data into NiFi's primary node and hold it within the dataflows built on the NiFi canvas? How do you plan to do that (have it all queue up in some connection until someone starts the downstream processor)?

When NiFi is started, it loads the flow.json.gz into heap memory, loads the FlowFile repository local to the node into heap memory (except any swap files), and each node continues processing those FlowFiles through the dataflows. So a change to which node is elected the primary node has no impact on the above. A change in the elected primary node only impacts the processors configured for "primary node only" scheduling (only processors with no inbound connection can be configured that way).

So let's say node1 is the current primary node and has ingested data into FlowFiles that are now queued in a connection. Then some event occurs that results in node2 being elected as the primary node. All the FlowFiles originally ingested by node1 are still on node1 and continue to be processed through the dataflows on node1. Node2 is now the primary node and thus the only node scheduling the "primary node only" processors, and any FlowFiles those processors now ingest are processed through the dataflows on node2.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
... View more
03-11-2025
05:34 AM
@AlokVenugopal The "/nifi-api/access/token" endpoint is used by the ldap-provider or kerberos-provider NiFi login providers. Since you are using an OIDC (SSO) provider, you would need to obtain the token from that provider. You can utilize the developer tools available in most browsers to capture the rest-api calls being made to NiFi from your browser. While I do not have a setup utilizing Azure AD, I'd expect you should be able to see the redirect to the Azure endpoint that obtains the token.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
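To make the distinction concrete, here is a minimal Java sketch; the hostname, credentials, and token variable are placeholders, and whether your NiFi version accepts the IdP-issued token directly as a bearer token depends on how it is configured:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiAccessSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // /nifi-api/access/token is only served when NiFi is configured with the
        // ldap-provider or kerberos-provider login identity provider.
        HttpRequest loginProviderToken = HttpRequest.newBuilder()
                .uri(URI.create("https://nifi.example.com:8443/nifi-api/access/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("username=alice&password=secret"))
                .build();
        System.out.println(client.send(loginProviderToken, HttpResponse.BodyHandlers.ofString()).statusCode());

        // With an OIDC (SSO) provider, the token comes from the identity provider
        // (e.g. Azure AD) and is then presented to the NiFi REST API as a bearer token.
        String oidcToken = System.getenv("OIDC_ACCESS_TOKEN"); // placeholder
        HttpRequest apiCall = HttpRequest.newBuilder()
                .uri(URI.create("https://nifi.example.com:8443/nifi-api/flow/about"))
                .header("Authorization", "Bearer " + oidcToken)
                .GET()
                .build();
        System.out.println(client.send(apiCall, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```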
... View more
03-07-2025
06:07 AM
1 Kudo
@Darryl Glad to hear you got your network settings worked out so that your multicast traffic was reaching the network interfaces where the ListenUDP processor was listening for the traffic. The UDP protocol has no guaranteed delivery; it is fast, but it is not fault tolerant. Are you seeing the downstream connection from the ListenUDP processor filling to the point that back pressure is being applied to ListenUDP (by default back pressure gets applied once a connection queue hits 10,000 FlowFiles)?

Effectively, how the ListenUDP processor works is that incoming messages are added to an internal message queue (the default "Max Size of Message Queue" is 10000). The processor then reads messages from the message queue and creates FlowFiles based on the scheduling interval, thread availability, and the "Max Batch Size" setting. If the message queue is full, UDP packets will get dropped/lost.

Things you can try:

- Make adjustments to avoid downstream back pressure being applied to ListenUDP. Increasing the back pressure threshold is not very helpful unless you have set your batch size very high. You'll want to look at ways to increase the rate at which the downstream NiFi processors can process incoming FlowFiles. If the CPU load average on your NiFi server is healthy, you can increase concurrent tasks on the downstream processors so you have multiple threads processing FlowFiles concurrently.
- You already adjusted the batch size to 30, which results in multiple datagrams being added to a single NiFi FlowFile, and it sounds like this change did not impact downstream processing. By adding more than one message to a single FlowFile, NiFi needs to create fewer FlowFiles with each thread execution that reads from the message queue. You could try increasing this to an even larger value.
- You can play around with Run Duration. This is something you would try on downstream processors that are not keeping up with their inbound connection, resulting in upstream back pressure being triggered. It creates some latency but allows a single thread to execute longer and work on multiple inbound FlowFiles in a single scheduled thread execution.
- You can play around with the concurrent tasks setting on the ListenUDP processor. Increasing concurrent tasks on ListenUDP will allow more than one thread to execute concurrently to consume messages from the message queue and generate FlowFiles to the downstream connection.

To set concurrent tasks on processors efficiently, it is important to understand how they work, where the processor gets its threads from, and the load on your system. NiFi sets a "Maximum Timer Driven Thread Count" value that creates a pool of threads which all NiFi processors use for executing. The default value is 10. That means if you have 1000 processors on the canvas, they will all be requesting one of these threads when they get scheduled. The thread pool helps prevent overloading the CPU resources on the NiFi host. NiFi processors have configurable "Concurrent Tasks" and "Run Schedule" settings. The Run Schedule controls how often the processor asks the NiFi controller for a thread to execute. So assume 1 concurrent task and a run schedule of 0 secs: the processor will get scheduled as fast as possible, meaning as soon as it is started it will get scheduled and request a thread from the thread pool (there is a back-off period if upon execution there is no work to do, to prevent excessive CPU usage). With 1 concurrent task the processor can't be scheduled again until the current task completes.

When you change concurrent tasks to 2, the processor will be scheduled the first time and immediately scheduled again, hoping to get 2 threads from the thread pool to execute concurrently. So assuming the CPU load average is not high, you can increase the size of the max timer driven thread pool and increase concurrent tasks on your processors. Always make small incremental changes and monitor the impact on your CPU load average. A sketch of the settings discussed here follows below.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
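As a rough summary of the knobs mentioned above (the values are illustrative starting points, not recommendations from the original thread; check the defaults in your own NiFi version):

```
ListenUDP
  Max Size of Message Queue : 10000 (default) - raise if packets drop during bursts
  Max Batch Size            : 30 -> 60        - more datagrams per FlowFile, fewer FlowFiles created
  Concurrent Tasks          : 1 -> 2          - only if CPU load average stays healthy

Downstream processors
  Run Duration              : e.g. 25 ms      - lets one thread work several FlowFiles per execution
  Concurrent Tasks          : increase gradually if they cannot keep up with their inbound connection

Controller Settings
  Maximum Timer Driven Thread Count : 10 (default) - raise in small steps as concurrent tasks increase
```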
... View more
03-06-2025
11:53 AM
@NaveenSagar Typically when odd timestamp behavior like this is seen, it is caused by some custom processor or script being executed within a NiFi dataflow that is changing the timezone in Java or otherwise manipulating the Java date, which ends up impacting the rest of NiFi. So I'd recommend looking at any custom processors or scripts you have being executed to see if any of them is manipulating the Java time. Doing such things impacts the JVM that all NiFi components are using, and a restart would return things to normal until such a script or custom processor was executed again. I can't think of anything else that would have an impact here. The now() function simply returns the current date and time. So, not knowing exactly what was being manipulated and adjusted with regards to system time at the end of December '24, I have no other suggestions. Perhaps whoever was adjusting timezones made a mistake(s) and then corrected them.

Are you running a NiFi cluster or a standalone single NiFi instance? If you are running a NiFi cluster, was the same observation made on every node in the cluster, or were only FlowFiles on one of the nodes impacted? Apache NiFi 1.17 is ~3 years old.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
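As a purely hypothetical illustration of that failure mode (not something observed in your flow), a script or custom processor doing something like the following changes the default timezone for the entire NiFi JVM, so every ${now()} result shifts until the change is reverted or NiFi is restarted:

```java
import java.util.TimeZone;

public class TimezoneSideEffectDemo {
    public static void main(String[] args) {
        // Whatever the host is configured with, e.g. UTC
        System.out.println(TimeZone.getDefault().getID());

        // A script or custom processor doing this affects the whole JVM,
        // not just the component that executed it.
        TimeZone.setDefault(TimeZone.getTimeZone("Pacific/Kiritimati"));

        // Every component in the same JVM now sees this zone.
        System.out.println(TimeZone.getDefault().getID());
    }
}
```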
... View more
03-06-2025
06:27 AM
@vg27 When it comes to NiFi's content, FlowFile, and provenance repositories, it is about performance.

The FlowFile repository contains the attributes/metadata for each FlowFile. This includes which content claim in the content repository, and at what byte offset, holds the content for a FlowFile. The contents of this repository typically remain relatively small. Usage is in direct correlation with the number of FlowFiles actively queued in the NiFi UI and the size of the attributes on those FlowFiles, so size can quickly grow if you build dataflows that extract content from the FlowFiles into FlowFile attributes. FlowFile attributes are read/written by every processor that touches the FlowFile.

The content repository contains the content claims referenced by the FlowFiles. Each content claim can hold the content for one to many FlowFiles. A content claim is moved to archive, and then becomes eligible for deletion, ONLY once no FlowFiles reference any content in the claim. So a one-byte FlowFile left queued in some connection on the NiFi UI can prevent a large content claim from being deleted. Content is only read by processors that need to read that content (some processors only need access to the FlowFile's metadata/attributes).

The provenance repository holds events about the life of a FlowFile through your NiFi dataflows, from creation to deletion. NiFi can produce a lot of provenance events depending on FlowFile volume and the number of NiFi processor components a FlowFile passes through. Since provenance events are not a required part of processing your FlowFiles, you have complete control over the retention settings and how much disk space they can consume. Loss of this repo does not result in any data loss.

Since all three of these repos have constant I/O, NFS storage or standard HDDs would not be my first recommendation (NFS storage relies on network I/O, and standard HDDs are probably going to create a performance bottleneck for your data volumes). I am not familiar enough with the performance characteristics of Azure blob storage to make a recommendation there. SSDs are a good choice, but make sure there is data protection for your content and FlowFile repositories; you don't want a disk failure to result in data loss.

I am not clear on this: "Data expected to grow 3TB to 5TB." Is that per hour, per day, etc.? Is it spread evenly over the day, or does it come at specific heavy times each day? Take this into consideration when selecting storage based on throughput performance.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
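For reference, these are the kinds of nifi.properties entries being discussed; the values shown are only illustrative and may not match the defaults of your NiFi version, so this is not a sizing recommendation:

```properties
# Repository locations (point each at the fastest protected storage available)
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository

# Content claims are archived, and deleted, only once no FlowFile references them
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%

# Provenance retention is fully configurable; losing this repo does not lose flow data
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
```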
... View more
03-06-2025
06:05 AM
1 Kudo
@pavanshettyg5 What version of Apache NiFi are you using? The NiFi screenshot you shared implies authentication was successful, but you are having some form of authorization issue. The second screenshot you shared from the logs does not provide much useful information. What is observed in both the nifi-user.log and the nifi-app.log when you attempt to access the NiFi UI? You mention that you are using an "OIDC provider", so when you access NiFi, are you getting to the login prompt where you provide your OIDC credentials? What is seen in the logs at that time and when you submit your credentials? Does your NiFi truststore contain the complete trust chain (all root and intermediate public certs used to sign the server certificate) for your OIDC endpoint?

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
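One way to answer the truststore question (the path, store type, and password below are placeholders for your environment) is to list what the truststore actually contains and check for the OIDC endpoint's root and intermediate CAs:

```
keytool -list -v -keystore /path/to/nifi/conf/truststore.p12 -storetype PKCS12 -storepass <password> | grep -E "Owner:|Issuer:"
```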
... View more
03-06-2025
05:51 AM
@NaveenSagar Welcome to the community. There is not enough information provided to investigate your issue. What version of Apache NiFi are you using? What processor(s) is producing the timestamps in question? What is the configuration of the processor(s)? Thank you, Matt
... View more