About MattWho

MattWho · ‎01-26-2024

@plapla Since Apache NiFi does not have a ConsumeKafka processor built with the Kafka 2.6 client, I would recommend going with the client closest to, but not newer than, the Kafka server version you are using. In this case, that is the ConsumeKafka_2_0 processor.
1. Is your NiFi a standalone NiFi instance install or a multi-node NiFi cluster setup?
2. How are your DistributedMapCacheClient and DistributedMapCacheServer controller services configured?
3. What is the rate of FlowFiles being produced by your ConsumeKafka processor?
I see that you configured the DetectDuplicate processor with an age off of 420 days; however, the DistributedMapCacheServer has a configured max cache entries with a default of 10,000. So, possibly due to volume, cache entries are being removed from the cache server, resulting in issues detecting duplicates.
The DistributedMapCache also holds all cache entries in NiFi heap memory and is not a good cache server to use for high-volume caches (because of heap usage). DistributedMapCache also offers no high availability, which becomes even more of an issue with a NiFi cluster. You would be better off using an external map cache server.
If you are using a NiFi cluster, make sure your DistributedMapCacheClient is configured to connect to one specific DistributedMapCacheServer. I have seen misuse here where individuals configured it to connect to localhost, or configured each NiFi node to connect to its own host's map cache server. The map cache servers running on each node do not share data.
Hope this helps you... If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
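To make the max-cache-entries point concrete, here is a minimal Python sketch of a size-limited cache missing a duplicate once more distinct keys have passed through than it can hold. It is a toy model only: the class and function names are made up for illustration, and first-in-first-out eviction is an assumption chosen for simplicity, not a statement about how the NiFi cache server evicts entries.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy stand-in for a size-limited map cache (FIFO eviction is an assumption)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def seen_before(self, key):
        """Return True if key is cached; otherwise cache it and return False."""
        if key in self._entries:
            return True
        self._entries[key] = True
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
        return False


def count_missed_duplicates(keys, max_entries):
    """Count keys that really are repeats but were no longer in the cache."""
    cache = BoundedCache(max_entries)
    truly_seen = set()
    missed = 0
    for key in keys:
        detected = cache.seen_before(key)
        if key in truly_seen and not detected:
            missed += 1
        truly_seen.add(key)
    return missed


if __name__ == "__main__":
    # 15,000 distinct keys followed by a repeat of the very first key.
    keys = list(range(15_000)) + [0]
    print(count_missed_duplicates(keys, max_entries=10_000))
    print(count_missed_duplicates(keys, max_entries=20_000))
```

With a 10,000-entry limit the repeated key has already been evicted by the time it comes around again, so it is not flagged; with a limit larger than the number of distinct keys it is caught.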

MattWho · ‎01-24-2024

@sukanta I am not completely clear on what you are asking for here. Can you provide more detail? What do you mean by "custom error page in NiFi"? Are you referring to the NiFi Bulletin Board? Only users who are authorized for the component producing the bulletin message would be able to view the bulletin text via the component or the Bulletin Board. Thanks, Matt

MattWho · ‎01-23-2024

@plapla A couple of things here:
1. You should be using the ConsumeKafka processor that matches your Kafka server version. If your Kafka server is 2.6, you should be using the ConsumeKafka_2_6 processor.
2. In your DetectDuplicate processor you are using ${key}. How is this attribute being created on each FlowFile, and where is its value derived from?
Thanks, Matt

MattWho · ‎01-19-2024

Big shout out to all the amazing contributors to this community both via question solution assistance and valuable articles!!! Together we enable great people to accomplish great things.

MattWho · ‎01-19-2024

@Sartha You are still not using the correct URL in your PostHTTP, as I stated in my last response. Based on what you shared in your last response, it needs to be: http://10.73.121.84:5026/contentListener
The ListenHTTP processor defines the "Base Path" on which it is listening for inbound connections. The default value is "contentListener". So the PostHTTP needs to post to that Base Path. Thank you, Matt
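As a small illustration of how the target URL is put together from the ListenHTTP host, port, and Base Path, here is a Python sketch. The helper name is made up for this example; only the host, port, and default Base Path value come from the post above.

```python
from urllib.parse import urlunsplit


def listen_http_url(host, port, base_path="contentListener", scheme="http"):
    """Build the URL a sender should post to for a given ListenHTTP listener.

    base_path defaults to "contentListener", the default Base Path
    mentioned in the post; pass a different value if it was changed.
    """
    path = "/" + base_path.strip("/")
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


if __name__ == "__main__":
    print(listen_http_url("10.73.121.84", 5026))
```

The point of the sketch is simply that host and port alone are not enough: the path component has to match the Base Path configured on the listening side.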

MattWho · ‎01-19-2024

@Dave0x1 That is a big jump in versions, directly from 1.13 to 1.24. Use the NiFi Toolkit instead to change the algorithm. NiFi Toolkit 1.24.0 is available from https://nifi.apache.org/download/
./encrypt-config.sh -n <nifi.properties from original 1.13 NiFi> -f <flow.xml.gz from original 1.13 NiFi> -x -s <sensitive props key from NiFi> -b <bootstrap.conf from original 1.13 NiFi> -A NIFI_PBKDF2_AES_GCM_256 -g <new 1.24 flow.xml.gz filename>
Then, in your NiFi 1.24, remove or rename the current flow.xml.gz and flow.json.gz files. Place the flow.xml.gz output from the above toolkit command into the same location and make sure permissions and ownership are correct. Start your NiFi 1.24. Since the flow.json.gz does not exist, NiFi will load the flow.xml.gz and, upon successful startup, generate the new flow.json.gz file that it will load from that point forward each time NiFi is restarted.
Hope this works for you. Thank you, Matt
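The file-handling steps after the toolkit run (set the old flow files aside, drop in the migrated flow.xml.gz) could be scripted roughly as below. This is only a sketch: the function name and the backup suffix are made up, the migrated file is assumed to live outside the conf directory, and it does not touch permissions or ownership, which still need to be checked by hand as described above.

```python
import shutil
from pathlib import Path


def install_migrated_flow(conf_dir, migrated_flow, backup_suffix=".pre-migration"):
    """Rename existing flow files out of the way, then copy in the migrated flow.

    conf_dir      -- directory holding flow.xml.gz / flow.json.gz (assumed layout)
    migrated_flow -- flow.xml.gz produced by the toolkit, stored outside conf_dir
    """
    conf_dir = Path(conf_dir)
    for name in ("flow.xml.gz", "flow.json.gz"):
        current = conf_dir / name
        if current.exists():
            # Rename rather than delete so the originals can be restored.
            current.rename(current.with_name(name + backup_suffix))
    target = conf_dir / "flow.xml.gz"
    shutil.copy2(migrated_flow, target)
    return target
```

Renaming instead of deleting keeps a way back if the migrated flow fails to load, and leaving no flow.json.gz in place is what lets NiFi fall back to the flow.xml.gz on startup.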

MattWho · ‎01-18-2024

@Adhitya This is a rather old post. Can you provide details on your specific setup (processors and configurations, including scheduling), info about your data, and what you expect versus what you are seeing? Is your NiFi a cluster setup or standalone? How is source data ingested into your NiFi for this dataflow? Typically issues like this are related to dataflow design, but there is not enough info here to reproduce or make suggestions yet. Thanks, Matt

MattWho · ‎01-18-2024

@Alexy maxHistory is relative to the time-based policy's date pattern. In the example shown in the thread you mentioned (https://mkyong.com/logging/logback-xml-example/), the full rollingPolicy example is:
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
  <fileNamePattern>logs/archived/app.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
  <maxFileSize>10MB</maxFileSize>
  <totalSizeCap>20GB</totalSizeCap>
  <maxHistory>60</maxHistory>
</rollingPolicy>
Within the fileNamePattern we see the configured date pattern "%d{yyyy-MM-dd}", which is based on days. So in this specific example, a maxHistory of 60 means retain 60 days. If that date pattern were instead "%d{yyyy-MM-dd_HH}", which is based on hours, the same maxHistory=60 would mean 60 hours of logs.
But since this policy also archives on size, the fileNamePattern also includes "%i", which rolls the log incrementally each time it reaches the configured maxFileSize of 10MB. So it will keep 60 days of logs; however, each of those 60 days could have one to many rolled logs, depending on how much logging is output:
app.2024-01-18.1.log.gz
app.2024-01-18.2.log.gz
app.2024-01-18.3.log.gz
app.2024-01-18.4.log.gz
app.2024-01-18.5.log.gz
...
So if your application unexpectedly starts producing large amounts of log output, you could quickly consume all your disk space with incremental logs. This is where the totalSizeCap setting comes into the picture. Its job is to say: regardless of the number of logs you want to keep (60 days in this specific configuration example), start deleting old archives when the total size of the log archives exceeds this configured size cap.
maxHistory is relative to the configured date pattern and is not always days. So the comment in the exact example above is correct: it is not about the number of log files, and since the pattern is based on days, it is correct. A better generic comment is one that makes no specific reference to the date pattern being used and simply states that the maxHistory value correlates to the number of log archives that are retained. What that equates to (minutes, hours, days, etc.) depends on the user-defined date pattern. I modified the example in the article you linked that I created to use the above comment instead of days, even though I called it out later in my article for clarity.
In my opinion, totalSizeCap is something you should use anytime you use the SizeAndTimeBasedRollingPolicy. That is because the maxHistory setting does not take into account the incrementally (%i) created log files. So using the above example, where it is creating daily logs (%d{yyyy-MM-dd}), the maxHistory=60 says to keep 60 days of logs. Thank you, Matt
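To show how maxHistory and totalSizeCap interact, here is a rough Python model of the retention rules described above for a daily date pattern. It is a simplified sketch, not Logback's actual cleanup algorithm: the function name and the tuple layout are made up, and the order in which archives are dropped under the size cap (oldest first) is an assumption.

```python
from datetime import date, timedelta

MB = 1024 * 1024


def retained_archives(archives, today, max_history, total_size_cap):
    """Return the archives a daily policy would keep, newest first.

    archives -- list of (day, index, size_bytes) tuples, one per rolled file
    """
    # maxHistory counts periods of the date pattern (days here), not files.
    cutoff = today - timedelta(days=max_history)
    in_window = [a for a in archives if a[0] > cutoff]
    in_window.sort(key=lambda a: (a[0], a[1]), reverse=True)

    # totalSizeCap then trims the oldest archives until the total fits.
    kept, total = [], 0
    for day, index, size in in_window:
        if total + size > total_size_cap:
            break
        kept.append((day, index, size))
        total += size
    return kept


if __name__ == "__main__":
    today = date(2024, 1, 18)
    # 70 days of history, three 10MB rolled files per day.
    archives = [(today - timedelta(days=d), i, 10 * MB)
                for d in range(70) for i in range(1, 4)]
    print(len(retained_archives(archives, today, 60, 20 * 1024 * MB)))
    print(len(retained_archives(archives, today, 60, 100 * MB)))
```

In this model maxHistory alone leaves every rolled file from the last 60 days in place, however many there are per day; only the size cap bounds the disk space actually used.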

MattWho · ‎01-18-2024

@MPHSpeed Working with the actual data instead of the sample data I built, I would recommend making these two changes:
1. In the ExtractText processor, change "Enable Unix Lines Mode" to true.
2. In each dynamic property in RouteOnAttribute, change the "equals" function to the "contains" function.
Thank you, Matt

MattWho · ‎01-18-2024

@Sartha localhost means the same host on which the service is running, so you can't use localhost in a PostHTTP processor running on MiNiFi (host B) and expect it to send FlowFiles to a ListenHTTP running on another host (host A). The PostHTTP is going to transmit FlowFiles to another host, so why would you configure the PostHTTP running on your MiNiFi host with the MiNiFi IP and port?
"in the PostHTTP url used is server hostname and port number because MiNiFi is running on this server"
Maybe I am just misunderstanding what you are saying in the above statement. It also does not appear that the URL is correct. It appears to be missing the "Base Path" (default contentListener) as described in the processor's documentation.
To help you out, I built this flow so I can share screenshots of how it is configured for you:
Here is the PostHTTP configuration which would be used on MiNiFi (host B):
Here is the configuration of the ListenHTTP used on the NiFi instance (host A) that will receive FlowFiles from the above PostHTTP on your MiNiFi (host B):
In my configuration I used port 9933, but you can use any unused non-privileged port available on your NiFi instance (host A).
I am also still confused by your screenshots with regard to the following:
Image 1 of the TailFile processor. This is the processor that is feeding the PostHTTP processor on the MiNiFi instance installed on host B, correct? You configured it to tail a nifi-app.log (which is not configured with the absolute path) that is generated by NiFi (host A). As I mentioned before, the TailFile processor cannot tail a file located on a different host from where the TailFile is executing.
Image 3 showing the dataflow. Why is the ListenHTTP success relationship connected to PostHTTP?
Thank you, Matt
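The two URL problems called out above (a loopback host, and a missing Base Path) are easy to check mechanically. Here is a small Python sketch of such a check; the function name is made up, and "hostA.example.com" in the usage lines is a placeholder hostname, not a value from this thread.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_post_url(url, base_path="contentListener"):
    """Return a list of problems with a URL meant to reach a remote ListenHTTP."""
    problems = []
    parts = urlsplit(url)
    if parts.hostname is None:
        problems.append("URL has no host")
    elif parts.hostname.lower() in LOOPBACK_HOSTS:
        # A loopback address points back at the sending host itself.
        problems.append("host is loopback, so it targets the sender's own machine")
    if parts.path.strip("/") != base_path.strip("/"):
        problems.append(f"path does not match Base Path '{base_path}'")
    return problems


if __name__ == "__main__":
    print(check_post_url("http://localhost:9933"))
    print(check_post_url("http://hostA.example.com:9933/contentListener"))
```

An empty list means neither problem was found; it does not prove the listener is reachable, only that the URL is shaped the way the post describes.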

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎01-13-2026 11:14 AM
Posts	3,421
Kudos received	1620

Cloudera Community

Re: Nifi 2.7.2 Start Problem

Re: Error importing NiFi workflow template from ve...

Re: Error importing NiFi workflow template from ve...

Re: How to elevate a default nifi user to admin - ...

Re: NiFi EnvokeHTTP - putting current date on HTTP...

Re: Nifi consume duplicate message problem from Ka...

Re: How to create custom error page for apache nif...

Re: Nifi consume duplicate message problem from Ka...

Re: Celebrating 100,000 Members: A Journey of Grow...

Re: Unable to read remote file in nifi by using mi...

Re: javax.crypto.AEADBadTagException: Tag mismatch...

Re: FetchDistributedMapCache Retrieving Old Token ...

Re: Nifi Log Rotation

Re: How do I search through FlowFiles and pull out...

Re: Unable to read remote file in nifi by using mi...