About MattWho

MattWho · ‎11-22-2022

@zack_riesland Yes, "0 0 18 * * ?" means the processor is scheduled to run at 0 seconds, 0 minutes, hour 18, every day of the month, every month, any day of the week.

What's important to understand is that the Quartz cron is only used to schedule the processor to execute. For the processor to execute at exactly 18:00:00, NiFi must have an available thread in its Timer Driven thread pool. If a thread is not available, the processor's code will execute as soon as a thread becomes available. Since it has already been "scheduled", it will run as soon as a thread frees up and then will get scheduled again for the next day at 18:00:00.

Matt
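As a rough illustration of how the fields in that expression line up, here is a minimal Python sketch. The helper name `label_quartz_fields` is made up for this example. It only labels the fields by position and does not validate or evaluate the cron the way Quartz does.

```python
# Quartz cron fields in order; the seventh field (year) is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_quartz_fields(expression: str) -> dict:
    """Pair each whitespace-separated cron field with its positional name.

    Hypothetical helper for illustration only; it does not check that the
    values themselves are legal Quartz syntax.
    """
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("a Quartz cron has 6 fields, or 7 with the optional year")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_quartz_fields("0 0 18 * * ?")
for name, value in fields.items():
    print(f"{name}: {value}")
```

Reading the output top to bottom gives the same breakdown as in the reply: seconds 0, minutes 0, hours 18, and wildcards for the remaining fields.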

MattWho · ‎11-15-2022

@Jacccs An example or detailed description of your use case may be helpful in providing the best guidance for you. While the NiFi Expression Language (NEL) function anyMatchingAttribute expects a Java regular expression that searches and returns values for multiple FlowFile attributes, that does not appear to be what you need. Your attribute "attributeToSearch" implies that only a single specific FlowFile attribute needs to be checked to see if it contains some "${value}".

If this is correct, you would be able to use the following NEL:

${literal("${${attributeToSearch}}"):contains('${value}')}

For the above NEL, let's assume a FlowFile with attribute "attributeToSearch" set to "username", a FlowFile attribute "username" set to "admin-matt", and a FlowFile attribute "value" set to "admin". The result of the above NEL statement would be true. ${${attributeToSearch}} would first resolve to ${username}, which would then resolve to "admin-matt". That "admin-matt" string would then be passed to the NEL contains function, which checks whether that string contains the string "admin" within it. The result is a boolean "true" or "false".

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you, Matt
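To make the two-step resolution easier to follow, here is a small Python sketch that mimics it with a plain dictionary standing in for the FlowFile attributes. This is not how NiFi evaluates NEL internally. The function name is hypothetical, and missing attributes are not modelled (they would simply raise a KeyError here).

```python
def resolve_nested_contains(attributes: dict,
                            pointer_name: str = "attributeToSearch",
                            value_name: str = "value") -> bool:
    """Mimic ${literal("${${attributeToSearch}}"):contains('${value}')}.

    Hypothetical helper: 'attributes' is a dict standing in for the
    FlowFile attribute map.
    """
    # Step 1: ${attributeToSearch} resolves to the NAME of another attribute.
    target_attribute = attributes[pointer_name]      # e.g. "username"
    # Step 2: ${username} resolves to that attribute's value.
    target_value = attributes[target_attribute]      # e.g. "admin-matt"
    # Step 3: contains() checks for the search string inside that value.
    return attributes[value_name] in target_value


flowfile_attributes = {
    "attributeToSearch": "username",
    "username": "admin-matt",
    "value": "admin",
}
result = resolve_nested_contains(flowfile_attributes)
print(result)
```

With the attribute values from the example above, the sketch evaluates to True, matching the result described for the NEL statement.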

MattWho · ‎11-08-2022

@hegdemahendra This is not something I have tried before, but... When you execute the nifi.sh script to start NiFi, it bootstraps the NiFi process using the configuration in the bootstrap.conf NiFi configuration file. It is during the bootstrap process that NiFi starts the main child process that loads NiFi. Perhaps you can add additional java.args to handle your pre-NiFi needs? Or maybe modify the nifi.sh script itself so that it executes your requirements before calling the rest of the NiFi startup process.

Thank you, Matt

MattWho · ‎11-08-2022

@D5ha Your issue is a mutual TLS handshake issue and really has nothing specific to do with NiFi itself. There are a lot of resources on the web for creating certificates. There are even free services like Tinycert you can use to generate a valid certificate meeting the requirements I shared in my last response. Providing guidance on how to create certificates does not make much sense since it can be done in so many ways:
- Self-signed
- Public CA
- Corporate/private CA, etc.

The TLS exception you shared is telling you that the IP or hostname (the one with the BLUE line through it in your image) was not found as a Subject Alternative Name (SAN) in the certificate created for the server side of this handshake, which in your case also happens to be your NiFi instance. The Site-To-Site-Bulletin-Reporting-Task is acting as the client in this mutual TLS handshake, and the NiFi server S2S destination URL is the server side of it.

Thank you, Matt
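The check that is failing can be pictured with a short Python sketch. It assumes you have already pulled the SAN entries out of the server certificate by some other means. The function name, hostnames and IP below are all made up for illustration, and the comparison is a plain case-insensitive match that does not handle wildcard entries the way a real TLS library would.

```python
from typing import Iterable


def host_in_sans(host: str, san_entries: Iterable[str]) -> bool:
    """Return True if the host used in the destination URL appears
    among the certificate's SAN entries.

    Hypothetical helper: exact, case-insensitive comparison only;
    wildcard SANs such as *.example.com are NOT modelled.
    """
    wanted = host.strip().lower()
    return any(entry.strip().lower() == wanted for entry in san_entries)


# Illustrative values only, not taken from the original thread.
server_sans = ["nifi01.example.com", "10.0.0.15"]

print(host_in_sans("nifi01.example.com", server_sans))   # host is listed
print(host_in_sans("nifi02.example.com", server_sans))   # host is missing
```

If the hostname or IP in the S2S destination URL is the one that comes back as missing, the handshake would be expected to fail in the way described above.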

MattWho · ‎11-08-2022

@Bridewin There are two things you may want to try:

1. The GetFile processor was deprecated in favor of the newer ListFile --> FetchFile processors. I'd recommend switching to these processors to see if you have the same observations.

2. I'd suggest enabling debug logging for the GetFile processor class to see what additional logging may show. To do this, you would modify the logback.xml file in NiFi's conf directory. Add the line below in that file where you see similar lines already:

<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>

Thank you, Matt

MattWho · ‎11-08-2022

@Jagapriyan I suspect an issue with last modified timestamps: the missed files have an older last modified timestamp than what was already consumed from the target directory, and this is compounded by the sub-directory structure. My recommendation is to switch to the listing strategy "Tracking Entities" instead. Tracking Entities will keep track of filenames and timestamps, so even a file with an older timestamp will get consumed if its filename is not in the tracked entities list stored in the distributed cache. Let me know if making this change resolves your issue.

Thank you, Matt
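The difference between the two listing strategies can be sketched in a few lines of Python. This is a heavily simplified model and not NiFi's implementation: the function names are invented, timestamps are plain integers, and details such as the entity tracking time window and the distributed cache are left out.

```python
def list_by_timestamp(files: dict, last_seen_ts: int) -> list:
    """Simplified 'Tracking Timestamps': only files newer than the
    last recorded timestamp are listed."""
    return sorted(name for name, ts in files.items() if ts > last_seen_ts)


def list_by_entities(files: dict, tracked: dict) -> list:
    """Simplified 'Tracking Entities': a file is listed if its name is
    unknown or its timestamp differs from the tracked one."""
    return sorted(name for name, ts in files.items() if tracked.get(name) != ts)


# name -> last modified timestamp (illustrative integers)
files = {"a.csv": 100, "b.csv": 200, "late.csv": 150}

# a.csv and b.csv were already listed, so the recorded state is timestamp 200.
# late.csv arrived afterwards but carries an OLDER timestamp (150).
by_timestamp = list_by_timestamp(files, last_seen_ts=200)
by_entities = list_by_entities(files, tracked={"a.csv": 100, "b.csv": 200})

print(by_timestamp)
print(by_entities)
```

In this model the timestamp-based listing returns nothing, while the entity-based listing still picks up late.csv because its filename has not been seen before.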

MattWho · ‎11-02-2022

@Bridewin To add some additional context around your cron schedule: NiFi uses Quartz cron, in case you were not already aware. Your current Quartz cron "0 05 8 1/1 * ? *" means that the processor will be scheduled to execute at 8:05am starting on day 1 of every month and on every subsequent day after day 1 in each month.

The issue with this cron is when you start your GetFile on any day other than the 1st prior to 8:05am. Let's say you start NiFi on November 3rd. On startup NiFi loads your flow and starts all your component processors. In this configuration your GetFile will not get scheduled until December 1st and will then continue to execute every day thereafter. If you stop and start the processor, even without a NiFi restart, the same would happen. If the NiFi JVM restarts, the same will happen.

I am not clear on why you decided to add 1/1. Perhaps this is how you intended for it to be scheduled? To truly have it get scheduled at 8:05am every day starting the very day the processor is started (whether via user action or NiFi JVM restart), you would want a cron like "0 5 8 * * ? *".

For more info on Quartz cron, review this link: https://productresources.collibra.com/docs/collibra/latest/Content/Cron/co_quartz-cron-syntax.htm

Thank you, Matt
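The intended behavior of the suggested "0 5 8 * * ? *" cron can be sketched in Python as "the next 8:05:00 strictly after the moment the processor is started". This is only a hand-written approximation for that one daily schedule, with a made-up function name; it is not a Quartz parser and says nothing about how other expressions are evaluated.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 8, minute: int = 5) -> datetime:
    """Next time a simple once-a-day schedule would fire after 'now'.

    Hypothetical helper approximating a daily cron such as "0 5 8 * * ? *".
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Processor started on November 3rd before 8:05am: runs the same day.
print(next_daily_run(datetime(2022, 11, 3, 7, 0)))
# Processor started on November 3rd after 8:05am: runs the next day.
print(next_daily_run(datetime(2022, 11, 3, 9, 0)))
```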

MattWho · ‎11-01-2022

@Jagapriyan Since you are using the listing strategy "Tracking Timestamps", the configuration property "Entity Tracking Time Window" is not used. The "Tracking Timestamps" strategy is very dependent on the timestamps of the target files. Typically, when files are not being picked up, it is because the timestamps on those files are equal to or less than the last recorded timestamp in the ListSFTP processor's state. This can happen when files in the SFTP server target folders do not have their last modified timestamp updated, for example when a file is moved from another directory into an SFTP server directory. (A copy would update the timestamp since the file is being written again.)

- Does your target SFTP path have multiple sub-directories which are being searched? Is Search Recursively set to "true"?
- Are there symlink directories in use?
- Have you looked at the state recorded timestamp for your SFTP server directories? Do your missed files have older timestamps?
- How many files on average are being written to the target SFTP between 12am and 1am each day?

I also see you have a minimum file age of 5 minutes. This means the last modified timestamp must be at least 5 minutes older than the execution time of your processor for the file to be eligible for consumption.

I see you stated your files are placed in the SFTP server between 12am and 1am each day, and you scheduled your ListSFTP processor using a cron schedule at 10 minutes and 1 second past every hour between 2am and 2pm. Why not just have your ListSFTP processor run all the time? Is this because timestamps are not being updated consistently?

If you switch to using the listing strategy "Tracking Entities" instead, do you still see the issue? Tracking Entities works when there are issues with timestamps and was developed for that reason.

Thank you, Matt
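The minimum file age rule mentioned above can be sketched as a simple comparison in Python. The function name is made up, and treating a file that is exactly 5 minutes old as eligible is an assumption of this sketch rather than something confirmed in the thread.

```python
from datetime import datetime, timedelta


def is_old_enough(last_modified: datetime,
                  run_time: datetime,
                  min_age: timedelta = timedelta(minutes=5)) -> bool:
    """True if the file's age at execution time meets the minimum age.

    Hypothetical helper; the '>=' boundary is an assumption.
    """
    return run_time - last_modified >= min_age


run_time = datetime(2022, 11, 1, 2, 10, 1)   # first scheduled run, 02:10:01

# File written at 00:30, well over 5 minutes before the run.
print(is_old_enough(datetime(2022, 11, 1, 0, 30), run_time))
# File written at 02:07, less than 5 minutes before the run.
print(is_old_enough(datetime(2022, 11, 1, 2, 7), run_time))
```

Under this rule, files written between 12am and 1am already satisfy the minimum age by the 02:10:01 run, so the age setting alone would not explain them being missed.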

MattWho · ‎11-01-2022

@Bridewin Are all your environments using a NAS storage location from which the GetFile is pulling files? Have you monitored the health and connectivity of your NAS? Since you have your GetFile scheduled to execute only once a day, if your NAS or network is having issues, it will simply return nothing for that day's execution. Since you are configured to remove the file you are consuming, have you tried changing your cron to run multiple times within the 8am hour to see if the file gets picked up by any one of those executions? If occasional network issues are impacting your NAS, this may resolve your issue with consuming the file.

Thank you, Matt
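As one illustration of "multiple times within the 8am hour", a Quartz cron such as "0 0/10 8 * * ?" would fire every 10 minutes during that hour. That expression is my own example, not one from the thread, and the Python sketch below simply lists the times such a schedule would cover; the function name is hypothetical.

```python
from datetime import time


def fire_times_in_hour(hour: int, every_minutes: int) -> list:
    """List the wall-clock times for a schedule that starts at minute 0
    of the given hour and repeats every 'every_minutes' minutes within it.

    Hypothetical helper mirroring a minutes field like "0/10".
    """
    return [time(hour, minute) for minute in range(0, 60, every_minutes)]


retries = fire_times_in_hour(8, 10)
for fire_time in retries:
    print(fire_time.strftime("%H:%M"))
```

Each of those executions gives GetFile another chance to see the file if an earlier attempt came back empty.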

MattWho · ‎10-28-2022

@D5ha Not all processors write to the content repository, nor is the content of a FlowFile ever modified in the content repository after it is created. Once a FlowFile is created in NiFi, it exists as is until terminated. A NiFi FlowFile consists of two parts: FlowFile attributes (metadata about the FlowFile, which includes details about the location of the FlowFile's content in the content_repository) and the FlowFile content itself. When a downstream processor modifies the content of a FlowFile, what is really happening is that new content is written to a new content claim in the content_repository; the original content remains unchanged.

From what you shared, you appear to have just one content_repository. Within that single content_repository, NiFi creates a bunch of sub-directories. NiFi does this for better indexing and seeking, because of the massive number of content claims a user's dataflow(s) may hold.

What is also very important to understand is that a content claim in the content_repository can hold the content for one or more FlowFiles. It is not always one content claim per FlowFile's content. It is also very possible to have multiple queued FlowFiles pointing to the exact same content claim and offset (the exact same content). This happens when your dataflow clones a FlowFile (for example, routing the same outbound relationship from a processor multiple times). So you should never manually delete claims from any content repository, as you may delete content for multiple FlowFiles.

That being said, you can use data provenance to locate the content_repository (Container), subdirectory (Section), content claim filename (Identifier), the byte offset where the content begins in that claim (Offset), and the number of bytes from the offset to the end of the content in the claim (Size).

Right click on a processor and select "view data provenance" from the displayed context menu. This will list all FlowFiles processed by this processor for which provenance still holds index data. Click the Show Lineage icon (looks like 3 connected circles) to the far right of a FlowFile. You can right click on "clone" and "join" events to find/expand any parent FlowFiles in the lineage (the event dot created for the processor on which you selected data provenance will be colored red in the lineage graph). Each white circle is a different FlowFile. Clicking on a white circle will highlight the dataflow path for that FlowFile. Right clicking on an event like "create" and selecting "view details" will tell you everything that is known about that FlowFile (this includes a tab about the "content").

- Container corresponds to the following property in the nifi.properties file: nifi.content.repository.directory.default=
- Section corresponds to the subdirectory within the above content repository path.
- Identifier is the content claim filename.
- Offset is the byte on which the content for this FlowFile begins within that Identifier.
- Size is the number of bytes from the Offset to the end of that FlowFile's content in the Identifier.

I also created an article on how to index the Content Identifier. Indexing a field allows you to locate a content claim and then search for it in your data provenance to find all FlowFile(s) that pointed at it. You can then view the details of all those FlowFile(s) to see the full content claim details as above: https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated-to-the-same/ta-p/249185

Thank you, Matt
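To show how Container, Section, Identifier, Offset and Size fit together, here is a small read-only Python sketch. It builds a throwaway directory standing in for a content repository and reads two FlowFiles' content out of one claim file. The function name, the claim filename "claim-0001" and the assumption that the bytes sit in the claim as plain, uncompressed data are all illustrative; a real repository may be laid out or encoded differently.

```python
import os
import tempfile


def read_flowfile_content(container_dir: str, section: str, identifier: str,
                          offset: int, size: int) -> bytes:
    """Read one FlowFile's bytes from a claim file (read-only).

    Hypothetical helper: assumes <container>/<section>/<identifier> is a
    plain file and that 'size' bytes starting at 'offset' are the content.
    """
    claim_path = os.path.join(container_dir, section, identifier)
    with open(claim_path, "rb") as claim:
        claim.seek(offset)
        return claim.read(size)


# Build a throwaway "content repository" with one claim holding the
# content of two FlowFiles back to back.
with tempfile.TemporaryDirectory() as container_dir:
    os.makedirs(os.path.join(container_dir, "1"))
    with open(os.path.join(container_dir, "1", "claim-0001"), "wb") as claim:
        claim.write(b"first-flowfile" + b"second-flowfile")

    first = read_flowfile_content(container_dir, "1", "claim-0001", 0, 14)
    second = read_flowfile_content(container_dir, "1", "claim-0001", 14, 15)

print(first)
print(second)
```

Because both FlowFiles live in the same claim file, deleting that one file would remove the content for both of them, which is why manual deletion is discouraged above.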

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎11-18-2025 07:56 AM
Posts	3,406
Kudos received	1619

Cloudera Community
