Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
05-11-2021
05:35 AM
@dieden9 NiFi provides a number of Kafka processors, each based on the version of the Kafka client it uses. The original ConsumeKafka processor (no version number) used the old Kafka 0.8 client, which does not offer the ability to specify a regex for the topic names. You should use the Kafka processors whose client version matches the Kafka server version you are consuming from.

From ConsumeKafka_0_10 onward (see, for example, ConsumeKafka_1_0 [1]), you can configure the processor to use "names" or "pattern" for the topic name(s). The pattern is a Java regular expression.

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-1-0-nar/1.13.2/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_1_0/index.html

If this helped with your query, please take a moment to log in and click Accept on this solution.

Thank you,
Matt
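As a quick illustration (the topic names here are hypothetical; the processor compiles the pattern as a Java regex, but for a simple pattern like this Python's re behaves the same, and the Kafka client matches the pattern against the full topic name):

```python
import re

# Hypothetical topic names present on the broker
topics = ["orders.us", "orders.eu", "audit.log", "orders_archive"]

# A value you might set in "Topic Name(s)" with "Topic Name Format" = pattern
pattern = re.compile(r"orders\..*")

# fullmatch mirrors how the Kafka client tests each topic name against the regex
print([t for t in topics if pattern.fullmatch(t)])  # ['orders.us', 'orders.eu']
```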
05-10-2021
01:23 PM
@Nickanor It would be interesting to see a verbose listing of your NiFi logs directory once the archived log files have well exceeded 50 GB. What you have configured will retain 30 hours of log data, and within each of those 30 hours you may have one or more incremental log files (each at 100 MB, except for the last one in each hour).

On NiFi restart, do you see the following setting cleaning the archive directory of files older than 30 hours?

<cleanHistoryOnStart>true</cleanHistoryOnStart>

I would inspect the nifi-app.log for any exceptions around logback, or for any OutOfMemory (OOM) or open-file-limit ("too many open files") exceptions that may explain the behavior.

Hope this helps,
Matt
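For reference, a minimal sketch of the relevant rolling policy in NiFi's conf/logback.xml. The element names are standard logback; the exact file name pattern in your install may differ, and totalSizeCap is an assumption here (an optional hard cap on total archive size), not a NiFi default:

```xml
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <!-- hourly rollover; %i increments for each 100 MB file within the hour -->
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <!-- keep 30 rollover periods: 30 hours with an hourly pattern -->
        <maxHistory>30</maxHistory>
        <!-- prune old archives on startup -->
        <cleanHistoryOnStart>true</cleanHistoryOnStart>
        <!-- assumption: bound the total archive size regardless of age -->
        <totalSizeCap>50GB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
```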
05-10-2021
01:04 PM
@Allen123 Based on your description, the ListSFTP processor is working exactly as designed and configured. You have told it to list files from both directories and to match both filenames in those directories, so with that configuration alone there is no way for it to ignore the extra unwanted a.text or b.text files. You have two options:

Option 1:
- Create two ListSFTP processors, each configured to list within a unique directory with "Search Recursively" set to false. Then set a "File Filter Regex" on each so it only lists the desired file from its directory.
- Then feed the success relationship from each of those two ListSFTP processors to a single FetchSFTP processor.

Option 2:
The ListSFTP processor is designed to only list files, not to consume any content; the content of the listed files is consumed later by the FetchSFTP processor. Between the ListSFTP and FetchSFTP processors you could insert a RouteOnAttribute [1] processor configured to pass only the two specific files you want on to FetchSFTP. You would route the "matched" relationship via a connection to FetchSFTP and auto-terminate the "unmatched" relationship (see the sketch below).

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.RouteOnAttribute/index.html

If this helped with your query, please take a moment to log in and click Accept on the solution.

Thanks,
Matt
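A hedged sketch of both options (the filenames below are placeholders for the two files you actually want; the dynamic property name "wanted" is arbitrary):

```
# Option 1 - ListSFTP "File Filter Regex" (one processor per directory):
File Filter Regex : fileA\.txt

# Option 2 - RouteOnAttribute between ListSFTP and FetchSFTP:
Routing Strategy  : Route to 'matched' if all match
wanted            : ${filename:equals('fileA.txt'):or(${filename:equals('fileB.txt')})}
```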
05-10-2021
12:36 PM
@kkau The ERROR you see occurs because the PrometheusRecordSink controller service, when started, binds to the configured port as a listener so that a Prometheus server can connect to that port to scrape data from NiFi. On a single host, only one service can bind to a given port, so any subsequent component on that server that attempts to bind to the same port will fail with "address already in use".

I don't know much about Prometheus server, but if you can configure Prometheus to scrape from a unique port on each NiFi instance, then you should be able to use the variable registry to define a unique port number per host. By using NiFi Expression Language (EL) you would still have matching flow.xml files on all your NiFi nodes; you would simply have a unique variable registry file configured for each instance in that instance's nifi.properties file. So for your nodes you would have three files:

<path to file>/node1-variables
<path to file>/node2-variables
<path to file>/node3-variables

Inside each of those files you would have the same property, each with a unique value:

PrometheusPort=9092 (written to node1-variables file)
PrometheusPort=9093 (written to node2-variables file)
PrometheusPort=9094 (written to node3-variables file)

Then when you start this reporting task, each node binds to a different port number, and you have your Prometheus server scrape all three unique endpoints.

But I do agree with @TimothySpann that running multiple NiFi instances on the same server is not a recommended setup.

If any of the provided solutions helped with your query, please take a moment to log in and click Accept on the solution(s).

Thanks,
Matt
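A hedged sketch of how that wires together. The nifi.properties key is the standard variable registry property; the exact name of the port property on the Prometheus component may differ in your version, so treat it as a placeholder:

```
# nifi.properties on node1 (node2/node3 point at their own files):
nifi.variable.registry.properties=<path to file>/node1-variables

# <path to file>/node1-variables:
PrometheusPort=9092

# On the Prometheus component, reference the variable via EL:
Metrics Endpoint Port : ${PrometheusPort}
```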
05-07-2021
06:03 AM
1 Kudo
@leandrolinof The "EvaluateJsonPath" processor you shared is configured with its destination set to flowfile-attribute, and thus leaves the content of the FlowFile unchanged.

So if a FlowFile attribute is where you want this parsed output to reside, you could use "ExtractText" [1] as an alternative solution. The property value here would be:

.*Erros":\[(.*)\].*

This has only one capture group, which captures the content you are trying to extract. That content is then added to the FlowFile in a new FlowFile attribute named after the property.

If you instead want to replace the content of your FlowFile with only the portion of the original content you are trying to extract, you could use the "ReplaceText" [2] processor. The Java regex "Search Value" here would be:

(.*Erros":\[)(.*)(\].*$)

This breaks the source content into three capture groups, so the regex matches the entire content and capture group 2 matches the string output you are looking for. The "Replacement Value" is then simply set to "$2", so that the entire content is replaced with just the contents of capture group 2.

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.ExtractText/index.html
[2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.ReplaceText/index.html

If this helped with your question, please take a moment to log in and click Accept on this solution.

Thanks,
Matt
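A quick sanity check of those patterns (the sample content is hypothetical; Python's re is close enough to Java regex for these expressions):

```python
import re

# Hypothetical FlowFile content containing an "Erros" array
content = '{"Status":"Failed","Erros":["campo X invalido","campo Y ausente"]}'

# ExtractText-style: one capture group pulls out the array body
m = re.match(r'.*Erros":\[(.*)\].*', content)
print(m.group(1))  # "campo X invalido","campo Y ausente"

# ReplaceText-style: three groups; the replacement keeps only group 2 ($2)
print(re.sub(r'(.*Erros":\[)(.*)(\].*$)', r'\2', content))
```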
05-07-2021
05:31 AM
@Justee You definitely can't use "/Home/Data/" in the "Remote File" property of the FetchSFTP processor. This must be the full path to a specific file being fetched, which means it should get the path and filename values from the inbound FlowFile's attributes.

Your ListSFTP (which you said is working) produces a FlowFile for each file found in the target directory on your SFTP server. Those FlowFiles get queued on the outbound connection containing ListSFTP's "success" relationship. That connection is the inbound connection to the FetchSFTP processor, which connects to the same SFTP server and retrieves the content for each of the FlowFiles produced by ListSFTP. The "Remote File" property on the FetchSFTP processor should therefore use NiFi Expression Language (EL) to dynamically set the path and filename being fetched for each inbound FlowFile:

${path}/${filename}

NiFi attribute names are case sensitive, so make sure you use all lowercase so they match the attribute names created on the source FlowFiles by the ListSFTP processor.

When you say the FetchSFTP processor is "not working", what does that mean? To which relationship are the inbound FlowFiles getting routed when it does not work? What exception (ERROR) is logged in the nifi-app.log when it fails to fetch the content for an inbound FlowFile? Also make sure that when you configured the username and password on FetchSFTP you did not accidentally add a leading or trailing whitespace.

Hope this helps,
Matt
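As an illustration (the values are hypothetical), a FlowFile emitted by ListSFTP might carry these attributes, and the EL expression resolves per FlowFile:

```
path     = /Home/Data
filename = report_20210507.csv

# so in FetchSFTP:
Remote File : ${path}/${filename}   ->   /Home/Data/report_20210507.csv
```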
05-06-2021
02:15 PM
@techNerd Details around your use case would be helpful here. The PutSFTP processor is designed to write FlowFile content to a target SFTP server location; if you have no content to write, there is nothing for it to do.

Thanks,
Matt
05-06-2021
02:10 PM
2 Kudos
@sangee Details around your use case may be helpful. The SplitText processor outputs all of the FlowFiles it produces from a source FlowFile to the "splits" relationship at the same time, so if your intent is simply to wait until all splits from a single source FlowFile are produced before processing them, this flow is not needed.

Pierre has written an excellent blog on using the Wait and Notify processors in a dataflow that does merge and split. Check it out here:
https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/

If this helped with your query, please take a moment to log in and click Accept on this solution.

Thanks,
Matt
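For context, each split carries fragment attributes that the Wait/Notify pattern keys on (the attribute names are written by SplitText; the values shown are hypothetical):

```
fragment.identifier = 1a2b3c4d-...   # same id on every split from one source FlowFile
fragment.index      = 3              # position of this split within the set
fragment.count      = 10             # total number of splits produced
```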
05-06-2021
02:01 PM
@syntax_ NiFi does not provide a method for uploading templates in bulk. But anything you can do via the UI you can also do via REST API calls [1], for example through curl, so you could script the bulk upload through the REST API external to NiFi's UI.

That said, I would strongly discourage bulk uploading templates to NiFi. Templates should be uploaded as needed, instantiated to the canvas, and then deleted from NiFi. All uploaded templates, even those never instantiated to the canvas, become part of the flow.xml loaded into heap memory, so keeping a large number of templates uploaded to your NiFi can have a considerable impact on JVM heap usage: you now have not only your active flow on the canvas in memory, but also these templates. Additionally, in a NiFi cluster, on startup the flow.xml.gz is uncompressed, loaded into heap memory, and a flow fingerprint is created on every node. These flow fingerprints are then compared to make sure all nodes joining the cluster are running the same flow.xml. Since templates add unnecessary size to the flow.xml.gz, this can impact startup times and flow fingerprint comparison time.

A better approach is to migrate your reusable flows into NiFi-Registry [2]. You can have one to many NiFi instances all connect to a single NiFi-Registry where all your flows exist. Then all you need to do is drag a process group to the canvas and select "Import", which lets you select one of these flows from NiFi-Registry and add it to your canvas. You get your flow without the additional heap impact of keeping it in NiFi both as a template and as a dataflow on your canvas, with easy access to load the same flow over and over. I encourage you to explore NiFi-Registry.

[1] http://nifi.apache.org/docs/nifi-docs/rest-api/index.html
[2] https://nifi.apache.org/registry.html

If this response helped with your query, please take a moment to log in and click Accept on this solution.

Thanks,
Matt
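If you do go the scripted route, a hedged sketch of a single upload call (the endpoint shape is from the REST API docs [1]; the host, port, file name, and process group id are placeholders, and a secured NiFi will also require authentication such as a bearer token):

```
curl -k -X POST \
  -F template=@my_flow_template.xml \
  https://nifi-host:8443/nifi-api/process-groups/<process-group-id>/templates/upload
```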
05-06-2021
01:37 PM
1 Kudo
@robnew The "CompressContent" [1] processor can be used to decompress gz files. Since only some of your log files are compressed, my suggestion is to set up a flow that passes these FlowFiles through an "IdentifyMimeType" [2] processor, which will write a new mime.type attribute on each FlowFile. Then use a "RouteOnAttribute" [3] processor to route FlowFiles with mime.type application/gzip to the "CompressContent" processor (each new dynamic property you add becomes a new outbound relationship), while the "unmatched" relationship (which will carry every other, non-gz file) continues on through your flow without passing through "CompressContent".

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.CompressContent/index.html
[2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.IdentifyMimeType/index.html
[3] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.RouteOnAttribute/index.html

If this helped with your question, please take a moment to log in and click Accept on this solution.

Thanks,
Matt
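A hedged sketch of that RouteOnAttribute configuration (the property name "gzip" is arbitrary; mime.type is the attribute written by IdentifyMimeType):

```
Routing Strategy : Route to Property name
gzip             : ${mime.type:equals('application/gzip')}

# 'gzip' relationship   -> CompressContent (Mode: decompress, Compression Format: gzip)
# 'unmatched'           -> continue the flow directly
```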