Member since
07-30-2019
3397
Posts
1619
Kudos Received
1001
Solutions
01-04-2017
03:55 PM
@Aman Jain The ListSFTP processor has a "File Filter" property that allows you to use a Java regular expression to specify the filename pattern you want the processor to look for on the target SFTP server. It does not give you the capability to pull a value from MySQL to use there, which is what it sounds like you want to do. That being said, keep in mind that the ListSFTP processor does not actually fetch any data; it only produces a zero-byte NiFi FlowFile for each file it lists. It is the responsibility of the FetchSFTP processor to actually retrieve the data content and add it to the NiFi FlowFile.
Perhaps you can have NiFi always list all the files on the target SFTP server and filter out the zero-byte FlowFiles you do not want before doing the FetchSFTP on each run. Have one flow that retrieves your load date from MySQL and writes it to a Distributed Cache Service in NiFi. Then have another flow list the files and filter them based on the current value loaded in the cache service. The filtered FlowFiles could then be sent to the FetchSFTP processor while the others are dropped/auto-terminated. Matt
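The filtering step above could be sketched with a RouteOnAttribute processor, assuming a FetchDistributedMapCache processor has already placed the cached load date into a FlowFile attribute named `load.date` (both the attribute name and the matching rule here are illustrative, not part of the original answer):

```
RouteOnAttribute
  Routing Strategy : Route to Property name
  matched          : ${filename:contains(${load.date})}
```

FlowFiles routed to the `matched` relationship would go on to FetchSFTP; the `unmatched` relationship would be auto-terminated.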
01-04-2017
01:41 PM
3 Kudos
@bala krishnan 1. "Concurrent tasks" is nothing new to NiFi. There currently is no capability to set concurrency at the process group level, and I am not sure that would be a good idea. I assume you are looking for a way to set a number of "concurrent tasks" that would then be applied to every processor within a process group? Some processors involve tasks that are more CPU intensive than others. For example, the CompressContent processor is CPU intensive: for every concurrent task it is assigned, 100% of a CPU core is consumed for each file it compresses/decompresses. Adding too many "concurrent tasks" here could have a serious impact on the system hosting NiFi. The UpdateAttribute processor, on the other hand, typically has very little CPU impact. One concurrent task here can process batches of FlowFiles very rapidly, so multiple concurrent tasks are usually unnecessary and a waste of server resources. 2. There is no defined algorithm for how many concurrent tasks a processor should receive out of the gate. Concurrent task assignment is done through testing and fine-tuning a dataflow using production data samples at production volumes. Evaluating your dataflow for bottlenecks, in combination with tracking system resource loads (CPU, memory, network, and disk I/O), can help tune concurrent task settings appropriately. It is too often the case that users start off by assigning a high number of concurrent tasks rather than starting at the bottom. You have to remember that your system has only so much CPU to share. Assigning too many concurrent tasks to a single processor will hinder other processors that are looking for CPU time. Along with setting "concurrent tasks" on individual processors, there are global maximum timer-driven and event-driven thread settings in NiFi (defaults are 10 and 5 respectively). These control the maximum number of threads NiFi will request from the server to fulfill the concurrent task requests from the NiFi processor components.
These global values can be adjusted in "Controller Settings" (located via the hamburger menu in the upper right corner of the NiFi UI). A typical setting here is double to quadruple the number of CPU cores on your server. Giving excessive values here does not improve performance, as those threads just spend more time in CPU wait. Thanks, Matt
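The "double to quadruple the core count" rule of thumb above can be expressed as a small helper. This is just a sketch of the sizing guidance, not anything NiFi itself provides:

```python
import os

def suggested_thread_counts(cores: int) -> tuple[int, int]:
    """Return a (low, high) range for NiFi's global Max Timer Driven
    Thread Count, following the 2x-4x CPU core rule of thumb."""
    return (2 * cores, 4 * cores)

# Size the range from the cores on the machine running this script.
low, high = suggested_thread_counts(os.cpu_count() or 1)
print(f"Suggested Max Timer Driven Thread Count: {low}-{high}")
```

For example, an 8-core server would get a suggested range of 16 to 32, which is then entered under Controller Settings in the NiFi UI.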
01-03-2017
02:46 PM
@Aman Jain
If you found this information helpful, please accept the answer.
01-02-2017
01:04 PM
@amanjain The FlowFile would be routed to the failure relationship in both of those cases. Those FlowFiles would be penalized based on the penalty duration configured on the FetchSFTP processor (default of 30 secs). A penalized FlowFile will not be processed by the next processor until that penalty has expired. The common scenario here is to have the failure relationship loop back on the FetchSFTP processor so that, after the penalty has expired, another attempt will be made to retrieve the data. Matt
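The retry loop described above might look like this as a configuration sketch (the penalty duration shown is the default mentioned in the answer):

```
FetchSFTP
  Penalty Duration : 30 sec   (default)
  failure relationship  -> looped back into FetchSFTP's own input queue
  success relationship  -> downstream processing
```

Each failed FlowFile then sits penalized in the loop-back connection for 30 seconds before FetchSFTP attempts the retrieval again.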
12-21-2016
07:00 PM
@Sunile Manjee Also keep in mind that NiFi content archiving is enabled by default, with a retention period of 12 hours or 50% disk utilization before the archived content is removed/purged. Purging FlowFiles manually within your dataflow will not trigger the deletion of archived FlowFiles.
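The archive behavior described above is controlled by these entries in nifi.properties (the values shown are the defaults the answer refers to):

```
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```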
12-20-2016
04:47 PM
1 Kudo
@Ahmad Debbas FlowFiles generated by the GetHDFS processor should have a "path" attribute set on them:
The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3". Since it is only the relative path and not an absolute path, you would need to use an UpdateAttribute processor to prepend the configured directory path to that relative path if you need the absolute path for use later in your flow. Thanks, Matt
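A minimal sketch of that UpdateAttribute step, assuming Directory was set to /tmp and using a made-up attribute name (`absolute.hdfs.path`) to hold the result:

```
UpdateAttribute
  absolute.hdfs.path : /tmp/${path}/${filename}
```

Note that for files picked up from the top-level directory, `${path}` will be "./", so the resulting value may need a small cleanup (e.g. with the expression language's replace functions) if an exact path string matters downstream.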
12-20-2016
03:36 PM
@D'Andre McDonald The Get-based processors will create an "absolute.path" FlowFile attribute on all files that are ingested into NiFi. So you would configure your Get processor to point at the base directory and consume files from all subdirectories. The Put-based processors support expression language in the "Remote Path" property, so you can use any attribute on the FlowFile to specify what path the file will be written to on the put. Here you could use ${absolute.path} as the value for this property. The Put-based processors also have a "Create Directory" property which you can set to true. Thank you, Matt
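Putting the two properties from the answer together, the Put side of the flow would be configured along these lines:

```
PutSFTP (or other Put-based processor)
  Remote Path      : ${absolute.path}
  Create Directory : true
```

Each incoming FlowFile is then written to the same directory structure it was originally consumed from, with missing remote directories created on demand.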
12-15-2016
01:22 PM
2 Kudos
@NAVEEN KUMAR
One suggestion might be to use a ListFile processor configured to run on a cron schedule. You could then feed the success relationship from that processor to a MonitorActivity processor. The inactive relationship of that processor could be routed to a PutEmail processor. So let's say you have your ListFile configured to run every 3 minutes based on a cron. You could set the threshold in the MonitorActivity processor to 3 minutes with "Continually Send Messages" set to true. With the inactive relationship routed to PutEmail, you will get an email every 3 minutes if ListFile produced no new files. You could also route the activity.restored relationship to a PutEmail processor if you want to be notified when files are seen again following a period of no activity. Thanks, Matt
12-14-2016
10:51 PM
1 Kudo
@Sunile Manjee FlowFile content is stored in claims inside the content repo. Each claim can contain the content from one or more FlowFiles. A claim will not be moved to the content archive or purged from the content repository until all active FlowFiles in your dataflow that reference any of the content in that claim have been removed. Those FlowFiles can be removed via manual purging of the queues (Empty Queue), FlowFile expiration on a connection, or auto-termination at the end of a dataflow.
The FlowFile count and size reported in the UI do not reflect the size of the claims in the content repo. Those stats report the size and number of active FlowFiles queued in your flow. It is very likely and usual for the size reported in the UI to differ from actual disk usage.
Thanks, Matt
12-12-2016
01:30 PM
2 Kudos
@Piyush Routray I am not sure what you mean by "I intend to have a separate NiFi cluster than the HDF cluster". Are you installing just NiFi via the command line? - You can install NiFi using the command line and utilize the embedded ZooKeeper. http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html When you get to the "Download HDF" section of the "Command Line Installation" documentation, go to the bottom of the list to download just the NiFi tar.gz file. The relevant docs are found here: http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/bk_administration/content/clustering.html http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/bk_administration/content/state_providers.html Are you trying to install NiFi via HDF's Ambari? - The Ambari-based installation of HDF will install an external ZooKeeper for you and set up NiFi to use it. Thanks, Matt