Member since
11-16-2015
905
Posts
665
Kudos Received
249
Solutions
10-25-2017
08:30 PM
On the Scheduling tab, set Run Schedule to something like 30 seconds. You can then start the processor and stop it immediately, and it will run only once. I think GetFTP might be the better of the two options, unless there's some reason it doesn't work with your system.
10-25-2017
07:36 PM
Are you saying your FTP server does not support listing the top-level directory? It shouldn't matter if you don't have a directory structure, as long as the FTP server itself can respond to the list commands (MLSD and/or NLST, I think). Alternatively, if you know the filename(s) you want to fetch, you can do one of two things:
1) Use GetFTP rather than ListFTP -> FetchFTP, setting the File Filter Regex to include the files you want (perhaps .* for all).
2) Use GenerateFlowFile in place of ListFTP, setting the "filename" attribute to the file you want to fetch. This runs at the rate scheduled for GenerateFlowFile and will generate the same filenames over and over, unless you are using Expression Language to set the filename at each execution.
Basically, FetchFTP needs an incoming connection to provide the "filename" attribute so it knows which file to fetch. GetFTP is kind of a combination of ListFTP -> FetchFTP.
10-25-2017
07:10 PM
If you are waiting for X number of flow files to be received, you can use something like this (assuming you want 10 flow files):
def flowfileList = session.get(10)
if (flowfileList.size() < 10) {
    // Not enough flow files yet: return them to the queue and try again on the next run
    session.rollback()
    return
}
// If you get here, you have 10 flow files in flowfileList
10-23-2017
05:06 PM
1 Kudo
You can use the Run Schedule property on the Scheduling tab of the processor to set the interval at which it is scheduled to run. For 10k events per second (one event every 100 microseconds, assuming one event per run), you can set it to "100000 nanos".
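As a rough check on the interval arithmetic, the sketch below converts a target rate into a Run Schedule value. It assumes the processor emits exactly one event per run; `runScheduleFor` is a hypothetical helper name, not part of NiFi.

```groovy
// Hypothetical helper (not a NiFi API): converts a target rate into a Run Schedule string.
// Assumes one event is produced per processor run.
String runScheduleFor(long eventsPerSecond) {
    long nanosPerSecond = 1000000000L
    // Integer division: nanoseconds available per event
    long intervalNanos = nanosPerSecond.intdiv(eventsPerSecond)
    return "${intervalNanos} nanos".toString()
}

println runScheduleFor(10000)
```

If each run produces a batch of events rather than one, the interval can be multiplied by the batch size.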
10-23-2017
03:53 PM
Edited due to Matt Clarke's comment below: Support for Expression Language in SelectHiveQL properties was added to NiFi 1.3.0 under NIFI-3867. It is also available in HDF 3.0.x. One (less-than-ideal) workaround is to set the property to something invalid but identifiable (like "@HIVE_URL@") and have a script to replace that value in the template before uploading to a particular environment.
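The placeholder workaround could be scripted along these lines. This is a minimal sketch, assuming the exported template is plain text containing the literal token "@HIVE_URL@"; the function name `fillTemplate`, the sample XML fragment, and the JDBC URL are all hypothetical, and values containing XML special characters would need escaping first.

```groovy
// Hypothetical helper: replaces each placeholder token with its environment-specific value.
String fillTemplate(String templateText, Map<String, String> values) {
    String result = templateText
    values.each { placeholder, value ->
        // String.replace does a literal (non-regex) substitution of every occurrence
        result = result.replace(placeholder, value)
    }
    return result
}

// Made-up fragment standing in for an exported template; a real script would read the template file
def template = '<value>@HIVE_URL@</value>'
def filled = fillTemplate(template, ['@HIVE_URL@': 'jdbc:hive2://example-host:10000/default'])
println filled
```

Keeping the per-environment values in a map makes it easy to add further placeholders later.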
10-20-2017
02:55 PM
1 Kudo
In Apache NiFi 1.5.0 (not yet released at the time of this writing), SelectHiveQL (via NIFI-4473) will have a property to Normalize Names for Avro, so you won't have to do the alias.
10-19-2017
06:43 PM
I have a blog post that describes how to use ExecuteScript with Groovy and Sshoogr to execute remote commands via SSH. Not sure if this is what you're looking for but I thought I'd share in case it was useful.
10-18-2017
01:02 PM
IIRC, Groovy 3 is supposed to support the Java lambda syntax, but NiFi uses Groovy 2, which does not. However, Groovy has always had closures, which are very similar and serve the same purpose, so despite a small difference in syntax, you should be able to use Groovy closures the same way you would use Java lambdas. If you are referring to the Java Streams API (things like forEach() that take a lambda), Groovy has iterative and aggregate functions for that too, such as each() and eachWithIndex(), spread-dot and collect(), etc.
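For instance, here is how a few common Streams-style operations might look with Groovy closures. The list contents and variable names are made up for illustration.

```groovy
def names = ['alpha', 'beta', 'gamma']

// collect() plays the role of Stream.map()
def upper = names.collect { it.toUpperCase() }

// Spread-dot calls a method on every element and gathers the results
def lengths = names*.length()

// findAll() plays the role of Stream.filter()
def longNames = names.findAll { it.length() > 4 }

// eachWithIndex() passes both the element and its position to the closure
def indexed = []
names.eachWithIndex { name, i -> indexed << "${i}:${name}".toString() }

println upper
println lengths
println longNames
println indexed
```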
10-17-2017
06:32 PM
To distribute the fetch across the cluster, you will need ListSFTP -> Remote Process Group (RPG). The RPG should send to an Input Port on the same cluster, and that Input Port can be connected to FetchSFTP. This RPG -> Input Port connection distributes the flow files containing the file names across the cluster: each node's Input Port receives a subset of those flow files, which are then fetched in parallel across the cluster.
10-12-2017
06:40 PM
1 Kudo
Are you expecting a result set from the SELECT statement? If so, you might be better off building a SQL query with the CLOB value inline (probably quoted and/or padded with rpad, I imagine) and using ExecuteSQL rather than PutSQL. The former is for queries that return result sets; the latter is for statements that don't (like INSERT).