11-10-2017
03:23 PM
That's the thing, I'm not sure any NAR version will work. Are you using Apache Hive 2.3.0 or a vendor-specific version?
11-09-2017
03:06 PM
1 Kudo
The Hive processors in Apache NiFi 1.4.0 are built against Apache Hive 1.2.1, so they are not guaranteed to work with Apache Hive 2.3.0. If you are using the HDP platform, it has a Hive version based on 1.2.x but is closer to 2.0. Apache NiFi's Hive processors are not compatible with HDP 2.5+, so you will likely want to use the NiFi-only Hortonworks Data Flow (HDF) package. This version of NiFi is built against the HDP Hive 1.2.x version. Having said that, HDF NiFi might work for your case, but it is also not guaranteed or supported to work against Hive 2.3.0 (whether Apache Hive or HDP Hive), as the baseline is still 1.2.x. The currently supported configuration is HDF NiFi against HDP Hive 1.2.x.
11-09-2017
02:18 PM
1 Kudo
If you can't set that property on the JDBC URL, then as of Apache NiFi 1.2.0 (HDF NiFi 3.0.x), due to NIFI-3426, you can add user-defined properties that will be passed to the connection. A list of these properties is available here, and includes the one you mention.
11-09-2017
01:59 PM
It looks like the Solr service wants an array of JSON objects but you have a single JSON object, and also it expects the sharecount value to be nested under sharecount in a "set" field. If you know the attributes you need and there aren't very many of them (for example, there are only two you mention), you can use ReplaceText instead of AttributesToJSON, using Expression Language to hand-create the JSON array. The replacement text might look like this: [{"url": "${url}", "sharecount": {"set": ${sharecount}}}]
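As a rough illustration of the shape described above, this Python sketch builds the same payload from two values. The function name `build_solr_update` and the sample values are made up for the example; in the flow itself, ReplaceText with Expression Language does this job.

```python
import json


def build_solr_update(url, sharecount):
    """Build a JSON array containing one update document.

    Mirrors the replacement text above: the new sharecount value is
    nested under a "set" key, and the single object is wrapped in an array.
    """
    doc = {"url": url, "sharecount": {"set": sharecount}}
    return json.dumps([doc])


# Hypothetical values standing in for the ${url} and ${sharecount} attributes
payload = build_solr_update("http://example.com/page", 42)
print(payload)
```

Note that the sketch emits sharecount as a number, just as the unquoted `${sharecount}` does in the replacement text.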
11-07-2017
06:36 PM
1 Kudo
These comments are spot-on, thanks! I'd also mention that if you want to dynamically customize it with incoming flow files, an alternative is to send your flow into GenerateTableFetch (on the primary node only, so your most upstream processor(s) will need to run on the primary node only). GenerateTableFetch (GTF) is like QueryDatabaseTable (QDT), with the big differences being 1) GTF takes incoming flow files, and 2) QDT executes the SQL it generates internally, whereas GTF sends the SQL out as flow files so some other processor (e.g. ExecuteSQL) can execute it. To use it this way, send the SQL output from GTF to a Remote Process Group (RPG) pointed at an Input Port on your same cluster. This RPG -> Input Port pattern is used to distribute the flow files among the nodes in the cluster, rather than every node working on the same data (which leads to data duplication, as @Abdelkrim Hadjidj mentions above). Downstream from the Input Port, all nodes are processing their subset of the flow files in parallel, so you can send Input Port -> ExecuteSQL. This flow is basically a parallel, distributed version of what QueryDatabaseTable does on one node.
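To make the idea concrete, here is a small Python sketch of the pattern, not of NiFi's actual implementation. It assumes a LIMIT/OFFSET paging syntax and a simple round-robin spread across nodes; the function names, table name, and row counts are all hypothetical.

```python
def generate_page_queries(table, total_rows, partition_size, order_by):
    """Return one SELECT statement per page of rows.

    Illustrative only: this assumes LIMIT/OFFSET paging, which not every
    database supports.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [
        f"SELECT * FROM {table} ORDER BY {order_by} "
        f"LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, total_rows, partition_size)
    ]


def distribute(queries, node_count):
    """Spread statements across nodes round-robin.

    A stand-in for the RPG -> Input Port hop; the real distribution
    logic in a cluster is not modeled here.
    """
    assignments = [[] for _ in range(node_count)]
    for i, query in enumerate(queries):
        assignments[i % node_count].append(query)
    return assignments


queries = generate_page_queries("users", 25, 10, "id")
for node, work in enumerate(distribute(queries, 2)):
    print(node, work)
```

Each node then executes only its own statements, which is what avoids every node fetching the same rows.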
11-07-2017
06:02 PM
1 Kudo
Although you may not see the same performance gains from InvokeScriptedProcessor (ISP) using Jython as you would by using Groovy (Jython is slower in general), this is still a good idea, so I revisited my blog post and created an ISP template in Jython. Please let me know if it works for you!
11-06-2017
04:39 PM
1 Kudo
You can extract values from JSON content into attributes using EvaluateJsonPath, then you can use RouteOnAttribute to do routing.
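As a sketch of that two-step idea outside NiFi, the Python below pulls fields out of JSON content into string attributes and then picks routes based on them. The helper names, the `status` field, and the `errors` route are assumptions for illustration; in the flow, EvaluateJsonPath would use a JSONPath expression and RouteOnAttribute an Expression Language condition.

```python
import json


def extract_attributes(content, fields):
    """Copy top-level JSON fields into a dict of string attributes.

    `fields` maps attribute name -> top-level field name. Fields missing
    from the content are skipped.
    """
    data = json.loads(content)
    return {attr: str(data[name]) for attr, name in fields.items() if name in data}


def route(attributes, rules):
    """Return the names of all rules whose predicate matches.

    Falls back to ["unmatched"] when no rule matches.
    """
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]


# Hypothetical flow file content and routing rule
content = '{"status": "error", "id": 7}'
attrs = extract_attributes(content, {"status": "status", "id": "id"})
print(route(attrs, {"errors": lambda a: a.get("status") == "error"}))
```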
11-02-2017
01:57 PM
1 Kudo
ExecuteSQL will fetch and send all rows in the ResultSet each time it runs. If you don't want that many rows in a single flow file (but still want it to only execute once), use QueryDatabaseTable with no Max-Value Columns set. With no Max-Value Columns supplied it acts like ExecuteSQL (so you will still want to start/stop it), but it also has options like Max Rows Per Flow File, etc.
11-01-2017
11:27 PM
1 Kudo
You can start and immediately stop the processor; it is guaranteed to run at least once. Set the Run Schedule to something like 10 seconds so you have enough time to start and stop it.
11-01-2017
01:13 PM
This feature was added in NiFi 1.4.0 (NIFI-4257)