Member since 11-16-2015 · 911 Posts · 668 Kudos Received · 249 Solutions
06-20-2018 10:20 PM · 2 Kudos
This is a regression (NIFI-4862) introduced in NiFi 1.5.0; it will be fixed in 1.7.0 (HDF 3.2).
06-20-2018 03:41 PM
As of NiFi 1.6.0, there is a processor called RunMongoAggregation that should do what you want. You shouldn't need the "db.Pdata.aggregate" part, just the JSON query body, likely as an array: the Mongo shell may treat a bare list of objects as an array, but the NiFi processor may expect a valid JSON array of objects.
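For illustration, the query might look like the following, a plain JSON array of pipeline stages with no `db.Pdata.aggregate(...)` wrapper (the stages and field names here are hypothetical):

```json
[
  { "$match": { "status": "active" } },
  { "$group": { "_id": "$category", "total": { "$sum": "$amount" } } }
]
```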
06-20-2018 02:53 PM · 2 Kudos
You can use UpdateRecord to add a field to your records, then PutDatabaseRecord to put them into your database. Using the record-aware processors gives you better control over the content (ReplaceText can be brittle and only handles text formats like CSV and JSON, not Avro), and you don't need the Split/Merge pattern; these processors operate on many records in a single flow file, which is more efficient. If the table has already been created and you are trying to insert new rows with an additional column, then (as of NiFi 1.5.0) you might be able to use PutSQL before PutDatabaseRecord to execute a statement like ALTER TABLE ... ADD COLUMN IF NOT EXISTS (sketched below). That said, it is probably better to issue that statement once against your table externally so you don't have to do it for each flow file coming through NiFi.
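As a sketch, the kind of statement you might issue via PutSQL; the `IF NOT EXISTS` form is PostgreSQL syntax and not supported by every database, and the table/column names are hypothetical:

```sql
-- Add the new column only if it isn't already present (PostgreSQL syntax)
ALTER TABLE my_table ADD COLUMN IF NOT EXISTS extra_col VARCHAR(100);
```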
06-19-2018 10:33 PM
You will want to set whatever column has your "LAST MODIFIED" values as the Maximum Value Column in GenerateTableFetch. The first time it runs, it will still generate SQL to pull the complete data, but it will also keep track of the maximum observed value in your Maximum Value Column. The next time GenerateTableFetch runs, it will only generate SQL to fetch the rows whose LAST MODIFIED value is greater than the last observed maximum. If you want the first generation to start at a particular value (for the Maximum Value Column), you can add a user-defined property called "initial.maxvalue.<maxvaluecol>", where "<maxvaluecol>" is the name of the column you specified as the Maximum Value Column. This lets you "skip ahead"; from then on GenerateTableFetch continues in normal operation, keeping track of the current maximum and only generating SQL to fetch rows whose values are larger than it. If you need a custom query (or, more precisely, a custom WHERE clause), you can set the Custom WHERE Clause property of GenerateTableFetch. If you need completely arbitrary queries, then in NiFi 1.7.0 (via NIFI-1706) you can use QueryDatabaseTable with a custom query. That capability does not exist for GenerateTableFetch, but we can investigate adding it as an improvement; please feel free to file a Jira for it.
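To make the incremental behavior concrete, here is a hedged sketch of the kind of SQL GenerateTableFetch emits on a later run; the table and column names are made up, and the exact paging syntax depends on the configured Database Type and Partition Size:

```sql
-- One "page" of rows newer than the last observed maximum for
-- Maximum Value Column = last_modified
SELECT * FROM orders
WHERE last_modified > '2018-06-01 00:00:00'
ORDER BY last_modified
LIMIT 10000
```

To start from that same point on the very first run, you would set a user-defined property such as `initial.maxvalue.last_modified` to `2018-06-01 00:00:00`.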
06-19-2018 06:17 PM
Yes, I think that will work. Also, if you convert from decimal to a different type, you should be able to use PutHiveStreaming, although it isn't always as performant as it could be. In the upcoming Hive 3 bundle there is a new Streaming API, and PutHive3Streaming should be much faster (and Avro logical types are supported).
06-19-2018 05:36 PM
I recommend using MergeRecord before JoltTransformJSON, so the Jolt transform can be applied to the whole JSON array (after your smaller JSON objects have been merged). You'll want to use a JsonTreeReader and provide an Avro schema that matches your input data above. mergerecord-example.xml is an example template where I generate data similar to yours, use MergeRecord to bundle the objects 20 at a time, then run the Jolt spec on the result; it includes the associated Avro schema and hopefully all the configuration to get you up and going.
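In case it helps, this is the general shape of the Avro schema the reader needs; the record and field names below are placeholders and should be replaced to match your actual JSON:

```json
{
  "type": "record",
  "name": "myRecord",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "value", "type": ["null", "double"] }
  ]
}
```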
06-19-2018 04:47 PM · 1 Kudo
ConvertAvroToORC is in the Hive bundle, which uses Avro 1.7.7, and that version does not support logical types such as decimal. This is discussed in NIFI-5079, where it was decided to add support via a PutORC processor in the upcoming Hive 3 bundle (slated for NiFi 1.7.0 and HDF 3.2). If you are using a version of NiFi prior to 1.6.0, then upgrading may help solve the original conversion issue when fetching from SQL Server (via NIFI-4846). However, if you're converting to ORC you will still run into the issue above. A workaround might be to store the Avro directly in HDFS and create the Hive table on top of the Avro data rather than ORC.
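A hedged sketch of the Hive side of that workaround; the paths and table name are hypothetical, and the `STORED AS AVRO` shorthand requires Hive 0.14 or later:

```sql
-- External table over the Avro files written by PutHDFS; the columns
-- are derived from the Avro schema file referenced below
CREATE EXTERNAL TABLE my_table
STORED AS AVRO
LOCATION '/data/my_table'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_table.avsc');
```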
06-18-2018 06:18 PM
IIRC, Jython returns the last thing evaluated, so you shouldn't need a "return" statement. Also, the ExecuteScript processor does not (currently) use any return value from a script, so you should just let the script finish, and ensure any "return" statements are inside function definitions, not the top-level script.
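A minimal illustration, assuming the standard ExecuteScript bindings (`session`, `REL_SUCCESS`): keep `return` inside a function and just call that function from the top level.

```python
# ExecuteScript (Jython) sketch: "return" lives inside a function;
# the top-level script simply calls it and ends.
def process():
    flowFile = session.get()
    if flowFile is None:
        return  # legal here, inside a function
    session.transfer(flowFile, REL_SUCCESS)

process()  # no top-level "return" needed
```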
06-13-2018 01:56 AM
If your custom code can send flow files with attributes containing the source and destination information, you can use FetchS3Object to get the file from S3, then PutFile to put it on a local file share. If your custom code does not use the NiFi API, consider ExecuteScript with Groovy (specifying your JARs in the Module Directory property) and calling the code from there, or perhaps even ExecuteStreamCommand if you want to (or must) call it from the command line. For the former option, I discuss how to use modules in scripts in Part 3 of my ExecuteScript Cookbook series (and the other parts have related examples).
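For the Groovy option, a minimal sketch; `com.example.Mover` and the attribute names are placeholders for your own code, and the directory containing its JAR would go in the Module Directory property:

```groovy
// ExecuteScript (Groovy) sketch: call custom code from a JAR listed
// in Module Directory. The Mover class is hypothetical.
import com.example.Mover

def flowFile = session.get()
if (flowFile == null) return

def source = flowFile.getAttribute('s3.source.key')
def dest   = flowFile.getAttribute('local.dest.path')
new Mover().move(source, dest)  // your custom logic

session.transfer(flowFile, REL_SUCCESS)
```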