Member since 11-16-2015 · 911 Posts · 668 Kudos Received · 249 Solutions
06-20-2018 10:20 PM · 2 Kudos
This is a regression (NIFI-4862) introduced in NiFi 1.5.0; it will be fixed in 1.7.0 (HDF 3.2).
06-20-2018 03:41 PM
As of NiFi 1.6.0, there is a processor called RunMongoAggregation that should do what you want. You shouldn't need the "db.Pdata.aggregate" part, just the JSON query body, likely as an array: the Mongo shell may treat a bare list of objects as an array, but the NiFi processor may expect a valid JSON array of objects.
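For illustration, the query might look like the following, a plain JSON array of pipeline stages with no `db.Pdata.aggregate(...)` wrapper (the stages and field names here are hypothetical):

```json
[
  { "$match": { "status": "active" } },
  { "$group": { "_id": "$category", "total": { "$sum": "$amount" } } }
]
```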
06-20-2018 02:53 PM · 2 Kudos
You can use UpdateRecord to add a field to your records, then PutDatabaseRecord to put them into your database. Using the record-aware processors gives you better control over the content (ReplaceText can be brittle and only handles text formats like CSV and JSON, not Avro), and you don't need the Split/Merge pattern; these processors operate on many records in a single flow file, which is more efficient. If the table has already been created and you are trying to insert new rows with an additional column, then (as of NiFi 1.5.0) you might be able to use PutSQL before PutDatabaseRecord to execute a statement like ALTER TABLE ... ADD COLUMN IF NOT EXISTS (sketched below). That said, it is probably better to issue that statement once against your table externally so you don't have to do it for each flow file coming through NiFi.
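As a sketch, the kind of statement you might issue via PutSQL; the `IF NOT EXISTS` form is PostgreSQL syntax and not supported by every database, and the table/column names are hypothetical:

```sql
-- Add the new column only if it isn't already present (PostgreSQL syntax)
ALTER TABLE my_table ADD COLUMN IF NOT EXISTS extra_col VARCHAR(100);
```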
06-19-2018 10:33 PM
You will want to set whatever column has your "LAST MODIFIED" values as the Maximum Value Column in GenerateTableFetch. The first time it runs, it will still generate SQL to pull the complete data, but it will also keep track of the maximum observed value in your Maximum Value Column. The next time GenerateTableFetch runs, it will only generate SQL to fetch the rows whose LAST MODIFIED value is greater than the last observed maximum. If you want the first generation to start at a particular value (for the Maximum Value Column), you can add a user-defined property called "initial.maxvalue.<maxvaluecol>", where "<maxvaluecol>" is the name of the column you specified as the Maximum Value Column. This lets you "skip ahead"; from then on GenerateTableFetch continues in normal operation, keeping track of the current maximum and only generating SQL to fetch rows whose values are larger than it. If you need a custom query (or, more precisely, a custom WHERE clause), you can set the Custom WHERE Clause property of GenerateTableFetch. If you need completely arbitrary queries, then in NiFi 1.7.0 (via NIFI-1706) you can use QueryDatabaseTable with a custom query. That capability does not exist for GenerateTableFetch, but we can investigate adding it as an improvement; please feel free to file a Jira for it.
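To make the incremental behavior concrete, here is a hedged sketch of the kind of SQL GenerateTableFetch emits on a later run; the table and column names are made up, and the exact paging syntax depends on the configured Database Type and Partition Size:

```sql
-- One "page" of rows newer than the last observed maximum for
-- Maximum Value Column = last_modified
SELECT * FROM orders
WHERE last_modified > '2018-06-01 00:00:00'
ORDER BY last_modified
LIMIT 10000
```

To start from that same point on the very first run, you would set a user-defined property such as `initial.maxvalue.last_modified` to `2018-06-01 00:00:00`.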
06-19-2018 06:17 PM
Yes, I think that will work. Also, if you convert from decimal to a different type, you should be able to use PutHiveStreaming, although it isn't always as performant as it could be. In the upcoming Hive 3 bundle there is a new Streaming API, and PutHive3Streaming should be much faster (and Avro logical types are supported).
06-19-2018 05:36 PM
I recommend using MergeRecord before JoltTransformJSON, so the Jolt transform can be applied to the whole JSON array (after your smaller JSON objects have been merged). You'll want to use a JsonTreeReader and provide an Avro schema that matches your input data above. mergerecord-example.xml is an example template where I generate data similar to yours, use MergeRecord to bundle the objects 20 at a time, then run the Jolt spec on the result; it includes the associated Avro schema and hopefully all the configuration to get you up and going.
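In case it helps, this is the general shape of the Avro schema the reader needs; the record and field names below are placeholders and should be replaced to match your actual JSON:

```json
{
  "type": "record",
  "name": "myRecord",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "value", "type": ["null", "double"] }
  ]
}
```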
06-19-2018 04:47 PM · 1 Kudo
ConvertAvroToORC is in the Hive bundle, which uses Avro 1.7.7, and that version does not support logical types such as decimal. This is discussed in NIFI-5079, where it was decided to add support via a PutORC processor in the upcoming Hive 3 bundle (slated for NiFi 1.7.0 and HDF 3.2). If you are using a version of NiFi prior to 1.6.0, then upgrading may help solve the original conversion issue when fetching from SQL Server (via NIFI-4846). However, if you're converting to ORC you will still run into the issue above. A workaround might be to store the Avro directly in HDFS and create the Hive table on top of the Avro data rather than ORC.
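A hedged sketch of the Hive side of that workaround; the paths and table name are hypothetical, and the `STORED AS AVRO` shorthand requires Hive 0.14 or later:

```sql
-- External table over the Avro files written by PutHDFS; the columns
-- are derived from the Avro schema file referenced below
CREATE EXTERNAL TABLE my_table
STORED AS AVRO
LOCATION '/data/my_table'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_table.avsc');
```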
06-18-2018 06:18 PM
IIRC, Jython returns the last thing evaluated, so you shouldn't need a "return" statement. Also, the ExecuteScript processor does not (currently) use any return value from a script, so you should just let the script finish, and ensure any "return" statements are inside function definitions, not the top-level script.
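A minimal illustration, assuming the standard ExecuteScript bindings (`session`, `REL_SUCCESS`): keep `return` inside a function and just call that function from the top level.

```python
# ExecuteScript (Jython) sketch: "return" lives inside a function;
# the top-level script simply calls it and ends.
def process():
    flowFile = session.get()
    if flowFile is None:
        return  # legal here, inside a function
    session.transfer(flowFile, REL_SUCCESS)

process()  # no top-level "return" needed
```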
06-13-2018 01:56 AM
If your custom code can send flow files with attributes containing the source and destination information, you can use FetchS3Object to get the file from S3, then PutFile to put it on a local file share. If your custom code does not use the NiFi API, consider ExecuteScript with Groovy (specifying your JARs in the Module Directory property) and calling the code from there, or perhaps even ExecuteStreamCommand if you want to (or must) call it from the command line. For the former option, I discuss how to use modules in scripts in Part 3 of my ExecuteScript Cookbook series (and the other parts have related examples).
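For the Groovy option, a minimal sketch; `com.example.Mover` and the attribute names are placeholders for your own code, and the directory containing its JAR would go in the Module Directory property:

```groovy
// ExecuteScript (Groovy) sketch: call custom code from a JAR listed
// in Module Directory. The Mover class is hypothetical.
import com.example.Mover

def flowFile = session.get()
if (flowFile == null) return

def source = flowFile.getAttribute('s3.source.key')
def dest   = flowFile.getAttribute('local.dest.path')
new Mover().move(source, dest)  // your custom logic

session.transfer(flowFile, REL_SUCCESS)
```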