Member since: 11-16-2015
Posts: 911
Kudos Received: 668
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 699 | 09-30-2025 05:23 AM |
| | 1071 | 06-26-2025 01:21 PM |
| | 930 | 06-19-2025 02:48 PM |
| | 1100 | 05-30-2025 01:53 PM |
| | 12260 | 02-22-2024 12:38 PM |
05-29-2018
04:09 PM
1 Kudo
ListHDFS emits empty (0-byte) flow files that have attributes (such as filename and path; see the docs for details) set on them. In this case FetchHDFS runs much more slowly than ListHDFS (it takes longer to retrieve a file than to list that it exists), which is why the queue backs up. Also, a size-based back pressure threshold won't ever trigger here since the flow files are 0 bytes. Try setting the maximum number of objects for back pressure instead.
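For reference, these are the two back pressure settings on the connection between ListHDFS and FetchHDFS (the values shown are NiFi's defaults):

```
Back Pressure Object Threshold    : 10000    <- counts flow files, so it works even for 0-byte files
Back Pressure Data Size Threshold : 1 GB     <- based on content size, never reached by 0-byte files
```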
05-29-2018
03:43 PM
What version of the Hive driver are you using? I'm not sure there is a version of the Hive driver available that supports all the JDBC API calls made by PutDatabaseRecord, such as executeBatch(). Also since the Hive JDBC driver auto-commits after each operation, PutDatabaseRecord + Hive would not be any more performant than using PutHiveQL. In an upcoming version of NiFi/HDF (for Hive 3.0), you should be able to use PutHive3Streaming to do what you want.
05-27-2018
07:35 PM
1 Kudo
The variables are in a ComponentVariableRegistry, which is pretty well hidden under the NiFi API. Usually you get at variables by evaluating Expression Language in the context of a processor property. In this case I set a Process Group variable called "myAttr" to "myValue", then configured ExecuteScript with a user-defined property "myProp" whose value is an Expression Language construct referencing the PG variable name. When you call evaluateAttributeExpressions() on that property (see the script) it resolves the value of "myAttr" and returns it; you can verify that an outgoing flow file will then have "myFlowFileAttr" set to "myValue".
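As a rough sketch, here is what such a script can look like in Jython (ExecuteScript also supports Groovy and other engines); it assumes the user-defined property "myProp" is set to ${myAttr}:

```python
# ExecuteScript body (Jython). Assumes a dynamic property "myProp" = ${myAttr}
# and a Process Group variable myAttr = "myValue".
flowFile = session.get()
if flowFile is not None:
    # Evaluating Expression Language on the property resolves the PG variable.
    resolved = context.getProperty('myProp').evaluateAttributeExpressions(flowFile).getValue()
    # Copy the resolved value onto the outgoing flow file.
    flowFile = session.putAttribute(flowFile, 'myFlowFileAttr', resolved)
    session.transfer(flowFile, REL_SUCCESS)
```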
05-27-2018
03:36 AM
1 Kudo
What does your CREATE TABLE statement look like, and does it match the schema of the file(s) (Avro/ORC) you're sending to the external location?
05-25-2018
11:42 PM
1 Kudo
Your PutHDFS processor is placing the data into Hadoop (in ORC format, after ConvertAvroToORC) for use by Hive, so you don't also need to send an INSERT statement to PutHiveQL. Rather, with the pattern you're using, ReplaceText should set the content to either a Hive DDL statement that creates a table on top of the ORC files' location, or a LOAD DATA INPATH statement that loads from the HDFS location into an existing Hive table.
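For illustration only (the path and columns below are made up), the two options look roughly like this:

```sql
-- Option 1: DDL that defines an external table over the directory PutHDFS writes to
CREATE EXTERNAL TABLE my_orc_table (id INT, name STRING)
STORED AS ORC
LOCATION '/data/my_orc_table';

-- Option 2: load the ORC files from HDFS into an existing ORC-backed Hive table
LOAD DATA INPATH '/data/staging/my_file.orc' INTO TABLE existing_orc_table;
```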
05-23-2018
03:49 PM
I left a response on my other answer but will leave it here too in case you hadn't seen it: looking at the parquet-avro code, I think your suggested workaround of changing decimal values to fixed is the right approach (for now). We could update the version of parquet-avro, but I didn't see anything in there that would improve your situation; it was Impala that needed to support more incoming types.
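For clarity, "fixed" here means the Avro type backing the decimal logical type. A field like the following (the name, size, precision, and scale are placeholders) declares a fixed-backed decimal instead of a bytes-backed one:

```json
{
  "name": "amount",
  "type": {
    "type": "fixed",
    "name": "amount_dec",
    "size": 5,
    "logicalType": "decimal",
    "precision": 10,
    "scale": 2
  }
}
```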
05-22-2018
12:05 PM
1 Kudo
I added an answer to that question, but it is likely unsatisfying as it is an open issue. The Hive driver used in the Hive processors is based on Apache Hive 1.2.x which does not support a handful of JDBC API methods used by those processors.
05-21-2018
02:16 PM
For approach #1, you could use the FlattenJson processor; you'll likely want to set the Separator property to "_" rather than the default "." since Hive adds the table name to each column in a ResultSet. For approach #2, you could have a single-column table (a column of type STRING) and query it with get_json_object (example here). Alternatively, if you can map all the types (including complex types like array, list, struct, etc.) to a Hive table definition, you could use a JSON SerDe to write the data (example here).
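A rough sketch of approach #2 (the table and JSON field names are invented for illustration):

```sql
-- One STRING column that holds each raw JSON document
CREATE TABLE raw_json (json_doc STRING);

-- Extract individual fields at query time
SELECT get_json_object(json_doc, '$.id')        AS id,
       get_json_object(json_doc, '$.user.name') AS user_name
FROM raw_json;
```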
05-18-2018
01:31 AM
What version of NiFi are you using, and what do the CREATE TABLE statements for the source and target tables look like? Is Oracle your target DB, or is it a different DB? I ran ExecuteSQL against Oracle 11 (with NiFi's master branch, so 1.6.0+ or "almost 1.7.0"), populated with your actual data (using the same PutDatabaseRecord but with a JsonTreeReader). It generated the same Avro schema you have above with the same data. I then changed PutDatabaseRecord to use an AvroReader with Use Embedded Schema, and everything ran fine, inserting the rows successfully. I'm guessing you have an older version of NiFi that might be missing some fixes and/or improvements around logical-type (e.g., timestamp) handling.
05-17-2018
01:32 PM
In NiFi 1.7.0 (via NIFI-4456) you will be able to configure your JsonRecordSetWriter to write one JSON object per line, which gives you the output you want. In this case, with JoltTransformJSON you'd need a ConvertRecord after the fact, since it will emit an array if you pass in an array. As a workaround you might be able to use ReplaceText to remove the first and last square brackets and change },{ to },\n{ so each object ends up on its own line.
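A minimal sketch of that ReplaceText workaround using two processors in series (the property names are ReplaceText's; the regexes are just one way to do it):

```
ReplaceText #1 - strip the enclosing square brackets
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Entire text
  Search Value         : ^\[|\]$
  Replacement Value    : (set to empty string)

ReplaceText #2 - put each JSON object on its own line
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Entire text
  Search Value         : \},\s*\{
  Replacement Value    : },\n{
```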