Member since: 11-16-2015
Posts: 911
Kudos Received: 668
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 699 | 09-30-2025 05:23 AM |
| | 1071 | 06-26-2025 01:21 PM |
| | 930 | 06-19-2025 02:48 PM |
| | 1100 | 05-30-2025 01:53 PM |
| | 12260 | 02-22-2024 12:38 PM |
05-29-2018
04:09 PM
1 Kudo
ListHDFS emits empty (0-byte) flow files that have attributes (such as filename and path; see the docs for details) set on them. In this case FetchHDFS runs much more slowly than ListHDFS (it takes longer to retrieve a file than to list that it exists), which is why the queue backs up. Also, a size-based back pressure threshold won't ever trigger here since the flow files are 0 bytes. Try setting the maximum number of objects for back pressure instead.
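For reference, these are the two back pressure settings on the connection between ListHDFS and FetchHDFS (the values shown are NiFi's defaults):

```
Back Pressure Object Threshold    : 10000    <- counts flow files, so it works even for 0-byte files
Back Pressure Data Size Threshold : 1 GB     <- based on content size, never reached by 0-byte files
```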
05-29-2018
03:43 PM
What version of the Hive driver are you using? I'm not sure there is a version of the Hive driver available that supports all the JDBC API calls made by PutDatabaseRecord, such as executeBatch(). Also since the Hive JDBC driver auto-commits after each operation, PutDatabaseRecord + Hive would not be any more performant than using PutHiveQL. In an upcoming version of NiFi/HDF (for Hive 3.0), you should be able to use PutHive3Streaming to do what you want.
05-27-2018
07:35 PM
1 Kudo
The variables are in a ComponentVariableRegistry, which is pretty well hidden under the NiFi API. Usually you get at variables by evaluating Expression Language in the context of a processor property. In this case I set a Process Group variable called "myAttr" to "myValue", then configured ExecuteScript with a user-defined property "myProp" whose value is an Expression Language construct referencing the PG variable name. When you call evaluateAttributeExpressions() on that property (see the script) it resolves the value of "myAttr" and returns it; you can verify that an outgoing flow file will then have "myFlowFileAttr" set to "myValue".
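As a rough sketch, here is what such a script can look like in Jython (ExecuteScript also supports Groovy and other engines); it assumes the user-defined property "myProp" is set to ${myAttr}:

```python
# ExecuteScript body (Jython). Assumes a dynamic property "myProp" = ${myAttr}
# and a Process Group variable myAttr = "myValue".
flowFile = session.get()
if flowFile is not None:
    # Evaluating Expression Language on the property resolves the PG variable.
    resolved = context.getProperty('myProp').evaluateAttributeExpressions(flowFile).getValue()
    # Copy the resolved value onto the outgoing flow file.
    flowFile = session.putAttribute(flowFile, 'myFlowFileAttr', resolved)
    session.transfer(flowFile, REL_SUCCESS)
```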
05-27-2018
03:36 AM
1 Kudo
What does your CREATE TABLE statement look like, and does it match the schema of the file(s) (Avro/ORC) you're sending to the external location?
05-25-2018
11:42 PM
1 Kudo
Your PutHDFS processor is placing the data into Hadoop (in ORC format, after ConvertAvroToORC) for use by Hive, so you don't also need to send an INSERT statement to PutHiveQL. Rather, with the pattern you're using, ReplaceText should set the content to either a Hive DDL statement that creates a table on top of the ORC files' location, or a LOAD DATA INPATH statement that loads from the HDFS location into an existing Hive table.
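For illustration only (the path and columns below are made up), the two options look roughly like this:

```sql
-- Option 1: DDL that defines an external table over the directory PutHDFS writes to
CREATE EXTERNAL TABLE my_orc_table (id INT, name STRING)
STORED AS ORC
LOCATION '/data/my_orc_table';

-- Option 2: load the ORC files from HDFS into an existing ORC-backed Hive table
LOAD DATA INPATH '/data/staging/my_file.orc' INTO TABLE existing_orc_table;
```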
05-23-2018
03:49 PM
I left a response on my other answer but will leave it here too in case you hadn't seen it: looking at the parquet-avro code, I think your suggested workaround of changing decimal values to fixed is the right approach (for now). We could update the version of parquet-avro, but I didn't see anything in there that would improve your situation; it was Impala that needed to support more incoming types.
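For clarity, "fixed" here means the Avro type backing the decimal logical type. A field like the following (the name, size, precision, and scale are placeholders) declares a fixed-backed decimal instead of a bytes-backed one:

```json
{
  "name": "amount",
  "type": {
    "type": "fixed",
    "name": "amount_dec",
    "size": 5,
    "logicalType": "decimal",
    "precision": 10,
    "scale": 2
  }
}
```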
05-22-2018
12:05 PM
1 Kudo
I added an answer to that question, but it is likely unsatisfying as it is an open issue. The Hive driver used in the Hive processors is based on Apache Hive 1.2.x which does not support a handful of JDBC API methods used by those processors.
05-21-2018
02:16 PM
For approach #1, you could use the FlattenJson processor; you'll likely want to set the Separator property to "_" rather than the default "." since Hive adds the table name to each column in a ResultSet. For approach #2, you could have a single-column table (a column of type STRING) and query it with get_json_object (example here). Alternatively, if you can map all the types (including complex types like array, list, struct, etc.) to a Hive table definition, you could use a JSON SerDe to write the data (example here).
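A rough sketch of approach #2 (the table and JSON field names are invented for illustration):

```sql
-- One STRING column that holds each raw JSON document
CREATE TABLE raw_json (json_doc STRING);

-- Extract individual fields at query time
SELECT get_json_object(json_doc, '$.id')        AS id,
       get_json_object(json_doc, '$.user.name') AS user_name
FROM raw_json;
```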
05-18-2018
01:31 AM
What version of NiFi are you using, and what do the CREATE TABLE statements for the source and target tables look like? Is Oracle your target DB, or is it a different DB? I ran ExecuteSQL against Oracle 11 (with NiFi's master branch, so 1.6.0+ or "almost 1.7.0"), populated with your actual data (using the same PutDatabaseRecord but with a JsonTreeReader). It generated the same Avro schema you have above with the same data. I then changed PutDatabaseRecord to use an AvroReader with Use Embedded Schema, and everything ran fine, inserting the rows successfully. I'm guessing you have an older version of NiFi that might be missing some fixes and/or improvements around logical-type (e.g., timestamp) handling.
05-17-2018
01:32 PM
In NiFi 1.7.0 (via NIFI-4456) you will be able to configure your JsonRecordSetWriter to write one JSON object per line, which gives you the output you want. In this case, with JoltTransformJSON you'd need a ConvertRecord after the fact, since it will emit an array if you pass in an array. As a workaround you might be able to use ReplaceText to remove the first and last square brackets and change },{ to },\n{ so each object ends up on its own line.
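A minimal sketch of that ReplaceText workaround using two processors in series (the property names are ReplaceText's; the regexes are just one way to do it):

```
ReplaceText #1 - strip the enclosing square brackets
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Entire text
  Search Value         : ^\[|\]$
  Replacement Value    : (set to empty string)

ReplaceText #2 - put each JSON object on its own line
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Entire text
  Search Value         : \},\s*\{
  Replacement Value    : },\n{
```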