About mburgess

mburgess · ‎05-23-2023

As far as I know, InferAvroSchema is not a supported processor. However, there are record-based processors and an AvroReader controller service you can use with them. AvroReader has a (default) option to Infer Schema. This should achieve the same effect, and record-based processors are often more performant.

mburgess · ‎03-16-2023

In Oracle an UPSERT is done by a MERGE, so alternatively you could store your data in a new temporary table and then run ExecuteSQL/PutSQL with a MERGE command to merge from the temp table into the target table.
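As a rough illustration of the MERGE approach, here is a small Python sketch that builds the kind of statement you could paste into ExecuteSQL/PutSQL. The table names (cars, cars_tmp) and column names (car_id, car_type) are made up for the example, as is the helper function itself; adjust them to your schema.

```python
def build_oracle_merge(target, staging, key_cols, value_cols):
    """Build an Oracle MERGE that upserts rows from a staging table into a target table.

    All table and column names are supplied by the caller; nothing here is
    specific to NiFi. Key columns are only used for matching, never updated,
    because Oracle does not allow updating columns referenced in the ON clause.
    """
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    update_clause = ", ".join(f"t.{c} = s.{c}" for c in value_cols)
    all_cols = list(key_cols) + list(value_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"ON ({on_clause})\n"
        f"WHEN MATCHED THEN UPDATE SET {update_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical names, for illustration only.
sql = build_oracle_merge("cars", "cars_tmp", ["car_id"], ["car_type"])
print(sql)
```

The statement is generated without a trailing semicolon on purpose, since that is usually what you want when sending a single statement over JDBC.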

mburgess · ‎02-02-2023

I'm not a Hive expert but I did author the original PutHive3Streaming processor for NiFi. My recommendation is setting Records Per Transaction greater than the number of records in a FlowFile (unless we are talking about super-huge files), and Transactions Per Batch to 1. This makes the transaction semantics similar to how NiFi FlowFile sessions work (e.g., rollback, failure, success). If the number of records is huge and is causing throughput problems, try dividing that number by 100 and making Transactions Per Batch 100. The product of the two numbers should be greater than the total number of records in the FlowFile; otherwise you request a large number of batches/transactions, which adds overhead on the Hive Metastore.
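To make the arithmetic concrete, here is a minimal Python sketch of that rule of thumb. The function name and the huge_threshold cutoff are assumptions for illustration only; what counts as "super-huge" depends on your environment.

```python
def suggest_hive_streaming_settings(record_count, huge_threshold=1_000_000, batches=100):
    """Return (records_per_transaction, transactions_per_batch).

    huge_threshold is a made-up cutoff for "super-huge" FlowFiles, not a NiFi
    setting. In both branches the product of the two returned values is
    strictly greater than record_count.
    """
    if record_count <= huge_threshold:
        # One transaction covers the whole FlowFile.
        return record_count + 1, 1
    # Split the work across `batches` transactions; the +1 keeps the
    # product strictly above record_count even after integer division.
    return record_count // batches + 1, batches


small = suggest_hive_streaming_settings(5_000)
large = suggest_hive_streaming_settings(250_000_000)
print(small, large)
```

For a 5,000-record FlowFile this suggests a single transaction; for 250 million records it suggests 100 transactions of about 2.5 million records each.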

mburgess · ‎12-20-2022

I wasn't able to reproduce this. I remember trying your example and the UPSERT worked for me, so I'm not sure what's going on.

mburgess · ‎11-02-2022

Agreed, you do not have access to the fields in either the incoming or outgoing JSON objects using Expression Language in the spec.

mburgess · ‎10-05-2022

I believe the type checking for logical types is more strict now as of https://issues.apache.org/jira/browse/AVRO-2493 and NiFi 1.17.0 (when we upgraded to Avro 1.11.1). Are you using "int" or "string" as the normal Avro type? According to the spec (https://avro.apache.org/docs/1.11.1/specification/#timestamp-millisecond-precision) it must be "long".
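For reference, here is a small Python sketch of what a conforming field looks like, plus a quick check you could run over a schema before handing it to NiFi. The record name, field names, and the helper function are hypothetical, and the check only looks at top-level fields with a plain (non-union) type.

```python
import json

# Hypothetical schema: the timestamp-millis logical type annotates a "long".
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}


def find_bad_timestamp_fields(record_schema):
    """Return names of fields that use timestamp-millis on a type other than long."""
    bad = []
    for field in record_schema["fields"]:
        ftype = field["type"]
        if (
            isinstance(ftype, dict)
            and ftype.get("logicalType") == "timestamp-millis"
            and ftype.get("type") != "long"
        ):
            bad.append(field["name"])
    return bad


print(json.dumps(schema, indent=2))
print(find_bad_timestamp_fields(schema))
```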

mburgess · ‎12-07-2021

The operation to add an attribute to a FlowFile is on the ProcessSession object, not the FlowFile itself (so the session can keep track of changes). Try the following instead:

session.putAttribute(destFlowFile, "logMsg", "Testing Msg")
session.putAllAttributes(destFlowFile, backupAttributes)

mburgess · ‎03-24-2021

What are the column names in your table? Assuming "carId" and "carType", you can use JoltTransformJson or JoltTransformRecord with the following spec:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "$": "carId",
        "@": "carType"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "carId": {
        "*": {
          "@": "[&0].carId"
        }
      },
      "carType": {
        "*": {
          "@": "[&0].carType"
        }
      }
    }
  }
]
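If it helps to see the intent outside of Jolt, here is a Python sketch of what the two shift operations appear to be aiming for. It assumes the input is a flat JSON object with several entries mapping ids to types (the sample values are made up); it is not a Jolt implementation, just the same two steps written out by hand.

```python
def flat_json_to_rows(flat):
    """Turn {"key": "value", ...} into a list of {"carId": key, "carType": value} rows."""
    # Step 1 (mirrors the first shift): gather the keys and the values
    # into two parallel lists.
    intermediate = {
        "carId": list(flat.keys()),
        "carType": list(flat.values()),
    }
    # Step 2 (mirrors the second shift): pair the lists up by index,
    # producing one row per original key.
    return [
        {"carId": car_id, "carType": car_type}
        for car_id, car_type in zip(intermediate["carId"], intermediate["carType"])
    ]


# Hypothetical input, for illustration only.
rows = flat_json_to_rows({"1": "sedan", "2": "suv"})
print(rows)
```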

mburgess · ‎02-09-2021

If you use GrokReader you can use the same kv filter from Logstash: https://community.cloudera.com/t5/Support-Questions/Grok-Patterns-Expressions-for-capturing-comma-separated-key/td-p/311126

mburgess · ‎01-29-2021

Is there anything in the logs before/after the "already marked for transfer" entry? I'm trying to figure out how a FlowFile can get transferred and then something goes wrong (where we'd try to also send it to failure).

Last Visited	‎01-16-2025 06:21 PM

Member Since	‎11-16-2015 02:21 PM
Posts	892
Kudos received	643

Cloudera Community

Re: Nifi Building error when creating a brand new ...

Re: Tuning PutHive3Streaming NiFi processor

Re: NiFi ExecuteScript - Able to add attributes to...

Re: NiFi - JOLT assign value to attribute from Jso...

Re: NiFi - ExecuteScript for getting max value of ...

Re: Exception in CDP Processor InferAvroSchema

Re: How to perform UPSERT in Oracle DB using Apach...

Re: Tuning PutHive3Streaming NiFi processor

Re: Upsert on conflict not quoting keys

Re: Date Transformation using JoltTransform

Re: Nifi logs flooded with avro LogicalTypes warni...

Re: NiFi ExecuteScript - Able to add attributes to...

Re: Converting Keys and values of a flat json to t...

Re: Nifi: KV filter

Re: "is already marked for transfer" in PutDatabas...