Created on 07-07-2023 07:05 AM - edited 07-07-2023 07:15 AM
Hi guys,
Please help me out with a strange behavior when using PutBigQuery.
I am using Apache NiFi 1.19.1
So, my flow is as follows:
Step a:
I have a GenerateTableFetch and an ExecuteSQLRecord, which extract some data from the database.
Step b:
The data gets loaded into a GCS Bucket, using PutGCSObject.
Step c:
When the data has been saved into the GCS Bucket, I have an UpdateAttribute Processor, linked to the success queue.
Within this UpdateAttribute Processor, I have defined the following 3 attributes:
TABLE_NAME = ${generatetablefetch.tableName:toUpper()}
EXECUTION_DATE = ${now():toNumber()}
MESSAGE = 1
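For reference, ${now():toNumber()} in NiFi Expression Language evaluates to the current time as milliseconds since the Unix epoch; a rough Python equivalent (the printed value in the comment is just an example):

import time

# NiFi's ${now():toNumber()} yields the current time in milliseconds since
# the Unix epoch; this is roughly the Python equivalent.
execution_date = int(time.time() * 1000)
print(execution_date)  # e.g. 1688713500000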
Step d:
The success queue is linked afterwards to an AttributesToJSON Processor.
I have modified the properties as follows:
Destination = flowfile-content
Attributes List = TABLE_NAME, EXECUTION_DATE, MESSAGE
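Since FlowFile attributes are plain strings, the JSON produced here carries all three values as strings; a minimal sketch of what the flowfile content looks like at this point (the values are hypothetical):

import json

# Hypothetical flowfile content after AttributesToJSON with
# Destination = flowfile-content; attribute values come out as JSON strings.
flowfile_content = (
    '{"TABLE_NAME":"DOMAIN.SOME_TABLE",'
    '"EXECUTION_DATE":"1688713500000",'
    '"MESSAGE":"1"}'
)
record = json.loads(flowfile_content)
print(record)  # the reader/schema in step e is what coerces these into typed values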
Step e:
Via success, I link to a ConvertRecord processor, where I convert from JSON to AVRO.
The JSON Reader and the AVRO Writer are both defined with the following schema:
{
  "namespace": "example.avro",
  "type": "record",
  "name": "DOMAIN.LOGGING_STATUS_EXECUTION",
  "fields": [
    {
      "name": "TABLE_NAME",
      "type": "string"
    },
    {
      "name": "EXECUTION_DATE",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "local-timestamp-millis"
        }
      ]
    },
    {
      "name": "MESSAGE",
      "type": "int"
    }
  ]
}
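For completeness, I can inspect what ConvertRecord actually wrote with a quick script; a minimal sketch, assuming the flowfile from step e is saved locally as logging_status_execution.avro and that the fastavro package is available:

from fastavro import reader

# Inspect the Avro file produced by ConvertRecord to confirm the values
# are populated before they ever reach PutBigQuery.
with open("logging_status_execution.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record)  # expect TABLE_NAME, EXECUTION_DATE and MESSAGE to be non-null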
Step f:
First test would be with PutBigQueryBatch.
I have defined my Dataset, my Table Name, Load File Type = AVRO, Create Disposition = CREATE_IF_NEEDED and Write Disposition = WRITE_APPEND.
When executing the processor on the AVRO File (from step e), the data gets loaded correctly into my BigQuery Table.
My second test would be with PutBigQuery.
I have defined my Dataset, my Table Name, the Record Reader as an AVRO Reader using the embedded AVRO Schema and Transfer Type = BATCH.
When executing the processor on the AVRO File (from step e), the data gets loaded into my BigQuery Table, but all the values are NULL ... and no matter how long I wait, they remain NULL.
Here is a screenshot of how the data looks, in the same table, where row 1 = PutBigQuery and row 2 = PutBigQueryBatch, using the same flow on the same data.
The table has the following column data types and no partitioning.
TABLE_NAME = STRING
EXECUTION_DATE = DATETIME
MESSAGE = INTEGER
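For reference, the equivalent of what PutBigQueryBatch does can be sketched outside NiFi with the google-cloud-bigquery client (project, dataset and file name below are placeholders):

from google.cloud import bigquery

# Roughly equivalent to PutBigQueryBatch: a classic load job with AVRO input,
# CREATE_IF_NEEDED and WRITE_APPEND. Table id and file path are placeholders.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
with open("logging_status_execution.avro", "rb") as avro_file:
    load_job = client.load_table_from_file(
        avro_file,
        "my-project.my_dataset.LOGGING_STATUS_EXECUTION",
        job_config=job_config,
    )
load_job.result()  # waits for the load job to finish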
Has anybody else experienced this behavior and if yes, how did you solve it?
Thank you 🙂
Created 07-10-2023 12:15 AM
@stevenmatison, @MattWho, @SAMSAL: have you ever encountered such behavior? 😁
Created 07-10-2023 05:32 AM
@cotopaul This may be worthy of a JIRA and a fix. PutBigQueryBatch is going away, so if there is some functionality missing in PutBigQuery, we need to get it in there. I just did some work on PutBigQuery; let me have a discussion with @davidh and I will get back to this ASAP. If you can, share an example flow.
Created 07-10-2023 08:44 AM
@steven-matison thanks for your answer :)
You can download a template here: download
Instead of GenerateFlowFile, I have another processing section, but nevertheless, the relevant part starts with AttributesToJSON and goes up to the PutBigQuery processors 🙂