Created on 07-07-2023 07:05 AM - edited 07-07-2023 07:15 AM
Hi guys,
Please help me out with a strange behavior when using PutBigQuery.
I am using Apache NiFi 1.19.1
So, my flow is as follows:
Step a:
I have a GenerateTableFetch and an ExecuteSQLRecord, which extract some data from the database.
Step b:
The data gets loaded into a GCS Bucket, using PutGCSObject.
Step c:
When the data has been saved into the GCS Bucket, I have an UpdateAttribute Processor, linked to the success queue.
Within this UpdateAttribute Processor, I have defined the following 3 attributes:
TABLE_NAME = ${generatetablefetch.tableName:toUpper()}
EXECUTION_DATE = ${now():toNumber()}
MESSAGE = 1
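For reference, ${now():toNumber()} in NiFi Expression Language evaluates to the current time as milliseconds since the Unix epoch; a rough Python equivalent (the printed value in the comment is just an example):

import time

# NiFi's ${now():toNumber()} yields the current time in milliseconds since
# the Unix epoch; this is roughly the Python equivalent.
execution_date = int(time.time() * 1000)
print(execution_date)  # e.g. 1688713500000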
Step d:
The success queue is linked afterwards to an AttributesToJSON Processor.
I have modified the properties as follows:
Destination = flowfile-content
Attributes List = TABLE_NAME, EXECUTION_DATE, MESSAGE
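Since FlowFile attributes are plain strings, the JSON produced here carries all three values as strings; a minimal sketch of what the flowfile content looks like at this point (the values are hypothetical):

import json

# Hypothetical flowfile content after AttributesToJSON with
# Destination = flowfile-content; attribute values come out as JSON strings.
flowfile_content = (
    '{"TABLE_NAME":"DOMAIN.SOME_TABLE",'
    '"EXECUTION_DATE":"1688713500000",'
    '"MESSAGE":"1"}'
)
record = json.loads(flowfile_content)
print(record)  # the reader/schema in step e is what coerces these into typed values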
Step e:
Via success, I link to a ConvertRecord processor, where I convert from JSON to AVRO.
The JSON Reader and the AVRO Writer are both defined with the following schema:
{
  "namespace": "example.avro",
  "type": "record",
  "name": "DOMAIN.LOGGING_STATUS_EXECUTION",
  "fields": [
    {
      "name": "TABLE_NAME",
      "type": "string"
    },
    {
      "name": "EXECUTION_DATE",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "local-timestamp-millis"
        }
      ]
    },
    {
      "name": "MESSAGE",
      "type": "int"
    }
  ]
}
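For completeness, I can inspect what ConvertRecord actually wrote with a quick script; a minimal sketch, assuming the flowfile from step e is saved locally as logging_status_execution.avro and that the fastavro package is available:

from fastavro import reader

# Inspect the Avro file produced by ConvertRecord to confirm the values
# are populated before they ever reach PutBigQuery.
with open("logging_status_execution.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record)  # expect TABLE_NAME, EXECUTION_DATE and MESSAGE to be non-null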
Step f:
First test would be with PutBigQueryBatch.
I have defined my Dataset, my Table Name, Load File Type = AVRO, Create Disposition = CREATE_IF_NEEDED and Write Disposition = WRITE_APPEND.
When executing the processor on the AVRO File (from step e), the data gets loaded correctly into my BigQuery Table.
My second test would be with PutBigQuery.
I have defined my Dataset, my Table Name, the Record Reader as an AVRO Reader using the embedded AVRO Schema and Transfer Type = BATCH.
When executing the processor on the AVRO File (from step e), the data gets loaded into my BigQuery Table, but all the values are NULL ... and no matter how long I wait, they remain NULL.
Here is a screenshot of how the data looks, in the same table, where row 1 = PutBigQuery and row 2 = PutBigQueryBatch, using the same flow on the same data.
The table has the following column data types and no partitioning.
TABLE_NAME = STRING
EXECUTION_DATE = DATETIME
MESSAGE = INTEGER
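For reference, the equivalent of what PutBigQueryBatch does can be sketched outside NiFi with the google-cloud-bigquery client (project, dataset and file name below are placeholders):

from google.cloud import bigquery

# Roughly equivalent to PutBigQueryBatch: a classic load job with AVRO input,
# CREATE_IF_NEEDED and WRITE_APPEND. Table id and file path are placeholders.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
with open("logging_status_execution.avro", "rb") as avro_file:
    load_job = client.load_table_from_file(
        avro_file,
        "my-project.my_dataset.LOGGING_STATUS_EXECUTION",
        job_config=job_config,
    )
load_job.result()  # waits for the load job to finish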
Has anybody else experienced this behavior and if yes, how did you solve it?
Thank you 🙂
Created 07-10-2023 12:15 AM
@stevenmatison, @MattWho, @SAMSAL: have you ever encountered such behavior? 😁
Created 07-10-2023 05:32 AM
@cotopaul This may be worthy of a JIRA and a fix. PutBigQueryBatch is going away, so if there is some functionality missing in PutBigQuery, we need to get it in there. I just did some work on PutBigQuery; let me have a discussion with @davidh and I will get back to this ASAP. If you can, share an example flow.
Created 07-10-2023 08:44 AM
@steven-matison thanks for your answer :)
You can download a template here: download
Instead of GenerateFlowFile, I have another processing section, but nevertheless, the relevant part starts with AttributesToJSON and goes up to the PutBigQuery processors 🙂