Created 10-24-2022 06:20 PM
I am working on an ETL project that takes data from CSV files and loads it into a PostgreSQL database. The files can contain up to 1M rows, and I need to process many files.
Using NiFi version 1.15.2.
The flow that I have built can be simplified into "Fetch CSV" -> "Convert to Avro" -> "PutDatabaseRecord".
But I am having an issue where any timestamp columns with microsecond precision are being truncated to seconds when inserted into the database. The project requires no data loss, so losing the microseconds fails that requirement.
I have tried using both inferred and explicit schemas in the controller services, but both methods truncate the timestamps.
I created a test flow and database to explore the issue.
The test table has columns for storing the timestamp as a string, as a native timestamp, and as an epoch number.
My schema for the controller service:
{
  "name": "timestampTest",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "bom_stn_num", "type": "string"},
    {"name": "timestamp_string", "type": "string"},
    {"name": "timestamp_ts", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "epoch_num", "type": "string"}
  ]
}
My input file (pipe-delimited):
"bom_stn_num"|"timestamp_string"|"timestamp_ts"|"epoch_num"
81141|2021-12-21 02:01:10.124455|2021-12-21 02:01:11.124468|2021-12-21 02:01:11.124468
This is inserted into the table with the timestamp's microsecond precision truncated.
I found it interesting that the input claim for PutDatabaseRecord shows the following record. The timestamp column appears to have too many digits for a typical epoch number. (Note that my flow has an intermediate UpdateRecord processor to convert the epoch column to numeric.)
[ {
"bom_stn_num" : "81141",
"timestamp_string" : "2021-12-21 02:01:10.124455",
"timestamp_ts" : 1640052071000000,
"epoch_num" : "1640052195468"
} ]
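For what it's worth, decoding that long as Avro timestamp-micros suggests the fraction is already gone at this point: 1640052071000000 microseconds since the epoch is exactly 2021-12-21 02:01:11.000000 UTC, whereas the full-precision value would have ended in 124468 (assuming the string was read as UTC, which the whole-seconds result suggests). A quick standalone check, plain Java with nothing NiFi-specific and a made-up class name:

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class MicrosCheck {
    public static void main(String[] args) {
        long micros = 1640052071000000L; // value seen in the PutDatabaseRecord input claim

        // Avro timestamp-micros counts microseconds since 1970-01-01T00:00:00Z,
        // so split the value into whole seconds and leftover microseconds.
        Instant instant = Instant.ofEpochSecond(micros / 1_000_000L,
                (micros % 1_000_000L) * 1_000L);

        DateTimeFormatter fmt = DateTimeFormatter
                .ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
                .withZone(ZoneOffset.UTC);

        // Prints 2021-12-21 02:01:11.000000 -- the .124468 fraction is missing,
        // i.e. the string was apparently parsed to whole seconds before being
        // scaled up to microseconds.
        System.out.println(fmt.format(instant));
    }
}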
Additionally, I tried a flow that used only the CSVReader service (i.e. "Fetch File" -> "PutDatabaseRecord") and got the same issue.
Any assistance with this would be much appreciated.
Created 11-15-2022 05:46 PM
For your attempt to use only a CSVReader, have you configured the "Timestamp Format" property in the controller service to describe a format that includes microseconds?
I have encountered a very similar issue in the past. In my case, I was writing Avro files with PutDatabaseRecord which already had their schema as part of the file. The schema itself was incorrect and described date fields as strings - as such, when I'd write to my DB (which had a date type with millisecond precision), the milliseconds would get cut off, and presumably some default parsing allowed the rest of the date to be written correctly.
The solution I found was to turn my Avro files into CSV, then use a CSVReader in my PutDatabaseRecord processor. In my case, configuring the timestamp format to include milliseconds (.SSS at the end) allowed the processor to write the records to the DB correctly without any data loss.
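To illustrate the difference the fraction pattern makes, here is plain java.time (this is not necessarily how the CSVReader applies its timestamp format property internally, so treat it only as a sketch of the pattern semantics; the class name is made up):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class FractionPatternDemo {
    public static void main(String[] args) {
        String value = "2021-12-21 02:01:10.124455"; // six-digit (microsecond) fraction

        // A pattern ending in .SSSSSS keeps the full microsecond fraction.
        DateTimeFormatter micros = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");
        System.out.println(LocalDateTime.parse(value, micros)); // 2021-12-21T02:01:10.124455

        // A pattern ending in .SSS expects a three-digit fraction and rejects this value.
        DateTimeFormatter millis = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
        try {
            System.out.println(LocalDateTime.parse(value, millis));
        } catch (DateTimeParseException e) {
            System.out.println(".SSS cannot parse a six-digit fraction: " + e.getMessage());
        }
    }
}

So in java.time terms, data like the sample above needs all six fraction digits in the pattern just to parse; whether the same holds inside the record readers is worth verifying.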
Created 11-21-2022 09:28 PM
Hi Green_,
Yes, we had originally tried the CSV schema with microseconds stated (.SSSSSS) and without the conversion to Avro, but unfortunately that also wasn't loading correctly into the DB.
What we have had to do is write a custom reader service to essentially force a timestamp through the PutDatabaseRecord processor.
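The gist, as far as the timestamp handling goes, is making sure the field reaches PutDatabaseRecord as a java.sql.Timestamp (which carries nanosecond precision) rather than a value that has already been rounded to whole seconds. A simplified sketch of that conversion only (illustrative, not the actual reader code; class name made up):

import java.sql.Timestamp;

public class TimestampConversionSketch {
    public static void main(String[] args) {
        // java.sql.Timestamp keeps fractional seconds down to nanoseconds, so
        // converting the raw string directly preserves the microseconds end to end.
        Timestamp ts = Timestamp.valueOf("2021-12-21 02:01:10.124455");

        System.out.println(ts);            // 2021-12-21 02:01:10.124455
        System.out.println(ts.getNanos()); // 124455000
    }
}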
Thank you for your reply.
Created 07-17-2024 07:37 PM
I had the same issue. Can you please share the details of how you resolved it?
Created 11-21-2022 03:07 PM
@SimonM Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
Regards,
Diana Torres
Created 07-31-2024 11:38 AM
@Green_
I had the same issue. Can you please share the detailed steps of how you resolved it? I'm new to NiFi.