PutDatabaseRecord truncates microseconds from timestamps
Labels: Apache NiFi
Created 10-24-2022 06:20 PM
I am working on an ETL project that takes data from CSV files and loads it into a PostgreSQL database. The files can contain up to 1M rows, and I need to process many files.
Using NiFi version 1.15.2.
The flow that I have built can be simplified to "Fetch CSV" -> "Convert to Avro" -> "PutDatabaseRecord".
But I am having an issue where any timestamp columns with microsecond precision are truncated to seconds when inserted into the database. The project's requirements are for no data loss, so losing the microseconds fails the requirement.
I have tried using both inferred and explicit schemas in the controller services, but both methods truncate the timestamps.
I created a test flow and database to explore the issue.
The database has columns for storing the timestamp as a string, as a timestamp, and as an epoch number.
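For reference, the test table can be created with something like the following JDBC sketch (the table name, URL, and credentials are placeholders I've assumed; the column names match the schema below, and timestamp(6) is PostgreSQL's microsecond-precision type):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTestTable {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL; adjust host, database, and credentials to your environment
        String url = "jdbc:postgresql://localhost:5432/testdb?user=postgres&password=secret";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // timestamp(6) keeps up to six fractional digits (microseconds) in PostgreSQL
            stmt.execute("CREATE TABLE timestamp_test ("
                    + "bom_stn_num text, "
                    + "timestamp_string text, "
                    + "timestamp_ts timestamp(6), "
                    + "epoch_num bigint)");
        }
    }
}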
My schema for the controller service:
{
  "name": "timestampTest",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "bom_stn_num", "type": "string"},
    {"name": "timestamp_string", "type": "string"},
    {"name": "timestamp_ts", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "epoch_num", "type": "string"}
  ]
}
My input file (pipe delimited):
"bom_stn_num"|"timestamp_string"|"timestamp_ts"|"epoch_num"
81141|2021-12-21 02:01:10.124455|2021-12-21 02:01:11.124468|2021-12-21 02:01:11.124468
This is inserted into the table with the timestamp's microsecond precision truncated.
I found it interesting that looking at the input claim for PutDatabaseRecord shows the following record. The timestamp column appears to have too many digits for a typical epoch number. (Note that my flow has an intermediate UpdateRecord processor to convert the epoch column to numeric.)
[ {
  "bom_stn_num" : "81141",
  "timestamp_string" : "2021-12-21 02:01:10.124455",
  "timestamp_ts" : 1640052071000000,
  "epoch_num" : "1640052195468"
} ]
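The extra digits are consistent with Avro's timestamp-micros logical type (microseconds since the epoch), but notice the fractional part is already zero, i.e. the precision was lost before the insert. A quick check of the arithmetic (a standalone sketch, not part of the flow):

import java.time.Instant;

public class EpochCheck {
    public static void main(String[] args) {
        // The value seen in the input claim, interpreted as timestamp-micros
        long micros = 1_640_052_071_000_000L;
        long seconds = micros / 1_000_000L;   // 1640052071
        long fraction = micros % 1_000_000L;  // 0 -> sub-second digits already lost
        System.out.println(Instant.ofEpochSecond(seconds)); // 2021-12-21T02:01:11Z
        System.out.println("fractional microseconds: " + fraction);
    }
}

So the input value 02:01:11.124468 has become exactly 02:01:11.000000 by the time PutDatabaseRecord sees it.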
Additionally, I tried a flow that used only a CSV reader service (i.e. "FetchFile" -> "PutDatabaseRecord") and got the same issue.
Any assistance with this would be much appreciated.
Created 11-15-2022 05:46 PM
For your attempt to use only a CSVReader, have you configured the "timestamp" property in the controller service to describe a format that includes microseconds?
I have encountered a very similar issue in the past. In my case, I was writing Avro files using PutDatabaseRecord, which already had their schema as part of the file. The schema itself was incorrect and described date fields as strings; as such, when I wrote to my DB (which had a timestamp type with millisecond precision), the milliseconds would get cut off, and presumably some default parsing allowed the rest of the date to be written correctly.
The solution I found was to turn my Avro files into CSV, then use a CSVReader in my PutDatabaseRecord processor. In my case, configuring the timestamp format to include milliseconds (.SSS at the end) allowed the processor to write the records to the DB correctly without any data loss.
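To illustrate the .SSS behaviour, here is a minimal sketch, assuming the reader parses with java.text.SimpleDateFormat-style patterns (which is how NiFi 1.x documents its date/time format properties):

import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class MillisDemo {
    public static void main(String[] args) throws Exception {
        // ".SSS" matches exactly three fractional digits, so milliseconds survive the parse
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        Timestamp ts = new Timestamp(fmt.parse("2021-12-21 02:01:10.124").getTime());
        System.out.println(ts); // 2021-12-21 02:01:10.124
    }
}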
Created 11-21-2022 09:28 PM
Hi Green_,
Yes, we had originally tried the CSV schema with microseconds specified (.SSSSSS) and without the conversion to Avro, but this also wasn't loading correctly into the DB, unfortunately.
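(For context, that failure would be consistent with java.text.SimpleDateFormat semantics, assuming NiFi 1.15's readers use those patterns: 'S' counts milliseconds rather than fractional digits, so a six-digit pattern mis-parses the value instead of preserving microseconds. A minimal sketch of what goes wrong:)

import java.text.SimpleDateFormat;

public class MicrosDemo {
    public static void main(String[] args) throws Exception {
        // 'S' means milliseconds in SimpleDateFormat, so ".SSSSSS" reads "124455"
        // as 124455 ms (~2 min 4 s), shifting the parsed time instead of keeping
        // six fractional digits; microseconds cannot survive this code path.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS");
        System.out.println(fmt.parse("2021-12-21 02:01:10.124455"));
        // prints a time around 02:03:14 in the local zone
    }
}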
What we have had to do is write a custom reader service to essentially force the full-precision timestamp through the PutDatabaseRecord processor.
Thank you for your reply.
Created 07-17-2024 07:37 PM
I had the same issue. Can you please share the details of how you resolved it?
Created 11-21-2022 03:07 PM
@SimonM Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
Regards,
Diana Torres, Community Moderator
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Created 07-31-2024 11:38 AM
@Green_
I had the same issue. Can you please share the detailed steps of how you resolved it? I'm new to NiFi.
