PutDatabaseRecord truncates microseconds from timestamps

New Contributor

I am working on an ETL project that takes data from CSV files and loads it into a PostgreSQL database. The files can contain up to 1M rows, and I need to process many files.

Using NiFi version 1.15.2.
The flow that I have built can be simplified into "Fetch CSV" -> "Convert to Avro" -> "PutDatabaseRecord".


But I am having an issue where any timestamp columns with microsecond precision are truncated to seconds when inserted into the database. The project requires no data loss, so losing the microseconds would fail that requirement.

I have tried using both inferred and explicit schemas in the controller services, but both methods truncate the timestamps.

 

I created a test flow and database to explore the issue.
The test table has columns for storing the timestamp as a string, as a timestamp, and as an epoch number.

[screenshot: SimonM_1-1666659727852.png — test table]

 

My schema for the controller service

{
  "name": "timestampTest",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    { "name": "bom_stn_num", "type": "string" },
    { "name": "timestamp_string", "type": "string" },
    { "name": "timestamp_ts", "type": { "type": "long", "logicalType": "timestamp-micros" } },
    { "name": "epoch_num", "type": "string" }
  ]
}

 

My input file (pipe delimited)

"bom_stn_num"|"timestamp_string"|"timestamp_ts"|"epoch_num"
81141|2021-12-21 02:01:10.124455|2021-12-21 02:01:11.124468|2021-12-21 02:01:11.124468

 

This is inserted into the table with the timestamp microsecond precision truncated.

[screenshot: SimonM_0-1666659665708.png — inserted row with truncated timestamp]

 

 

I found it interesting that the input claim for PutDatabaseRecord shows the following record. The timestamp column appears to have too many digits for a typical epoch number. (Note that my flow has an intermediate UpdateRecord processor to convert the epoch column to a numeric type.)

[ {
"bom_stn_num" : "81141",
"timestamp_string" : "2021-12-21 02:01:10.124455",
"timestamp_ts" : 1640052071000000,
"epoch_num" : "1640052195468"
} ]
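For reference, reading that value as microseconds since the epoch (a quick standalone Java check, outside NiFi; the class name is just for illustration) gives 2021-12-21 02:01:11 UTC with the fractional part already zeroed:

import java.time.Instant;
import java.time.ZoneOffset;

public class EpochMicrosCheck {
    public static void main(String[] args) {
        long micros = 1640052071000000L;            // value seen in the input claim
        Instant instant = Instant.ofEpochSecond(
                micros / 1_000_000L,                // whole seconds
                (micros % 1_000_000L) * 1_000L);    // leftover microseconds as nanoseconds
        // Prints 2021-12-21T02:01:11Z -- the fractional part is already zero here
        System.out.println(instant.atOffset(ZoneOffset.UTC));
    }
}

If that reading is right, the microseconds are already gone before PutDatabaseRecord inserts the record.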

 

Additionally, I tried a flow that used only the CSVReader service (i.e. "Fetch File" -> "PutDatabaseRecord") and got the same issue.

 

Any assistance with this would be much appreciated.

 

1 ACCEPTED SOLUTION

Expert Contributor

For your attempt to use only a CSVReader, have you configured the "Timestamp Format" property in the controller service to describe a format that includes microseconds?

 

I have encountered a very similar issue in the past. In my case, I was writing Avro files using PutDatabaseRecord, and the files already carried their schema. The schema itself was incorrect and described the date fields as strings; as a result, when I wrote to my DB (whose column type had millisecond precision), the milliseconds would get cut off, and presumably some default parsing allowed the rest of the date to be written correctly.

 

The solution I found was to turn my Avro files into CSV and then use a CSVReader in my PutDatabaseRecord processor. In my case, configuring the timestamp format to include milliseconds (.SSS at the end) allowed the processor to write the records to the DB correctly, without any data loss.
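As a rough illustration only (plain Java, not the NiFi reader itself; the class name is made up), a pattern whose fractional part matches the data parses the full value, which is what the reader's timestamp format setting needs to express:

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatCheck {
    public static void main(String[] args) {
        String value = "2021-12-21 02:01:10.124455";

        // Pattern analogous to a reader timestamp format ending in .SSSSSS
        DateTimeFormatter microsFormat =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

        // Timestamp.valueOf(LocalDateTime) keeps the nanosecond field,
        // so the full .124455 fraction survives the parse
        Timestamp ts = Timestamp.valueOf(LocalDateTime.parse(value, microsFormat));
        System.out.println(ts);   // 2021-12-21 02:01:10.124455
    }
}

The same idea applies with .SSSSSS when the source data carries microseconds.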


5 REPLIES


New Contributor

Hi Green_,

Yes, we had originally tried the CSV schema with microseconds (.SSSSSS) and without the conversion to Avro, but unfortunately that also did not load correctly into the DB.
What we have had to do is write a custom reader service to essentially force a proper timestamp through the PutDatabaseRecord processor.
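For anyone interested, the idea behind it is to hand PutDatabaseRecord a timestamp value that still carries its fraction. The sketch below is not our reader code, just a minimal plain-JDBC illustration of that idea (the connection URL, credentials, and table/column names are placeholders, and it assumes the PostgreSQL JDBC driver is on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class MicrosInsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details and table/column names -- adjust to your environment
        String url = "jdbc:postgresql://localhost:5432/testdb";

        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");
        Timestamp ts = Timestamp.valueOf(LocalDateTime.parse("2021-12-21 02:01:11.124468", fmt));

        try (Connection conn = DriverManager.getConnection(url, "test_user", "test_password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO timestamp_test (timestamp_ts) VALUES (?)")) {
            // PostgreSQL's timestamp type stores microsecond precision, so the
            // .124468 fraction is kept once the Timestamp object carries it
            ps.setTimestamp(1, ts);
            ps.executeUpdate();
        }
    }
}

PostgreSQL's timestamp type keeps microsecond precision, so nothing is lost once the value arrives as a full-precision Timestamp.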

Thank you for your reply.

 

Contributor

@SimonM 

I had the same issue. Can you please share the details of how you resolved it?

Community Manager

@SimonM Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks


Regards,

Diana Torres,
Community Moderator



Contributor

@Green_ 
I had the same issue. Can you please share the detailed steps of how you resolved it? I'm new to NiFi.