
PutDatabaseRecord truncates microseconds from timestamps

New Contributor

I am working on an ETL project that takes data from CSV files and loads it into a PostgreSQL database. The files can contain up to 1M rows each, and I need to process many files.

I am using NiFi version 1.15.2.
The flow I have built can be simplified to "Fetch CSV" -> "Convert to Avro" -> "PutDatabaseRecord".


But I am having an issue where any timestamp columns with microsecond precision are truncated to whole seconds when inserted into the database. The project's requirements call for no data loss, so losing the microseconds fails the requirement.

I have tried using both inferred and explicit schemas in the controller services, but both methods truncate the timestamps.

 

I created a test flow and database to explore the issue.
The test table has columns for storing the timestamp as a string, as a timestamp, and as an epoch number.

[Screenshot of the test table: SimonM_1-1666659727852.png]

 

My schema for the controller service:

{
  "name": "timestampTest",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "bom_stn_num", "type": "string"},
    {"name": "timestamp_string", "type": "string"},
    {"name": "timestamp_ts", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "epoch_num", "type": "string"}
  ]
}
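For reference, Avro's timestamp-micros logical type stores the annotated long as microseconds since the Unix epoch. A minimal sketch of that encoding (assuming the CSV value is interpreted as UTC; to_epoch_micros is an illustrative name, not a NiFi or Avro API):

```python
from datetime import datetime, timezone

def to_epoch_micros(dt: datetime) -> int:
    """Encode a datetime as an Avro timestamp-micros long:
    microseconds since the Unix epoch (integer arithmetic, no float loss)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    return (delta.days * 86_400_000_000
            + delta.seconds * 1_000_000
            + delta.microseconds)

dt = datetime(2021, 12, 21, 2, 1, 11, 124468, tzinfo=timezone.utc)
print(to_epoch_micros(dt))  # 1640052071124468
```

If no precision were lost, this is the value the timestamp_ts field should carry all the way to the database.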

 

My input file (pipe delimited)

"bom_stn_num"|"timestamp_string"|"timestamp_ts"|"epoch_num"
81141|2021-12-21 02:01:10.124455|2021-12-21 02:01:11.124468|2021-12-21 02:01:11.124468

 

This is inserted into the table with the microsecond precision of the timestamp truncated.

[Screenshot of the inserted rows: SimonM_0-1666659665708.png]


I found it interesting that the input claim for PutDatabaseRecord shows the following record. The timestamp column appears to have too many digits for a typical epoch number. (Note that my flow has an intermediate UpdateRecord processor to convert the epoch column to numeric.)

[ {
"bom_stn_num" : "81141",
"timestamp_string" : "2021-12-21 02:01:10.124455",
"timestamp_ts" : 1640052071000000,
"epoch_num" : "1640052195468"
} ]
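Those trailing six zeros in timestamp_ts are the giveaway: 1640052071000000 decodes to exactly 02:01:11 with no fractional part, so the microseconds have already been dropped by the time the record reaches PutDatabaseRecord. A quick sketch of that decoding (assuming UTC; from_epoch_micros is an illustrative helper, not part of NiFi):

```python
from datetime import datetime, timezone

def from_epoch_micros(micros: int) -> datetime:
    """Decode an Avro timestamp-micros long back into a UTC datetime."""
    seconds, sub = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=sub)

# Value seen in the input claim -- fractional part already zeroed:
print(from_epoch_micros(1640052071000000))  # 2021-12-21 02:01:11+00:00
# What the original CSV value should have encoded to:
print(from_epoch_micros(1640052071124468))  # 2021-12-21 02:01:11.124468+00:00
```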

 

Additionally, I tried a flow that used only the CSV reader service (i.e. "Fetch File" -> "PutDatabaseRecord") and got the same issue.

 

Any assistance with this would be much appreciated.

 

1 ACCEPTED SOLUTION

4 REPLIES

Expert Contributor

[Accepted solution — content available only to logged-in community members.]

New Contributor

Hi Green_,

Yes, we had originally tried the CSV schema with microseconds specified (.SSSSSS) and without the conversion to Avro; unfortunately this also did not load correctly into the db.
What we have had to do is write a custom reader service to essentially force a full-precision timestamp through the PutDatabaseRecord processor.
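(For anyone hitting the same thing: our custom reader is Java, but the underlying idea — keep all six fractional digits as text until the database itself parses the value — can be sketched like this. micros_to_sql_literal is an illustrative name, and the value is assumed to be UTC:)

```python
from datetime import datetime, timezone

def micros_to_sql_literal(micros: int) -> str:
    """Render epoch microseconds as a string that PostgreSQL can cast to
    timestamp without losing precision, e.g. as a text-typed bind value."""
    seconds, sub = divmod(micros, 1_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Always emit six fractional digits so nothing is silently rounded.
    return base.strftime("%Y-%m-%d %H:%M:%S") + f".{sub:06d}"

print(micros_to_sql_literal(1640052071124468))  # 2021-12-21 02:01:11.124468
```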

Thank you for your reply.

 

Explorer

@SimonM 

I had the same issue can you please share the details of how you resolved it?

Community Manager

@SimonM Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks


Regards,

Diana Torres,
Community Moderator
