Member since: 10-23-2022
Posts: 2
Kudos Received: 0
Solutions: 0
11-21-2022 09:28 PM
Hi Green_, yes, we had originally tried the CSV schema specifying microseconds (.SSSSSS) without the conversion to Avro, but this also wasn't loading correctly into the db unfortunately. What we have had to do is write a custom reader service to essentially force the timestamp through the PutDatabaseRecord processor. Thank you for your reply.
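In case it helps anyone hitting the same issue, the piece the custom reader has to get right is the fractional-second parsing. This is only a minimal plain-Java sketch of that part (our actual service wraps it up as a NiFi record reader, which is not shown here). The point is that java.time.DateTimeFormatter parses six fractional digits exactly, whereas a SimpleDateFormat-style ".SSSSSS" pattern treats the fraction as a millisecond count, which may be why the format string on the CSV schema didn't help.

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class MicroTimestampParse {

    // Six 'S' letters in java.time mean a six-digit fraction of a second,
    // so the microseconds are parsed exactly.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    public static Timestamp parse(String value) {
        LocalDateTime ldt = LocalDateTime.parse(value, FMT);
        // Timestamp.valueOf keeps the full nanosecond field, so the
        // microseconds survive the hand-off to the JDBC driver.
        return Timestamp.valueOf(ldt);
    }

    public static void main(String[] args) {
        System.out.println(parse("2021-12-21 02:01:11.124468"));
        // prints: 2021-12-21 02:01:11.124468
    }
}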
10-24-2022 06:20 PM
I am doing an ETL project that takes data from CSV files and loads it into a PostgreSQL database. The files can contain up to 1M rows, and I need to process many files. I am using NiFi version 1.15.2.

The flow I have built can be simplified to "Fetch CSV" -> "Convert to Avro" -> "PutDatabaseRecord". The issue is that any timestamp columns with microsecond precision are being truncated to seconds when inserted into the database. The project's requirements are for no data loss, so losing the microseconds will fail the requirement. I have tried using both inferred and explicit schemas in the controller services, but both methods truncate the timestamps.

I created a test flow and database to explore the issue. The database has columns for storing the timestamp as a string, as a timestamp, and as an epoch number.

My schema for the controller service:

{
  "name": "timestampTest",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "bom_stn_num", "type": "string"},
    {"name": "timestamp_string", "type": "string"},
    {"name": "timestamp_ts", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "epoch_num", "type": "string"}
  ]
}

My input file (pipe delimited):

"bom_stn_num"|"timestamp_string"|"timestamp_ts"|"epoch_num"
81141|2021-12-21 02:01:10.124455|2021-12-21 02:01:11.124468|2021-12-21 02:01:11.124468

This is inserted into the table with the timestamp's microsecond precision truncated. I found it interesting that the input claim for the PutDatabaseRecord processor shows the following record, where the timestamp column appears to have too many digits for a typical epoch number. (Note that my flow has an intermediate UpdateRecord processor to convert the epoch column to numeric.)

[ {
  "bom_stn_num" : "81141",
  "timestamp_string" : "2021-12-21 02:01:10.124455",
  "timestamp_ts" : 1640052071000000,
  "epoch_num" : "1640052195468"
} ]

Additionally, I tried a flow that used only the CSV reader service (i.e. "Fetch File" -> "PutDatabaseRecord") and got the same issue. Any assistance with this would be much appreciated.
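For reference, decoding that long value outside NiFi (a quick plain-Java sketch, nothing to do with the flow itself) shows why it looks too long for an epoch number: it is epoch microseconds, and the last six digits being zero suggests the .124468 fraction was already lost before the Avro long was produced.

import java.time.Instant;
import java.time.ZoneOffset;

public class DecodeMicros {
    public static void main(String[] args) {
        long micros = 1640052071000000L;   // value seen in the PutDatabaseRecord input claim
        Instant instant = Instant.ofEpochSecond(
                micros / 1_000_000,            // whole seconds
                (micros % 1_000_000) * 1_000); // leftover micros -> nanos (zero here)
        System.out.println(instant.atOffset(ZoneOffset.UTC));
        // prints: 2021-12-21T02:01:11Z  -- seconds intact, microseconds gone
    }
}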
Labels:
Apache NiFi