Created 04-11-2023 03:03 AM
Hi there,
So, I need your help with a flow I have been struggling with.
My Flow is as follows: ConsumeKafka --> ConvertRecord --> ConvertJSONToSQL --> PutSQL.
I am extracting data from some Kafka brokers; the messages come in the following format:
{"Column1": "1e39c17d25cb420e8e720aa4b1ae005e","TimeStamp": "2023-04-10T10:43:15.794241429+03:00","Column3": "some_string","Column4": false,"Column5": "some_string","Column6": "some_string","Column7": false,"Column8": "some_string","Column9": "","Column10": 0,"Column11": 0,"Column12": "","Column13": 0,"Column14": 137,"Column15": "","Column16": 0,"Column17": 0,"Column18": "","Column19": 138,"Column20": 1550,"Column21": "some_string","Column22": ""}
Afterwards I am using a ConvertRecord, with a RecordReader = JsonTreeReader (with default configuration: Infer Schema and Root Node) and a RecordWriter = JsonRecordSetWriter (with default configuration: Inherit Record Schema and Pretty Print JSON = false). The data comes out in JSON format, as expected, without any modifications.
Next, I am going into a ConvertJSONToSQL. Here, I have defined a JDBC Connection Pool, set the Statement Type = INSERT and set the Table Name = my_table. All the other configurations remained at their defaults. The SQL arguments come out as FlowFile attributes and they look fine, for example:
sql.args.2.type --> 93
sql.args.2.value --> 2023-04-10T10:43:15.794241429+03:00
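For context, the `sql.args.N.type` attribute that ConvertJSONToSQL writes is a JDBC type constant, which a short Java check confirms (class name here is invented for illustration):

```java
import java.sql.Types;

public class SqlArgTypeCheck {
    public static void main(String[] args) {
        // ConvertJSONToSQL sets sql.args.N.type to the JDBC type constant of the
        // target column; 93 is java.sql.Types.TIMESTAMP, so PutSQL will try to
        // bind sql.args.2.value as a timestamp.
        System.out.println(Types.TIMESTAMP); // 93
    }
}
```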
Now, since I have to insert the data into a PostgreSQL database, I am using PutSQL to save it. Because the target column is of type "timestamp without time zone", I have added an UpdateAttribute processor to modify the date.
The first UpdateAttribute defines a property named "sql.args.2.value" with the value "${sql.args.2.value:substringBefore('T')} ${sql.args.2.value:substringAfter('T'):substringBefore('+')}". The attribute now comes out as "2023-04-10 10:43:15.794241429". However, when inserting the data into PostgreSQL, the stored value differs from the one present in the attribute: "2023-04-19 15:20:36.429".
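In plain Java, those two EL substring calls amount to the following (class and method names invented for illustration):

```java
public class SubstringElSketch {
    // Mirrors ${v:substringBefore('T')} ${v:substringAfter('T'):substringBefore('+')}
    // for a value like 2023-04-10T10:43:15.794241429+03:00 (assumes 'T' and '+'
    // are both present, as they are in these Kafka messages).
    static String rewrite(String v) {
        String datePart = v.substring(0, v.indexOf('T'));
        String timePart = v.substring(v.indexOf('T') + 1);
        timePart = timePart.substring(0, timePart.indexOf('+'));
        return datePart + " " + timePart;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("2023-04-10T10:43:15.794241429+03:00"));
        // -> 2023-04-10 10:43:15.794241429
    }
}
```

Note the nanosecond fraction survives this rewrite untouched; only the 'T' separator and the "+03:00" offset are removed.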
To further debug the flow, I have added another UpdateAttribute with a property named "a3" whose value is "${sql.args.2.value:toDate("yyyy-MM-dd HH:mm:ss.SSS"):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}". I have added another property as well, named "a4", whose value is "${sql.args.2.value:toDate("yyyy-MM-dd HH:mm:ss.SSS"):toNumber()}". Now, the results are pretty strange:
a3 --> 2023-04-19 15:20:36.429
a4 --> 1681906836429
Basically, the problem arises when converting that date into the datetime type. Does anybody know why it behaves like this? I am certain that this is not a bug, but something I am not using correctly 🙂
Thank you 🙂
Created 04-12-2023 07:09 AM
@cotopaul
Your issue is caused by using the ":toDate" and ":format" functions here. These two functions use Java's SimpleDateFormat, which cannot handle the nanosecond precision in your date.
You'll want to use a NiFi Expression Language function that supports the Java DateTimeFormatter, such as ":formatInstant", ":toInstant", ":toNanos", or ":toMicros".
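To see exactly what goes wrong, here is a plain-Java sketch (class name invented) of the lenient SimpleDateFormat parse that ":toDate" performs. Assuming the NiFi node's time zone is +03:00 (which your a4 epoch value implies), it reproduces your a3 and a4 results exactly:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class LenientSdfSketch {
    // "SSS" in SimpleDateFormat means *milliseconds*. Its lenient parser
    // consumes the whole 9-digit fraction as 794,241,429 ms
    // (~9 days 4 h 37 m 21.429 s) and silently rolls the overflow into the date.
    static Date parseLenient(String value) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // Assumed zone: +03:00, consistent with the epoch in attribute a4.
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+03:00"));
        return fmt.parse(value);
    }

    static String format(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+03:00"));
        return fmt.format(d);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parseLenient("2023-04-10 10:43:15.794241429");
        System.out.println(format(d));   // 2023-04-19 15:20:36.429  (= a3)
        System.out.println(d.getTime()); // 1681906836429            (= a4)
    }
}
```

So the "different value" you saw in PostgreSQL was not random: it is your original timestamp shifted by the nanosecond fraction misread as milliseconds.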
Using your example:
sql.args.2.value = 2023-04-10T10:43:15.794241429+03:00
and an UpdateAttribute processor configured with this new dynamic property:
newformat = ${sql.args.2.value:formatInstant('yyyy-MM-dd HH:mm:ss.SSS','GMT')}
will result in a "newformat" FlowFile attribute with a value of:
2023-04-10 07:43:15.794
You'll notice the output zone was set to GMT, so 3 hours were subtracted, since your original timestamp carried a +03:00 UTC offset.
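The same conversion can be sketched with plain java.time (class name invented; ":formatInstant" is backed by the DateTimeFormatter API, which parses nanoseconds and offsets without trouble):

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class InstantSketch {
    // Parses the full ISO-8601 value, nanoseconds and offset included,
    // shifts the instant to GMT, and formats with millisecond precision.
    static String reformat(String iso) {
        OffsetDateTime odt = OffsetDateTime.parse(iso); // ISO_OFFSET_DATE_TIME by default
        DateTimeFormatter out = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
        return odt.atZoneSameInstant(ZoneId.of("GMT")).format(out);
    }

    public static void main(String[] args) {
        System.out.println(reformat("2023-04-10T10:43:15.794241429+03:00"));
        // -> 2023-04-10 07:43:15.794
    }
}
```

Unlike SimpleDateFormat, the "SSS" pattern here is fraction-of-second, so the extra nanosecond digits are truncated instead of being misread as an offset of nine days.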
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.
Thank you,
Matt
Created 04-12-2023 02:55 AM
@MattWho, @steven-matison: I would really appreciate your input, as I am struggling with this and do not know how to solve it, or what to check and try next 😞
I tried to replace ConvertRecord with an UpdateRecord in which I update my column /TimeStamp using the EL "${field.value:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"):format('yyyy-MM-dd HH:mm:ss.SSS')}". Unfortunately, the result is the same: a different datetime is generated 😞
Created 04-12-2023 07:58 AM
Thank you @MattWho, it worked like a charm. You are a life saver 🙂
I did not even consider the nanoseconds, and I did not really know about EL functions for the Java DateTimeFormatter. Nevertheless, if somebody else encounters a similar issue, here is the link to the documentation --> here.
One more question though, if possible. When saving the data into the PostgreSQL database using PutDatabaseRecord (with a JSON Reader), the value "2023-04-10 07:43:15.794" gets immediately truncated to "2023-04-10 07:43:15" --> basically everything after the decimal point is removed. In PostgreSQL, the column is defined as "timestamp without time zone" with a precision of 6.