Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Issues with Date handling in NiFi

avatar
Contributor

In NiFi I'm parsing Apache / Common Log format files with SplitText > ExtractText > ReplaceText to create a JSON object. The object essentially looks like this:

{
"remote_host": "${csv.1}",
"remote_logname": "${csv.2}",
"remote_user": "${csv.3}",
"date_time": <date/time from formatted csv.4 goes here>,
"request": "${csv.5}",
"last_status": "${csv.6}",
"bytes_sent": "${csv.7}",
"user_agent": "${csv.8}"
}

From there, ConvertJSONtoSQL > PutSQL (> Snowflake table) works great EXCEPT for the date_time field. To be valid JSON, it needs to be wrapped in quotes, which allows it to pass through ConvertJSONtoSQL, but returns an Error when reaching PutSQL:

'... nifi.processor.exception.ProcessException: The value of the sql.args.4 value is '< whatever date format I happen to be trying that appears to match db's TIMESTAMP_LTZ(9) >', which cannot be converted into a necessary data type; ...'

A few examples of what I'm talking about below. If I ignore trying to format my csv.4 value and just use the now() method in it's place I've tried:

"date_time": "${now():format('EEE, d MMM yyyy HH:mm:ss Z')}"		[quoted] 

Above value passes JSON, but fails on PutSQL with an error like show above -- I believe due to the quotes (...?) since they do not seem to match the column type according to NiFi.

If I try:

"date_time": ${now():format('EEE, d MMM yyyy HH:mm:ss Z')}		[unquoted]

... it fails at the ConvertJSONtoSQL step.

One thing I was thinking is somehow modifying the values that go into the PutSQL after the ConvertJSONtoSQL step. Very new to NiFi so unsure whether this is possible with the standard / default processes (i.e., without make a custom process).

Ideally, I could pass something like the following / below, unquoted (!!!), as the date_time value and deal with it in the db rather than NiFi ...

"date_time": TO_TIMESTAMP ('${csv.4}', 'DD/MON/YYYY:HH24:MI:SS TZHTZM'),

... but that's certainly not valid JSON either and does not match the column type according to NiFi (... it works using the Snowflake client directly though).

Any help appreciated!

1 ACCEPTED SOLUTION

avatar
Master Guru

With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value, then as your intuition suggests, you can have another ReplaceText after ConvertJSONtoSQL to match the marker and replace it with the Expression Language statement, thereby removing the quotes by matching them with the marker and replacing with the correct literal.

View solution in original post

3 REPLIES 3

avatar
Master Guru

With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value, then as your intuition suggests, you can have another ReplaceText after ConvertJSONtoSQL to match the marker and replace it with the Expression Language statement, thereby removing the quotes by matching them with the marker and replacing with the correct literal.

avatar
Contributor

Hey @Matt Burgess, thank you for your reply! I was actually thinking something along those lines, but ended up at what I believe is a cleaner solution. I was basically able to omit ConvertJSONtoSQL and include my base query text in the ReplaceText process. This is where I was previously constructing the JSON object, but instead of doing that, I create my query there (i.e., in ReplaceText) and pass in values from the earlier ExtractText process. This will also enable me to handle a join easily because I can just add the necessary join statement at this point as well. From there, I just connect ReplaceText directly to PutSQL and it works great.

Nonetheless, thanks for the idea and I'm sure I will employ that technique somewhere in the future.

avatar
Contributor
@Matt Burgess Btw, I'm accepting your answer as the answer to my initial query, which was about how to handle the date issue. I believe your answer addresses that most directly by replacing downstream with a literal in ReplaceText.