Created 08-02-2016 10:35 PM
In NiFi I'm parsing Apache / Common Log format files with SplitText > ExtractText > ReplaceText to create a JSON object. The object essentially looks like this:
{ "remote_host": "${csv.1}", "remote_logname": "${csv.2}", "remote_user": "${csv.3}", "date_time": <date/time from formatted csv.4 goes here>, "request": "${csv.5}", "last_status": "${csv.6}", "bytes_sent": "${csv.7}", "user_agent": "${csv.8}" }
From there, ConvertJSONtoSQL > PutSQL (> Snowflake table) works great EXCEPT for the date_time field. To be valid JSON, it needs to be wrapped in quotes, which allows it to pass through ConvertJSONtoSQL, but returns an Error when reaching PutSQL:
'... nifi.processor.exception.ProcessException: The value of the sql.args.4 value is '< whatever date format I happen to be trying that appears to match db's TIMESTAMP_LTZ(9) >', which cannot be converted into a necessary data type; ...'
A few examples of what I'm talking about below. If I ignore trying to format my csv.4 value and just use the now() method in it's place I've tried:
"date_time": "${now():format('EEE, d MMM yyyy HH:mm:ss Z')}" [quoted]
Above value passes JSON, but fails on PutSQL with an error like show above -- I believe due to the quotes (...?) since they do not seem to match the column type according to NiFi.
If I try:
"date_time": ${now():format('EEE, d MMM yyyy HH:mm:ss Z')} [unquoted]
... it fails at the ConvertJSONtoSQL step.
One thing I was thinking is somehow modifying the values that go into the PutSQL after the ConvertJSONtoSQL step. Very new to NiFi so unsure whether this is possible with the standard / default processes (i.e., without make a custom process).
Ideally, I could pass something like the following / below, unquoted (!!!), as the date_time value and deal with it in the db rather than NiFi ...
"date_time": TO_TIMESTAMP ('${csv.4}', 'DD/MON/YYYY:HH24:MI:SS TZHTZM'),
... but that's certainly not valid JSON either and does not match the column type according to NiFi (... it works using the Snowflake client directly though).
Any help appreciated!
Created 08-03-2016 12:45 AM
With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value, then as your intuition suggests, you can have another ReplaceText after ConvertJSONtoSQL to match the marker and replace it with the Expression Language statement, thereby removing the quotes by matching them with the marker and replacing with the correct literal.
Created 08-03-2016 12:45 AM
With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value, then as your intuition suggests, you can have another ReplaceText after ConvertJSONtoSQL to match the marker and replace it with the Expression Language statement, thereby removing the quotes by matching them with the marker and replacing with the correct literal.
Created 08-03-2016 05:57 PM
Hey @Matt Burgess, thank you for your reply! I was actually thinking something along those lines, but ended up at what I believe is a cleaner solution. I was basically able to omit ConvertJSONtoSQL and include my base query text in the ReplaceText process. This is where I was previously constructing the JSON object, but instead of doing that, I create my query there (i.e., in ReplaceText) and pass in values from the earlier ExtractText process. This will also enable me to handle a join easily because I can just add the necessary join statement at this point as well. From there, I just connect ReplaceText directly to PutSQL and it works great.
Nonetheless, thanks for the idea and I'm sure I will employ that technique somewhere in the future.
Created 08-03-2016 06:01 PM