Created 04-20-2018 01:31 PM
Hi everyone!
In my flow, I use several components to listen for Squid logs, which arrive as plain text. I then add columns and split each log line into its separate fields. The output file is CSV, because it will later be loaded into a Hive database.
The problem occurs when I try to convert the "datetime" field from a Unix timestamp to a human-readable date and time.
My flow is based on this solution: https://community.hortonworks.com/articles/131320/using-partitionrecord-grokreaderjsonwriter-to-pars...
This is what a log line sent by Squid (running as a proxy) looks like:
1518442283.483 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/ matt DIRECT/199.27.79.73 text/html
This is the Grok expression (in GrokReader):
%{NUMBER:timestamp}\s+%{NUMBER:duration}\s%{IP:client_address}\s%{WORD:cache_result}/%{POSINT:status_code}\s%{NUMBER:bytes}\s%{WORD:request_method}\s%{NOTSPACE:url}\s(%{NOTSPACE:user}|-)\s%{WORD:hierarchy_code}/%{IPORHOST:server}\s%{NOTSPACE:content_type}
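For readers who want to check the field split outside of NiFi, here is a minimal Python sketch that approximates the Grok expression above with a plain regular expression and applies it to the sample log line. The regex fragments are simplified stand-ins for the Grok patterns (for example, `\S+` in place of `IPORHOST`), and the name `SQUID_LINE` is only illustrative, so treat this as an approximation rather than an exact equivalent of what GrokReader does.

```python
import re

# Simplified stand-ins for the Grok patterns (assumption: not exact equivalents).
SQUID_LINE = re.compile(
    r"(?P<timestamp>\d+(?:\.\d+)?)\s+"                 # ~ NUMBER
    r"(?P<duration>\d+)\s"                             # ~ NUMBER
    r"(?P<client_address>\d{1,3}(?:\.\d{1,3}){3})\s"   # ~ IP (IPv4 only here)
    r"(?P<cache_result>\w+)/(?P<status_code>[1-9]\d*)\s"  # ~ WORD / POSINT
    r"(?P<bytes>\d+)\s"                                # ~ NUMBER
    r"(?P<request_method>\w+)\s"                       # ~ WORD
    r"(?P<url>\S+)\s"                                  # ~ NOTSPACE
    r"(?P<user>\S+)\s"                                 # ~ NOTSPACE (also matches "-")
    r"(?P<hierarchy_code>\w+)/(?P<server>\S+)\s"       # ~ WORD / IPORHOST
    r"(?P<content_type>\S+)"                           # ~ NOTSPACE
)

sample = (
    "1518442283.483 161 127.0.0.1 TCP_MISS/200 103701 GET "
    "http://www.cnn.com/ matt DIRECT/199.27.79.73 text/html"
)

match = SQUID_LINE.match(sample)
fields = match.groupdict() if match else {}

# Print each extracted field name next to its raw string value.
for name, value in fields.items():
    print(f"{name}: {value}")
```

Every value comes out as a string, including `timestamp`, which is why a separate conversion step is still needed for the date.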
The following links show the configuration:
→ General appearance of components: https://imgur.com/a/fn2Q03N
→ Settings for the UpdateRecord component: https://imgur.com/a/E8gfMj8
→ Settings for the UpdateRecord/GrokReader(nifi_mid): https://imgur.com/a/l8He75D
→ Settings for the UpdateRecord/CSVRecordSetWriter: https://imgur.com/a/aqiZ98M
My problem is described on stackoverflow: https://stackoverflow.com/questions/48885675/conversion-unix-timestamp-attribute-to-normal-date
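To make the goal concrete, the conversion in question can be sketched outside of NiFi in a few lines of Python. This is only an illustration of the expected result, not a NiFi configuration. The function name `squid_timestamp_to_datetime`, the choice of UTC, and the output format `%Y-%m-%d %H:%M:%S` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def squid_timestamp_to_datetime(raw: str) -> datetime:
    """Convert a Squid timestamp such as '1518442283.483' to a UTC datetime.

    The string is split on the dot so the milliseconds are handled as
    integers, which avoids floating-point rounding.
    """
    seconds, _, fraction = raw.partition(".")
    # Pad or truncate the fractional part to exactly three digits (milliseconds).
    millis = int((fraction + "000")[:3])
    base = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


converted = squid_timestamp_to_datetime("1518442283.483")
formatted = converted.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2018-02-12 13:31:23 (UTC)
```

If local time is needed instead of UTC, the `tz` argument would have to change accordingly.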
Created 04-24-2018 10:54 AM
@Davide Isoardi where and how can I set these values (in this case user and datetime) to be able to transform them?
Created 04-24-2018 11:50 AM
In your place I would have:
Created 04-24-2018 12:08 PM
The input file is in text/html format, and the output file must be in CSV format, because it will be loaded into the database.