Support Questions

How can we build a Solr schema for syslogs with sorting and searching functionality?


We store all of our system syslog files in our data lake.
For realtime monitoring we have to store the syslogs in Solr.
The syslog structure is simple: timestamp, hostname, message.

time | host | message
2017-08-08 19:39:48 | | systemd[4554]: Stopped target Default.
2017-08-09 09:54:57 | | systemd[28096]: Starting Shutdown.
2017-08-09 14:48:39 | | systemd[22015]: Stopping Timers.
2017-08-09 12:12:37 | | systemd[1]: Started User Manager for UID 0.
2017-08-10 00:00:37 | | systemd[15736]: Stopping Basic System.

How can we build a schema in Solr with the possibility to search in all fields and sort the output by time, host, or message?

Thanks for your help.




Hi @Timo Burmeister

You may find this tutorial helpful to follow.

Hi @Sonu Sahi

Unfortunately the tutorial does not help me in this case.
Using the sample schemaless configuration creates multiValued fields.

In addition, we ingest the documents with NiFi's PutSolrContentStream.

Sorting the fields does not work like this.
We get this error: can not sort on multivalued field: event_timestamp




@Timo Burmeister

You can sort on any field that is single-valued (i.e., not tokenized -- unless it uses an analyzer that produces a single term -- or multiValued) and is indexed. So text and text_* fields are right out for sorting.

See Tokenizers and Filters

Hi @Geoffrey Shelton Okot,

My intention is to search all messages sorted by timestamp or host.
So, how can I create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host)?

Sorting of messages is not necessary.
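For reference, a minimal explicit-schema sketch with single-valued, sortable fields could look like the following. The field names match the thread; the types are assumptions -- in particular, since the timestamps are formatted as yyyy-MM-dd HH:mm:ss, a plain string type already sorts them chronologically without converting to Solr's ISO date format:

```xml
<!-- Sketch only: explicit single-valued fields for a syslog collection. -->
<!-- "string" for event_timestamp is an assumption; the yyyy-MM-dd HH:mm:ss
     format sorts chronologically as a plain string. -->
<field name="event_timestamp" type="string"       indexed="true" stored="true" docValues="true" multiValued="false"/>
<field name="host"            type="string"       indexed="true" stored="true" docValues="true" multiValued="false"/>
<field name="message"         type="text_general" indexed="true" stored="true" multiValued="false"/>
```

Sorting on message would still not work here, since text_general is tokenized -- but as noted above, that is not required.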

Best regards

Super Collaborator

Hi @Timo Burmeister,

the file can be uploaded at periodic intervals using the CSV update handler.

curl 'http://<solr_host>:<port>/solr/<collection_name>/update/csv?commit=true&separator=%7c&fieldnames=time,host,message' --data-binary @/var/log/mylogmessages

Once that's done, another REST call can be used to query the data.

More on this can be found in the Apache Solr docs.
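As a sketch of such a query, a search sorted by time might look like this (host, port, and collection name are placeholders; the sort field must be single-valued and indexed):

```shell
# Query all documents matching "systemd" and sort by timestamp, newest first
curl 'http://<solr_host>:<port>/solr/<collection_name>/select?q=message:systemd&sort=time+desc&wt=json'
```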

Hi @bkosaraju,

we need realtime ingest.
The syslog messages come from hundreds of hosts, ingested by NiFi ListenSyslog followed by PutSolrContentStream.


Hi @Timo Burmeister,

NiFi, great! You can convert the flowfile into JSON format and ingest using PutSolrContentStream with real-time ingestion.

Content-Type: JSON

Hi @bkosaraju,

That's what I'm trying!
I get the string from ListenSyslog and convert it to JSON:
{"event_timestamp":"2017-08-22 11:15:06","host":"","message":"sshd[9411]: debug1: PAM: setting PAM_RHOST to"}

The ingest runs fine, but I can't do sorting.
We get this error: can not sort on multivalued field: event_timestamp.

That's why I'm asking for a solution to create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host).



Hi @Timo Burmeister

If you did not provide any field mapping in PutSolrContentStream, I presume that's the reason for the multiValued fields, as all fields are treated as multiValued by default.

As a resolution, you can provide a field mapping for each JSON field on the processor.

The user-defined properties for the processor should be:

split: / 
f.1: event_timestamp_t:/event_timestamp
f.2: host_t:/host
f.3: message_t:/message

_t hints to Solr to treat the field as a text field.

On a side note, the following are the type suffixes for Solr, in case you need them:

_i  int   
_s  string
_l  long  
_t  text  
_b  boolean
_f  float 
_d  double
_dt date  
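These suffixes work because the stock data-driven configs ship dynamic field rules along these lines (a sketch of the defaults, not this cluster's actual schema):

```xml
<dynamicField name="*_i"  type="int"          indexed="true" stored="true"/>
<dynamicField name="*_s"  type="string"       indexed="true" stored="true"/>
<dynamicField name="*_t"  type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date"         indexed="true" stored="true"/>
```

Note that *_t fields are tokenized text and therefore still not sortable; for a sortable event_timestamp or host, the *_s (string) suffix is the safer choice.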

Just for testing, you may also try a function-query sort to sort the multiValued fields.
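One such function-query sort, assuming the multiValued field has docValues enabled, uses Solr's field() function to pick one value per document (host, port, and collection name are placeholders):

```shell
# Sort by the minimum value of a multiValued field (requires docValues)
curl 'http://<solr_host>:<port>/solr/<collection_name>/select?q=*:*&sort=field(event_timestamp,min)+asc'
```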


I suggest creating a new collection and trying it there, to ensure you are not reusing the previous index, which was built with multiValued fields.

Hi @bkosaraju,

I have some problems configuring the field mapping for each of the JSON fields.

This is the JSON structure:

"event_timestamp":"${time:substring(0,20):toDate('yyyy MMM dd HH:mm:ss'):format('yyyy-MM-dd HH:mm:ss')}",
"message":"${message:substring(21):substringAfter(' '):replace('"', '\\\\"')}"


This is my PutSolrContentStream config (screenshot: "PutSolrStreamContent config"):

Indexing failed ...

Hi @bkosaraju,

sorry, it was my fault!
The solution runs fine!
My input data was wrong.

Thanks, great solution.
