How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?


New Contributor

Hi,

we store all of our system syslog files in our data lake.
For real-time monitoring we have to store the syslogs in Solr.
The syslog structure is simple, with three fields: timestamp, hostname, message.
Example:
time                | host         | message
--------------------+--------------+---------------------------------------------
2017-08-08 19:39:48 | xxxx.xxx.com | systemd[4554]: Stopped target Default.
2017-08-09 09:54:57 | yxz.gsj.com  | systemd[28096]: Starting Shutdown.
2017-08-09 14:48:39 | yxz.gsj.com  | systemd[22015]: Stopping Timers.
2017-08-09 12:12:37 | yxz.gsj.com  | systemd[1]: Started User Manager for UID 0.
2017-08-10 00:00:37 | xxxx.xxx.com | systemd[15736]: Stopping Basic System.


How can we build a schema in Solr that makes it possible to search in all fields and sort the output by time, host, or message?

Thanks for your help.

Timo

12 REPLIES

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

Guru

Hi @Timo Burmeister

You may find this tutorial helpful to follow. https://hortonworks.com/hadoop-tutorial/searching-data-solr/

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

New Contributor

Hi @Sonu Sahi

Unfortunately the tutorial does not help me in this case.
Using the sample schemaless configuration creates multiValued fields.

In addition, we ingest the documents with NiFi using PutSolrContentStream.

Sorting on these fields does not work this way.
We get this error: can not sort on multivalued field: event_timestamp

regards

Timo

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

Mentor

@Timo Burmeister

You can sort on any field that is indexed and single-valued (i.e., neither multiValued nor tokenized, unless it uses an analyzer that produces a single term). So text and text_* fields are right out for sorting.

See Tokenizers and Filters
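
As a rough illustration of the point above, the following sketch builds a Solr Schema API "add-field" command that declares a field as indexed and single-valued. It assumes the collection uses a managed schema and that the field types "string" and "tdate" exist in the configset; the host URL and the collection name "syslog" are hypothetical. The request is only prepared here, not sent.

```python
import json
from urllib import request


def add_field_command(name, field_type):
    """Build a Schema API 'add-field' command for an indexed, single-valued field."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,      # assumed to exist in the configset
            "indexed": True,
            "stored": True,
            "multiValued": False,    # required for plain sorting
        }
    }


def build_schema_request(solr_url, collection, command):
    """Prepare (but do not send) the POST request for the Schema API."""
    return request.Request(
        f"{solr_url}/solr/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and collection name, for illustration only.
host_req = build_schema_request(
    "http://localhost:8983", "syslog", add_field_command("host", "string")
)
time_req = build_schema_request(
    "http://localhost:8983", "syslog", add_field_command("event_timestamp", "tdate")
)
# request.urlopen(host_req) would send the command to a running Solr.
```

Fields defined this way take precedence over what schemaless guessing would otherwise create, provided they are added before the first document is indexed.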

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

New Contributor

Hi @Geoffrey Shelton Okot,

my intention is to search all messages, sorted by timestamp or host.
So, how can I create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host)?

Sorting by message is not necessary.

Best regards
Timo

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

Super Collaborator

Hi @Timo Burmeister,

The file can be uploaded at periodic intervals using the CSV index handler.

curl 'http://<solr_host>:<port>/solr/<collection_name>/update/csv?commit=true&separator=%7c&fieldnames=time,host,message' --data-binary @/var/log/mylogmessages

Once that's done, another REST call can be used to query the data.

More on this can be found in the Apache Solr docs.
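
The follow-up query could look roughly like the sketch below, which builds a /select URL with a sort parameter. The host, the collection name "syslog", and the field names "time" and "message" are assumptions for illustration; sorting only works if the sort field is single-valued in the schema.

```python
from urllib.parse import urlencode


def build_select_url(solr_url, collection, query, sort_field, descending=False):
    """Build a Solr /select URL that searches with `query` and sorts by one field."""
    direction = "desc" if descending else "asc"
    params = {
        "q": query,
        "sort": f"{sort_field} {direction}",
        "wt": "json",
    }
    return f"{solr_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names.
url = build_select_url("http://localhost:8983", "syslog", "message:Stopping", "time")
print(url)
```

The resulting URL can be passed to curl or opened with urllib.request.urlopen against a running Solr instance.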

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

New Contributor

Hi @bkosaraju,

We need real-time ingest.
The syslog messages come from hundreds of hosts and are ingested by NiFi ListenSyslog, then sent to Solr through PutSolrContentStream.
regards
Timo

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

Super Collaborator

Hi @Timo Burmeister,

NiFi, great. You can convert the flowfile into JSON format and ingest it in real time using PutSolrContentStream.

Content-Type: application/json
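
To illustrate the conversion step, here is a minimal sketch that turns one pipe-separated row (the layout shown in the question) into a JSON document. The output field names and the assumption that the timestamps are in UTC are mine, not confirmed by the thread. The timestamp is rewritten to ISO 8601 because Solr date fields expect that form.

```python
import json
from datetime import datetime, timezone


def syslog_row_to_doc(row):
    """Convert 'time | host | message' into a dict ready to be sent as JSON."""
    # Split on the first two pipes only, so pipes inside the message survive.
    time_part, host, message = (part.strip() for part in row.split("|", 2))
    # Assumption: the source timestamps are UTC.
    parsed = datetime.strptime(time_part, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "event_timestamp": parsed.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "host": host,
        "message": message,
    }


doc = syslog_row_to_doc("2017-08-09 09:54:57 | yxz.gsj.com| systemd[28096]: Starting Shutdown.")
print(json.dumps(doc))
```

In a NiFi flow the same transformation would be done by a processor rather than a script; the code only shows the target document shape.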

Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

New Contributor

Hi @bkosaraju,

That's what I'm trying!
I get the string from ListenSyslog and convert it to JSON.
Example:
{"event_timestamp":"2017-08-22 11:15:06","host":"meyer.devcon.com","message":"sshd[9411]: debug1: PAM: setting PAM_RHOST to monitoring-portal.devcon1.com"}

The ingest runs fine, but I can't sort.
We get this error: can not sort on multivalued field: event_timestamp.

That's why I'm asking for a way to create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host).

regards
Timo


Re: How can we build a SOLR Schema for SYSLOGS with sorting and searching functionality?

Super Collaborator

Hi @Timo Burmeister

If you did not provide any field mapping in PutSolrContentStream, I presume that's the reason for the multiple values, as all fields are treated as multivalued by default.

As a resolution, you can provide a field mapping for each JSON field in the processor.

The user-defined properties for the processor should be:

split: / 
f.1: event_timestamp_t:/event_timestamp
f.2: host_t:/host
f.3: message_t:/message

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-solr-nar/1.3.0/org.apache.nif...
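
For testing the mapping outside NiFi, the sketch below builds the equivalent request URL for Solr's /update/json/docs handler. It assumes, as I understand the processor, that the f.1, f.2, f.3 properties end up as repeated "f" request parameters; the host and collection name are hypothetical.

```python
from urllib.parse import urlencode


def build_json_docs_url(solr_url, collection, mappings, split="/"):
    """Build an /update/json/docs URL with 'split' and repeated 'f' field mappings.

    `mappings` is a list of (target_field, json_path) pairs.
    """
    params = [("split", split)]
    params += [("f", f"{target}:{path}") for target, path in mappings]
    params.append(("commit", "true"))
    return f"{solr_url}/solr/{collection}/update/json/docs?{urlencode(params)}"


# Hypothetical host and collection; the mappings mirror the properties above.
url = build_json_docs_url(
    "http://localhost:8983",
    "syslog",
    [
        ("event_timestamp_t", "/event_timestamp"),
        ("host_t", "/host"),
        ("message_t", "/message"),
    ],
)
print(url)
```

Posting a sample JSON document to this URL with curl is a quick way to check which fields Solr actually creates before wiring the same values into the processor.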

_t hints Solr to treat the field as a text field.

On a side note, the following are the type suffixes for Solr, in case you need them.

_i  int   
_s  string
_l  long  
_t  text  
_b  boolean
_f  float 
_d  double
_dt date  
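
The suffix list can be captured in a small lookup, sketched below, to derive dynamic field names consistently. The helper name is hypothetical, and the suffixes only work if the collection's schema defines matching dynamic fields. Since text fields are tokenized, a date (_dt) or string (_s) name may be a better fit than _t for fields you want to sort on; note that _dt expects ISO 8601 values.

```python
# Suffix table from the list above: Solr type -> dynamic field suffix.
SUFFIX_BY_TYPE = {
    "int": "i",
    "string": "s",
    "long": "l",
    "text": "t",
    "boolean": "b",
    "float": "f",
    "double": "d",
    "date": "dt",
}


def dynamic_field_name(base, solr_type):
    """Return the dynamic field name for `base`, e.g. ('host', 'string') -> 'host_s'."""
    try:
        return f"{base}_{SUFFIX_BY_TYPE[solr_type]}"
    except KeyError:
        raise ValueError(f"unknown Solr type: {solr_type!r}") from None


# Possible names for sortable fields (an assumption, not the mapping used above).
sortable = [
    dynamic_field_name("event_timestamp", "date"),
    dynamic_field_name("host", "string"),
]
print(sortable)
```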

Just for testing, you can use the sort function below to sort on the multivalued fields.

sort=field(event_timestamp,min)+asc 

I suggest creating a new collection and trying it there, to ensure that it is not using the previous index, which was built with multivalued fields.
