Support Questions

How can we build a Solr schema for syslogs with sorting and searching functionality?


We store all of our system syslog files in our data lake.
For realtime monitoring we have to store the syslogs in Solr.
The syslog structure is simple: timestamp, hostname, message.

time | host | message
2017-08-08 19:39:48 | | systemd[4554]: Stopped target Default.
2017-08-09 09:54:57 | | systemd[28096]: Starting Shutdown.
2017-08-09 14:48:39 | | systemd[22015]: Stopping Timers.
2017-08-09 12:12:37 | | systemd[1]: Started User Manager for UID 0.
2017-08-10 00:00:37 | | systemd[15736]: Stopping Basic System.

How can we build a schema in Solr with the possibility to search in all fields and sort the output by time, host, or message?

Thanks for your help.




Hi @Timo Burmeister

You may find this tutorial helpful to follow.

Hi @Sonu Sahi

Unfortunately the tutorial does not help me in this case.
Using the sample schemaless configuration creates multiValued fields.

In addition, we ingest the documents with NiFi's PutSolrContentStream.

Sorting the fields does not work like this.
We get this error: can not sort on multivalued field: event_timestamp




@Timo Burmeister

You can sort on any field that is single-valued (i.e., not tokenized -- unless it uses an analyzer that produces a single term -- or multiValued) and is indexed. So text and text_* fields are right out for sorting.

See Tokenizers and Filters

Hi @Geoffrey Shelton Okot,

My intention is to search all messages sorted by timestamp or host.
So, how can I create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host)?

Sorting of messages is not necessary.
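For reference, a minimal explicit-schema sketch with single-valued, sortable fields could look like the following. The field names match the thread; the types are assumptions -- in particular, since the timestamps are formatted as yyyy-MM-dd HH:mm:ss, a plain string type already sorts them chronologically without converting to Solr's ISO date format:

```xml
<!-- Sketch only: explicit single-valued fields for a syslog collection. -->
<!-- "string" for event_timestamp is an assumption; the yyyy-MM-dd HH:mm:ss
     format sorts chronologically as a plain string. -->
<field name="event_timestamp" type="string"       indexed="true" stored="true" docValues="true" multiValued="false"/>
<field name="host"            type="string"       indexed="true" stored="true" docValues="true" multiValued="false"/>
<field name="message"         type="text_general" indexed="true" stored="true" multiValued="false"/>
```

Sorting on message would still not work here, since text_general is tokenized -- but as noted above, that is not required.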

Best regards

Super Collaborator

Hi @Timo Burmeister,

the file can be uploaded at periodic intervals using the CSV update handler.

curl 'http://<solr_host>:<port>/solr/<collection_name>/update/csv?commit=true&separator=%7c&fieldnames=time,host,message' --data-binary @/var/log/mylogmessages

Once that's done, another REST call can be used to query the data.

More on this can be found in the Apache Solr docs.
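As a sketch of such a query, a search sorted by time might look like this (host, port, and collection name are placeholders; the sort field must be single-valued and indexed):

```shell
# Query all documents matching "systemd" and sort by timestamp, newest first
curl 'http://<solr_host>:<port>/solr/<collection_name>/select?q=message:systemd&sort=time+desc&wt=json'
```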

Hi @bkosaraju,

we need realtime ingest.
The syslog messages come from hundreds of hosts, ingested by NiFi ListenSyslog followed by PutSolrContentStream.


Hi @Timo Burmeister,

NiFi, great! You can convert the flowfile into JSON format and ingest using PutSolrContentStream with real-time ingestion.

Content-Type: JSON

Hi @bkosaraju,

That's what I'm trying!
I get the string from ListenSyslog and convert it to JSON:
{"event_timestamp":"2017-08-22 11:15:06","host":"","message":"sshd[9411]: debug1: PAM: setting PAM_RHOST to"}

The ingest runs fine, but I can't do sorting.
We get this error: can not sort on multivalued field: event_timestamp.

That's why I'm asking for a solution to create a schema (HDP Search / Solr) with indexed single-valued fields (event_timestamp, host).



Hi @Timo Burmeister

If you did not provide any field mapping in PutSolrContentStream, I presume that's the reason for the multiValued fields, as all fields are treated as multiValued by default.

As a resolution, you can provide a field mapping for each JSON field on the processor.

The user-defined properties for the processor should be:

split: / 
f.1: event_timestamp_t:/event_timestamp
f.2: host_t:/host
f.3: message_t:/message

_t hints to Solr to treat the field as a text field.

On a side note, the following are the type suffixes for Solr, in case you need them:

_i  int   
_s  string
_l  long  
_t  text  
_b  boolean
_f  float 
_d  double
_dt date  
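These suffixes work because the stock data-driven configs ship dynamic field rules along these lines (a sketch of the defaults, not this cluster's actual schema):

```xml
<dynamicField name="*_i"  type="int"          indexed="true" stored="true"/>
<dynamicField name="*_s"  type="string"       indexed="true" stored="true"/>
<dynamicField name="*_t"  type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date"         indexed="true" stored="true"/>
```

Note that *_t fields are tokenized text and therefore still not sortable; for a sortable event_timestamp or host, the *_s (string) suffix is the safer choice.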

Just for testing, you may also try a function-query sort to sort the multiValued fields.
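One such function-query sort, assuming the multiValued field has docValues enabled, uses Solr's field() function to pick one value per document (host, port, and collection name are placeholders):

```shell
# Sort by the minimum value of a multiValued field (requires docValues)
curl 'http://<solr_host>:<port>/solr/<collection_name>/select?q=*:*&sort=field(event_timestamp,min)+asc'
```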


I suggest creating a new collection and trying it there, to ensure you are not reusing the previous index, which was built with multiValued fields.

Hi @bkosaraju,

I have some problems configuring the field mapping for each of the JSON fields.

This is the JSON structure:

"event_timestamp":"${time:substring(0,20):toDate('yyyy MMM dd HH:mm:ss'):format('yyyy-MM-dd HH:mm:ss')}",
"message":"${message:substring(21):substringAfter(' '):replace('"', '\\\\"')}"


This is my PutSolrContentStream config (screenshot: "PutSolrStreamContent config"):

Indexing failed ...

Hi @bkosaraju,

sorry, it was my fault!
The solution runs fine!
My input data was wrong.

Thanks, great solution.
