Member since: 10-21-2018
Posts: 10
Kudos Received: 4
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3321 | 05-10-2019 05:50 AM
 | 1366 | 05-08-2019 12:53 AM
05-10-2019 05:50 AM
2 Kudos
Hi,

You can go to Cloudera Manager -> Solr -> Configuration, where you can change the following parameters:

- Solr HTTP Port (default 8983)
- Solr Admin Port (default 8984)
- Solr HTTPS Port (default 8985)

Similarly for ZooKeeper, you can change the client port (where Solr connects, default 2181), as well as the Quorum Port, Election Port and JMX Remote Port if they also conflict with other services.

After you save the changes, go to the Cloudera Manager home page. You will see blue icons next to the service names indicating that client configurations need to be redeployed, and orange icons indicating that the affected services need to be restarted. For example, the Solr client configuration variable ZK_QUORUM and the environment variable SOLR_ZK_ENSEMBLE need to be updated to reflect the new client port - CM does all of this when redeploying client configs. You can trigger it either by following the blue icon(s) or by selecting 'Deploy Client Configurations' from the dropdown triangle to the right of the cluster name. You will also need to restart the affected services.
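As a quick sanity check after the redeploy and restart, you can look at the regenerated Solr client configuration on a gateway host. The path below is the usual CDH default and the hosts/port are placeholders, so treat them as assumptions for your environment:

# Assumed default CDH location of the Solr client configuration;
# adjust if your cluster uses a non-default path.
cat /etc/solr/conf/solr-env.sh
# It should now export the ensemble with the new client port, e.g.:
# export SOLR_ZK_ENSEMBLE=zk01.example.com:2182,zk02.example.com:2182,zk03.example.com:2182/solr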
Best Regards,
Istvan
05-08-2019 12:53 AM
2 Kudos
Hello Oliver,

Yes, I think it is possible. You can set up a pipeline, for example with Flume, where you create a TailDirSource to ingest data from the log directory, then channel it to a MorphlineSolrSink where you can transform it into Solr records. In your morphline script you can use grok commands to parse the log entries (a small grok sketch follows after the readMultiLine example below). I'm not aware of any out-of-the-box scripts in CDH for parsing log files, but we have a blog entry which describes an example of processing syslog files, and you can also use the grok constructor app (https://grokconstructor.appspot.com), which is very helpful for creating the required grok expressions.

Please note that Flume sources like TailDirSource usually do not support multiline input (which would be handy for stack traces). The Flume source processes each line of the input file as a separate Flume event, and the morphline is invoked separately for each of them - even though there is a readMultiLine command in Morphlines, it is not applicable here, since one invocation gets only a single line as input. I found this GitHub repo which implements a multiline Flume source: https://github.com/qwurey/flume-source-multiline

I did not try this recently, but for example you can try this in your Flume config:

a3.sources.r3.type = com.urey.flume.MultiLineExecSource
a3.sources.r3.lineStartRegex = \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d
a3.sources.r3.command = tail -F /tmp/testtaildir/mylog.log
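For completeness, here is a minimal sketch of how the rest of the agent could be wired up to the MorphlineSolrSink mentioned above (the channel/sink names and the morphline file path are illustrative assumptions, not from an existing setup):

# Declare a channel and a sink for agent a3 (names are hypothetical)
a3.channels = c3
a3.sinks = k3

# Simple in-memory channel buffering events between source and sink
a3.channels.c3.type = memory
a3.channels.c3.capacity = 10000

# MorphlineSolrSink runs the morphline to turn events into Solr documents
a3.sinks.k3.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a3.sinks.k3.morphlineFile = /etc/flume-ng/conf/morphlines.conf
a3.sinks.k3.morphlineId = morphline1

# Wire the source and the sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3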
And for example this expression in your morphlines:

{
  readMultiLine {
    regex : "(^.+Exception: .+)|(^\\s+at .+)|(^\\s+\\.\\.\\. \\d+ more)|(^\\s*Caused by:.+)"
    negate : false
    what : previous
    charset : UTF-8
  }
}
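And, as mentioned above, a small grok sketch for the parsing step. The pattern assumes a hypothetical log line layout ("2019-05-08 12:53:00,123 INFO something happened") and a placeholder dictionary path, so adapt both to your actual format:

{
  grok {
    # Path to the standard grok pattern dictionaries; adjust to where
    # they live on your system (placeholder path, an assumption).
    dictionaryFiles : [/path/to/grok-dictionaries]
    expressions : {
      # Extracts timestamp, level and the rest of the line into fields
      message : """%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:body}"""
    }
  }
}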
If you want batch indexing instead of Near-Real-Time indexing, you can use the MapReduceIndexerTool or the Spark Crunch Indexer instead of Flume; they also work with Morphlines.
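For the batch route, a minimal sketch of a MapReduceIndexerTool invocation could look like the following (the jar location assumes a CDH parcel install, and the ZooKeeper ensemble, collection name and HDFS paths are placeholders for your cluster):

# Jar path below assumes a CDH parcel layout; adjust for your install.
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  --morphline-file morphlines.conf \
  --output-dir hdfs://nameservice1/tmp/outdir \
  --zk-host zk01.example.com:2181/solr \
  --collection my_collection \
  --go-live \
  hdfs://nameservice1/user/me/input-logs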
Best Regards,
Istvan