Member since: 03-24-2017
Posts: 12
Kudos received: 1
Accepted solutions: 1

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 9996 | 01-26-2018 01:01 AM |
01-28-2020 10:10 AM
Hi, there are two IP addresses in the output of ntpq -np: remote and refid. Do you mean we should use the remote IP? Thank you for sharing your knowledge!
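For context on the two columns: with ntpq -np (-n prints numeric addresses, -p lists the peers), `remote` is the address of each server this host polls, and `refid` is the reference that server itself is synchronized to (an upstream address, or a code such as `.GPS.` for a stratum-1 clock). The minimal sketch below separates the two columns. The helper name `parse_ntpq_peers` and the sample output (RFC 5737 documentation addresses) are hypothetical, it assumes the classic ntpq peer table layout with a one-character tally code in front of each row, and it does not by itself settle which address the original advice referred to.

```python
# Minimal sketch: split `ntpq -np` peer rows into remote and refid columns.
# Assumes the classic layout:
#   remote refid st t when poll reach delay offset jitter
# where the first character of each peer row is a tally code
# ('*' current system peer, '+' candidate, '-' outlier, ...).


def parse_ntpq_peers(output: str) -> list[dict[str, str]]:
    peers = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header row and the "=====" separator.
        if not stripped or stripped.startswith("remote") or stripped.startswith("="):
            continue
        tally, fields = line[0], line[1:].split()
        peers.append({
            "tally": tally,
            "remote": fields[0],   # the server this host polls
            "refid": fields[1],    # that server's own upstream reference
            "stratum": fields[2],
        })
    return peers


# Hypothetical sample output using documentation-only addresses.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      203.0.113.5      2 u   35   64  377    0.512    0.031   0.020
+192.0.2.11      .GPS.            1 u   40   64  377    1.204   -0.112   0.045
"""

if __name__ == "__main__":
    for peer in parse_ntpq_peers(SAMPLE):
        print(peer["tally"], peer["remote"], peer["refid"])
```

Stripping the tally character before splitting keeps the remote address clean even when the row starts with a space (an unselected peer).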
05-08-2019 12:53 AM
2 Kudos
Hello Oliver,

Yes, I think it is possible. You can set up a pipeline, for example with Flume, where a TaildirSource ingests data from the log directory and channels it to a MorphlineSolrSink, where you can transform it into Solr records. In your morphline script you can use grok commands to parse the log entries. I'm not aware of any out-of-the-box scripts for parsing CDH log files, but we have a blog entry which describes an example of processing syslog files, and the Grok Constructor app (https://grokconstructor.appspot.com) is very helpful for building the required grok expressions.

Please note that Flume sources like TaildirSource usually do not support multi-line inputs (which would be handy for stack traces). The Flume source processes each line of the input file as a separate Flume event, and the morphline is invoked separately for each of those. Even though Morphlines has a readMultiLine command, it is not applicable here, since each invocation gets only a single line as input.

I found this GitHub repo which implements a multi-line Flume source: https://github.com/qwurey/flume-source-multiline. I have not tried it recently, but you can try something like this in your Flume config:

```
a3.sources.r3.type = com.urey.flume.MultiLineExecSource
a3.sources.r3.lineStartRegex = \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d
a3.sources.r3.command = tail -F /tmp/testtaildir/mylog.log
```

And, for example, this expression in your morphline:

```
{
  readMultiLine {
    regex : "(^.+Exception: .+)|(^\\s+at .+)|(^\\s+\\.\\.\\. \\d+ more)|(^\\s*Caused by:.+)"
    negate : false
    what : previous
    charset : UTF-8
  }
}
```

If you want batch indexing instead of Near-Real-Time indexing, you can use the MapReduceIndexerTool or the Spark Crunch Indexer instead of Flume; they also work with Morphlines.

Best Regards,
Istvan
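To make the two grouping strategies in the post concrete, here is a minimal Python sketch (not Flume or Kite Morphlines code) that groups raw log lines into events in both ways: by the lineStartRegex (a matching line starts a new event) and by the readMultiLine regex with negate: false and what: previous (a matching line is appended to the previous event). It then extracts fields with a plain regex standing in for a grok expression such as %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}. The sample log lines, the RECORD pattern and all function names are hypothetical illustrations.

```python
import re

# Patterns from the post, written as Python raw strings (the Flume properties
# and morphline versions escape every backslash a second time).
LINE_START = re.compile(r"\s?\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d")
CONTINUATION = re.compile(
    r"(^.+Exception: .+)|(^\s+at .+)|(^\s+\.\.\. \d+ more)|(^\s*Caused by:.+)"
)
# Hypothetical stand-in for a grok expression such as
# %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}.
# DOTALL lets the message field span the appended stack-trace lines.
RECORD = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)",
    re.DOTALL,
)


def group_by_line_start(lines):
    """Start a new event at every line matching LINE_START; append all others."""
    events = []
    for line in lines:
        if LINE_START.match(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


def group_by_continuation(lines):
    """negate: false, what: previous: lines matching CONTINUATION join the previous event."""
    events = []
    for line in lines:
        if CONTINUATION.search(line) and events:
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]


def parse_record(event):
    """Return the named fields of one grouped event, or None if it does not match."""
    match = RECORD.match(event)
    return match.groupdict() if match else None


# Hypothetical sample: three log records, the second carrying a stack trace.
SAMPLE_LOG = [
    "2019-05-08 00:53:01,123 INFO Starting agent a3",
    "2019-05-08 00:53:02,456 ERROR Failed to index batch",
    "java.lang.IllegalStateException: channel closed",
    "    at org.example.Sink.process(Sink.java:42)",
    "    ... 3 more",
    "Caused by: java.io.IOException: broken pipe",
    "2019-05-08 00:53:03,789 INFO Retrying",
]

if __name__ == "__main__":
    for event in group_by_line_start(SAMPLE_LOG):
        print(parse_record(event))
```

With this sample both strategies yield the same three events. They diverge on a continuation line that the readMultiLine regex does not cover, such as a bare java.lang.NullPointerException line with no message: the line-start approach still attaches it to the previous event, while the continuation approach starts a new one.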
01-26-2018 01:01 AM
1 Kudo
The solution was to remove all of the repositories from Cloudera Manager and restart the Cloudera Manager Server (cloudera-scm-server). The default repositories are set at startup if no repositories are present.
01-19-2018 09:13 AM
@DataCrunch, if you use Cloudera Manager to manage your cluster, you can use the charts for that host to monitor its resources more closely. They are a great way of keeping an eye on what is consuming I/O, disk, memory, etc.: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_view_charts.html