Created on 12-22-2015 07:55 AM - edited 09-16-2022 02:54 AM
Good Morning everyone
I've been trying to get the syslog->Solr example specified here to work. It is a fairly simple example using:
syslog source
memory channel
MorphlineSolr Sink
The Flume configuration file I used is:
a1.sources = r1
a1.channels = c1
a1.sinks = k1

#source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = xxxx
a1.sources.r1.host = xxx.xxx.xxx.xxx

#sink
a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.morphlineFile = /route/to/the/morphline.conf

#channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
#a1.channels.c1.transactionCapacity = 10000

#connect
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The morphline file I used was the same as the one provided in the example above:
morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**", "org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      { readLine { charset : UTF-8 } }

      {
        grok {
          dictionaryFiles : [/route/to/the/morphline/grok-dictonaries]
          expressions : {
            message : """<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
          }
        }
      }

      {
        convertTimestamp {
          field : timestamp
          inputFormats : [ "yyyy-MM-dd'T'HH:mm:ss'Z'", "MMM d HH:mm:ss" ]
          inputTimezone : America/Bogota
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : UTC
        }
      }

      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator: {
            collection: syslogs
            zkHost: "zkEnsembleAddresses"
          }
        }
      }

      # log the record at INFO level to SLF4J
      { logInfo { format : "output record: {}", args : ["@{}"] } }

      {
        loadSolr {
          solrLocator : {
            collection: syslogs
            zkHost: "zkEnsembleAddresses"
          }
        }
      }
    ]
  }
]
The morphline file specifies the following chain of commands:
- read the line
- grok messages with an expression
- convert the timestamp
- remove the fields that are unknown to the Solr schema
- put the data in solr
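To sanity-check that a given syslog line has the shape the grok step expects, a quick shell test along these lines may help. The sample message and the regular expression are assumptions of mine: the pattern is only a rough POSIX ERE approximation of the grok expression above, not what the grok dictionaries actually expand to.

```shell
# Hypothetical sample line in the format: <priority>timestamp hostname program[pid]: msg
MSG='<13>Dec 22 10:53:51 myhost myprog[1234]: test message'

# Rough ERE approximation of the grok expression (an assumption, not the real expansion)
PATTERN='^<[0-9]+>[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ [^ :\[]+(\[[0-9]+\])?: .*$'

# grep -Eq exits with 0 when the line matches the pattern
if printf '%s\n' "$MSG" | grep -Eq "$PATTERN"; then
  MATCHED=yes
else
  MATCHED=no
fi
echo "sample line matches: $MATCHED"
```

If a real line from your syslog source does not match, the grok command would most likely drop the record before it ever reaches loadSolr.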
However, when I try to start the Flume agent, it always throws this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
It always shows the same error after the message:
INFO api.MorphlineContext: Importing commands
Flume never gets to start the Solr sink. It seems that Flume does not have enough memory to start the sink, so I modified the /etc/flume-ng/conf/flume-env.sh file and uncommented the JAVA_OPTS line. The uncommented line was this:
export JAVA_OPTS="-Xms2048m -Xmx204800m -Dcom.sun.management.jmxremote"
Basically, I was giving Java 2 GB of initial heap space and a maximum of 200 GB (the machines in the cluster have a lot of memory). The error is still the same :(
Then I modified the command line that starts the Flume agent to try to increase the Java memory:
flume-ng agent -n a1 -f flume_config.conf -Dproperty="-Xms1024m -Xmx=204800m"
And the error still keeps appearing. I don't really know whether I am giving the wrong memory options or setting them in the wrong place, but this problem is making me (more) bald!
Any pointers would be very much appreciated.
Thanks for your support
Rafa
PS: just in case, this is the stack trace of the error:
15/12/22 10:53:51 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/12/22 10:53:51 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:flume_config_e.conf
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Creating channels
15/12/22 10:53:51 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Created channel c1
15/12/22 10:53:51 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
15/12/22 10:53:51 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: org.apache.flume.sink.solr.morphline.MorphlineSolrSink
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
15/12/22 10:53:51 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3927ce5e counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
15/12/22 10:53:51 INFO node.Application: Starting Channel c1
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
15/12/22 10:53:51 INFO node.Application: Starting Sink k1
15/12/22 10:53:51 INFO morphline.MorphlineSink: Starting Morphline Sink k1 (MorphlineSolrSink) ...
15/12/22 10:53:51 INFO node.Application: Starting Source r1
15/12/22 10:53:51 INFO source.ExecSource: Exec source starting with command:tail -f /var/logs/flume-ng/flume-cmf-flume-AGENT-sbmdeqpc01.ambientesbc.lab.log
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
15/12/22 10:53:51 INFO source.ExecSource: Command [tail -f /var/logs/flume-ng/flume-cmf-flume-AGENT-sbmdeqpc01.ambientesbc.lab.log] exited with 1
15/12/22 10:53:51 INFO api.MorphlineContext: Importing commands
15/12/22 10:53:55 ERROR lifecycle.LifecycleSupervisor: Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3927ce5e counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.replace(String.java:2021)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath.getClassName(ClassPath.java:403)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$ClassInfo.<init>(ClassPath.java:193)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$ResourceInfo.of(ClassPath.java:141)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanJar(ClassPath.java:345)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanFrom(ClassPath.java:286)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scan(ClassPath.java:274)
    at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath.from(ClassPath.java:82)
    at org.kitesdk.morphline.api.MorphlineContext.getTopLevelClasses(MorphlineContext.java:149)
    at org.kitesdk.morphline.api.MorphlineContext.importCommandBuilders(MorphlineContext.java:91)
    at org.kitesdk.morphline.stdlib.Pipe.<init>(Pipe.java:43)
    at org.kitesdk.morphline.stdlib.PipeBuilder.build(PipeBuilder.java:40)
    at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:126)
    at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:55)
    at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:101)
    at org.apache.flume.sink.solr.morphline.MorphlineSink.start(MorphlineSink.java:97)
    at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
    at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/12/22 10:53:55 INFO morphline.MorphlineSink: Morphline Sink k1 stopping...
15/12/22 10:53:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 stopped
Created on 12-22-2015 08:41 AM - edited 12-22-2015 09:23 AM
Are you using Cloudera Manager to start the Flume agent? If so, you'll want to configure the heap size through Cloudera Manager.
If you are not using Cloudera Manager, you will want to specify the following on the command line, not as a -Dproperty value:
flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx=204800m
We recommend setting the -Xms and -Xmx values to the same amount so the JVM does not have to resize the heap, which can cause performance issues.
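As a rough sketch of the advice above, the same value can be used for both flags. The 4096 MB figure is only an illustrative assumption, and HEAP_MB and JVM_OPTS are made-up variable names for this example, not Flume settings. Note that there is no equals sign after -Xms or -Xmx:

```shell
# Hypothetical heap size in MB; pick a value that fits your host
HEAP_MB=4096

# Same value for the initial (-Xms) and maximum (-Xmx) heap,
# so the JVM never has to resize the heap at runtime
JVM_OPTS="-Xms${HEAP_MB}m -Xmx${HEAP_MB}m"

# Print the resulting command line (agent name and config file as in the question)
echo "flume-ng agent -n a1 -f flume_config.conf ${JVM_OPTS}"
```

The snippet only prints the command; remove the echo and the surrounding quotes to actually launch the agent.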
Created 12-22-2015 01:29 PM
Hello pdvorak!!!
Thank you very much. I don't know why I was trying to pass the memory parameters as a -Dproperty value (brain freeze, I guess).
Your suggestion was great and it worked perfectly!
Thanks again!!, Kind Regards
rafa
Created 10-01-2018 10:08 AM
flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx204800m
There shouldn't be an equals sign after -Xmx