MorphlineSolrSink GC overhead limit Exceeded in Flume Sink


Good Morning everyone

 

I've been trying to get the syslog->Solr example specified here working. It is a fairly simple example using:

 

syslog source

memory channel

MorphlineSolr Sink

 

The flume configuration file I used is:

 

a1.sources = r1
a1.channels = c1
a1.sinks = k1

#source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = xxxx
a1.sources.r1.host = xxx.xxx.xxx.xxx

#sink
a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.morphlineFile = /route/to/the/morphline.conf

#channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
#a1.channels.c1.transactionCapacity = 10000

#connect
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

The morphline file I used is the same one provided in the example above:

 

morphlines : [
  {
    id : morphline1

    importCommands : ["com.cloudera.**", "org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readLine {
          charset : UTF-8
        }
      }

      {
        grok {
          dictionaryFiles : [/route/to/the/morphline/grok-dictonaries]
          expressions : {
            message : """<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
          }
        }
      }


      {
        convertTimestamp {
          field : timestamp
          inputFormats : [ "yyyy-MM-dd'T'HH:mm:ss'Z'", "MMM d HH:mm:ss" ]
          inputTimezone : America/Bogota
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : UTC
        }
      }

      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator: {
            collection: syslogs
            zkHost: "zkEnsembleAddresses"
          }
        }
      }

      # log the record at INFO level to SLF4J
      { logInfo { format : "output record: {}", args : ["@{}"] } }

      {
        loadSolr {
          solrLocator : {
            collection: syslogs
            zkHost: "zkEnsembleAddresses"
          }
        }
      }
    ]
  }
]

The morphline file specifies the following chain of commands:

 

- read the line

- grok messages with an expression

- convert the timestamp

- sanitize fields that are unknown to the Solr schema

- put the data in solr
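
To make the grok step in the chain above more concrete, here is a minimal shell sketch of roughly what it extracts from one line. It assumes a made-up sample syslog line and uses a simplified sed regular expression as a stand-in for the real grok dictionary patterns; parse_syslog_line is a hypothetical helper, not part of Flume or Morphlines.

```shell
# Hypothetical helper: approximates the grok expression with a plain ERE.
# It is NOT the real grok dictionary (SYSLOGTIMESTAMP, SYSLOGHOST, ...),
# only a simplified stand-in for illustration.
parse_syslog_line() {
  printf '%s\n' "$1" | sed -E -n \
    's/^<([0-9]+)>([A-Z][a-z]{2} +[0-9]{1,2} [0-9:]{8}) ([^ ]+) ([A-Za-z0-9_.-]+)(\[([0-9]+)\])?: (.*)$/priority=\1 timestamp=\2 hostname=\3 program=\4 pid=\6 msg=\7/p'
}

# Made-up sample line; prints the extracted fields as key=value pairs.
parse_syslog_line '<164>Feb 14 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22.'
```

A line that does not match the pattern prints nothing, which is a rough analogue of a record that the grok command fails to match.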

 

However when I try to start the flume agent it always throws the error:

 

java.lang.OutOfMemoryError: GC overhead limit exceeded

 

It always shows the same error right after this message:

 

INFO api.MorphlineContext: Importing commands

 

Flume never gets as far as starting the Solr sink. It seems that Flume does not have enough memory to start the sink, so I edited the /etc/flume-ng/conf/flume-env.sh file and uncommented the JAVA_OPTS line. The uncommented line was this:

 

export JAVA_OPTS="-Xms2048m -Xmx204800m -Dcom.sun.management.jmxremote"

Basically, I was giving Java 2 GB of initial heap space and a maximum of 200 GB (the machines in the cluster have a lot of memory). The error is still the same :(.

 

Then I changed the command line that starts the Flume agent to try to increase the Java heap:

 

flume-ng agent -n a1 -f flume_config.conf -Dproperty="-Xms1024m -Xmx=204800m"

And the error keeps appearing. I don't know whether I am passing the wrong memory options or setting them in the wrong place, but this problem is making me (more) bald!

 

Any pointers would be very much appreciated.

 

Thanks for your support

 

Rafa

 

PS: just in case, this is the stack trace of the error:

15/12/22 10:53:51 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/12/22 10:53:51 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:flume_config_e.conf
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Processing:k1
15/12/22 10:53:51 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Creating channels
15/12/22 10:53:51 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Created channel c1
15/12/22 10:53:51 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
15/12/22 10:53:51 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: org.apache.flume.sink.solr.morphline.MorphlineSolrSink
15/12/22 10:53:51 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
15/12/22 10:53:51 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3927ce5e counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
15/12/22 10:53:51 INFO node.Application: Starting Channel c1
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
15/12/22 10:53:51 INFO node.Application: Starting Sink k1
15/12/22 10:53:51 INFO morphline.MorphlineSink: Starting Morphline Sink k1 (MorphlineSolrSink) ...
15/12/22 10:53:51 INFO node.Application: Starting Source r1
15/12/22 10:53:51 INFO source.ExecSource: Exec source starting with command:tail -f /var/logs/flume-ng/flume-cmf-flume-AGENT-sbmdeqpc01.ambientesbc.lab.log
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
15/12/22 10:53:51 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
15/12/22 10:53:51 INFO source.ExecSource: Command [tail -f /var/logs/flume-ng/flume-cmf-flume-AGENT-sbmdeqpc01.ambientesbc.lab.log] exited with 1
15/12/22 10:53:51 INFO api.MorphlineContext: Importing commands
15/12/22 10:53:55 ERROR lifecycle.LifecycleSupervisor: Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3927ce5e counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.replace(String.java:2021)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath.getClassName(ClassPath.java:403)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$ClassInfo.<init>(ClassPath.java:193)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$ResourceInfo.of(ClassPath.java:141)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanJar(ClassPath.java:345)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanFrom(ClassPath.java:286)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scan(ClassPath.java:274)
        at org.kitesdk.morphline.shaded.com.google.common.reflect.ClassPath.from(ClassPath.java:82)
        at org.kitesdk.morphline.api.MorphlineContext.getTopLevelClasses(MorphlineContext.java:149)
        at org.kitesdk.morphline.api.MorphlineContext.importCommandBuilders(MorphlineContext.java:91)
        at org.kitesdk.morphline.stdlib.Pipe.<init>(Pipe.java:43)
        at org.kitesdk.morphline.stdlib.PipeBuilder.build(PipeBuilder.java:40)
        at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:126)
        at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:55)
        at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:101)
        at org.apache.flume.sink.solr.morphline.MorphlineSink.start(MorphlineSink.java:97)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/12/22 10:53:55 INFO morphline.MorphlineSink: Morphline Sink k1 stopping...
15/12/22 10:53:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 stopped

 

1 ACCEPTED SOLUTION

Are you using Cloudera Manager to start the flume agent?  If so, you'll want to configure the heap size through Cloudera Manager.  

 

If you are not using Cloudera Manager, you will want to specify the following on the command line, not as a -Dproperty value:

flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx204800m

 

We recommend setting the -Xms and -Xmx values to the same amount so the JVM does not have to resize the heap, which can cause performance issues.
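
A minimal sketch of that recommendation, assuming a hypothetical helper build_flume_cmd (not part of Flume) that only prints a command line with identical -Xms and -Xmx values. The 2048m heap is an example value, not a tuning recommendation.

```shell
# Hypothetical helper: prints a flume-ng command line that uses the same
# value for the initial (-Xms) and maximum (-Xmx) heap size.
# Arguments: agent name, agent config file, heap size (e.g. 2048m).
build_flume_cmd() {
  printf 'flume-ng agent -n %s -f %s -Xms%s -Xmx%s\n' "$1" "$2" "$3" "$3"
}

# Print the command for review instead of running it directly.
build_flume_cmd a1 flume_config.conf 2048m
```

Printing the command first makes it easy to check the flags before actually starting the agent.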


 

Hello pdvorak!!!

 

Thank you very much. I don't know why I was trying to pass the memory parameters as a -Dproperty value (brain freeze, I guess).

 

Your suggestion was great and it worked perfectly!

 

Thanks again!!, Kind Regards

 

rafa

flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx204800m

There shouldn't be an equals sign after -Xmx
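
To catch this kind of typo before launching the agent, a small shell check can help. This is only a sketch: is_valid_heap_flag is a hypothetical helper, and its pattern covers just the common -Xms/-Xmx forms with an optional k, m, or g suffix rather than everything the JVM accepts.

```shell
# Hypothetical helper: succeeds only for flags shaped like -Xms1024m or
# -Xmx204800m (digits directly after the flag name, optional k/m/g suffix).
is_valid_heap_flag() {
  printf '%s\n' "$1" | grep -Eq '^-Xm[sx][0-9]+[kKmMgG]?$'
}

# Compare the correct form with the one containing the stray equals sign.
for flag in -Xmx204800m -Xmx=204800m; do
  if is_valid_heap_flag "$flag"; then
    echo "$flag looks fine"
  else
    echo "$flag is malformed"
  fi
done
```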