
Unable to ingest syslog data from a remote server to the Hadoop cluster.


I am trying to load syslog data from a remote server where Flume is running in Docker. After a connection is established, the Flume agent log shows the client connection port continuously changing.

 

18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] OPEN
18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44240
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] OPEN
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44264
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] OPEN
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44276
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] OPEN
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44288
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] OPEN
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44300
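
Note the pattern: a new channel id and client port roughly every 25 seconds, which looks like the sender's Avro sink timing out and reconnecting rather than holding a single connection -- possibly because the sender is stalled in GC (see the error below). A tuning sketch I am considering for the sender side, assuming the default 20-second timeouts are being hit (the values here are illustrative, not from my current config):

# Possible Avro sink tuning on the sender (values illustrative,
# assuming the 20000 ms defaults are being exceeded):
WsAccLogTail.sinks.AvroSink.connect-timeout = 60000
WsAccLogTail.sinks.AvroSink.request-timeout = 60000
WsAccLogTail.sinks.AvroSink.batch-size = 100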

On the remote server, the log shows:

Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded

Because of this, I tried decreasing the channel's transactionCapacity and capacity, but I am still getting the same error.
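
One thing I plan to try next: raising the agent's JVM heap, since "GC overhead limit exceeded" points at the heap rather than the channel sizing, and the stock flume-ng launcher only sets -Xmx20m unless JAVA_OPTS is overridden. A sketch for conf/flume-env.sh on the remote host (heap values are illustrative):

# conf/flume-env.sh on the remote (Docker) host -- heap values illustrative;
# overrides the launcher's tiny -Xmx20m default:
export JAVA_OPTS="-Xms512m -Xmx1024m"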

Flume config on the remote server:

# Naming the components of the current agent.
WsAccLogTail.sources = AccessLog
WsAccLogTail.sinks = AvroSink
WsAccLogTail.channels = MemChannel

# Source Configuration
WsAccLogTail.sources.AccessLog.type = org.apache.flume.source.taildir.TaildirSource
#WsAccLogTail.sources.AccessLog.type = exec
WsAccLogTail.sources.AccessLog.positionFile = /tmp/flume/taildir_position.json
WsAccLogTail.sources.AccessLog.filegroups = acLog
WsAccLogTail.sources.AccessLog.filegroups.acLog = /tmp/access_server.log
WsAccLogTail.sources.AccessLog.filegroups.acLog.headerKey1 = value1
#WsAccLogTail.sources.AccessLog.batchSize = 1000
#WsAccLogTail.sources.AccessLog.interceptors = itime

# Timestamp Interceptor
#WsAccLogTail.sources.AccessLog.interceptors.itime.type = timestamp

# Sink Configuration (Send to Flume Collector Agent on Hadoop Edge Node)
WsAccLogTail.sinks.AvroSink.type = avro
WsAccLogTail.sinks.AvroSink.hostname = 1**.**.***.130
WsAccLogTail.sinks.AvroSink.port = 4555

# Channel Configuration
WsAccLogTail.channels.MemChannel.type = file
WsAccLogTail.channels.MemChannel.capacity = 20000
WsAccLogTail.channels.MemChannel.transactionCapacity = 5000
WsAccLogTail.channels.MemChannel.maxFileSize = 52428800

# Bind Source & Sink to the Channel
WsAccLogTail.sources.AccessLog.channels = MemChannel
WsAccLogTail.sinks.AvroSink.channel = MemChannel
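
One thing I noticed while reviewing this config: the channel named MemChannel is actually a file channel, and a file channel writes its checkpoint and data under ~/.flume by default, which may not be writable or persistent inside the Docker container. A sketch with explicit directories (the paths here are illustrative):

# Explicit file-channel directories (paths illustrative); inside Docker
# these should sit on a writable, persistent volume:
WsAccLogTail.channels.MemChannel.checkpointDir = /var/flume/checkpoint
WsAccLogTail.channels.MemChannel.dataDirs = /var/flume/data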

Flume config on the Hadoop edge node:

 

# EdgeAccLogAgent

# Naming the components of the current agent.
EdgeAccLog1.sources = AvroSource
EdgeAccLog1.sinks = file1
EdgeAccLog1.channels = MemChannel

# Source Configuration
EdgeAccLog1.sources.AvroSource.type = avro
EdgeAccLog1.sources.AvroSource.bind = 0.0.0.0
EdgeAccLog1.sources.AvroSource.port = 4555

EdgeAccLog1.sinks.file1.type = org.apache.flume.sink.kafka.KafkaSink
EdgeAccLog1.sinks.file1.topic = international-syslog
EdgeAccLog1.sinks.file1.brokerList = 1**.**.***.***:9092,1**.**.***.***:9092
# To avoid data loss, use requiredAcks = -1
EdgeAccLog1.sinks.file1.requiredAcks = -1
EdgeAccLog1.sinks.file1.batchSize = 500


# To save memory on the host, a file channel would be preferred
EdgeAccLog1.channels.MemChannel.type = memory
EdgeAccLog1.channels.MemChannel.capacity = 5000
EdgeAccLog1.channels.MemChannel.transactionCapacity = 2000
#EdgeAccLog1.channels.MemChannel.maxFileSize = 214643501

# Bind Source & Sink to the Channel
EdgeAccLog1.sources.AvroSource.channels = MemChannel
EdgeAccLog1.sinks.file1.channel = MemChannel
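
For reference, this is how I start the two agents (the config file names and paths here are illustrative; the --name values must match the agent prefixes above):

# On the remote (Docker) host:
flume-ng agent --conf /opt/flume/conf --conf-file /opt/flume/conf/ws-acclog.conf \
  --name WsAccLogTail -Dflume.root.logger=INFO,console

# On the Hadoop edge node:
flume-ng agent --conf /opt/flume/conf --conf-file /opt/flume/conf/edge-acclog.conf \
  --name EdgeAccLog1 -Dflume.root.logger=INFO,console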