
Unable to ingest syslog data from a remote server into the Hadoop cluster.

I am trying to load syslog data from a remote server, where Flume is running in Docker, into our Hadoop cluster. After a connection is established, the collector agent's log shows the connection port continuously changing: a new connection is opened roughly every 25 seconds, each time from a new ephemeral client port, which suggests the remote sink keeps disconnecting and reconnecting.

18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] OPEN
18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:08:50 INFO ipc.NettyServer: [id: 0x379bec2f, /1**.**.***.103:44240 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44240
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] OPEN
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:09:15 INFO ipc.NettyServer: [id: 0xbf79cfcd, /1**.**.***.103:44264 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44264
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] OPEN
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:09:40 INFO ipc.NettyServer: [id: 0xe6844b8d, /1**.**.***.103:44276 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44276
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] OPEN
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:10:05 INFO ipc.NettyServer: [id: 0x7350daee, /1**.**.***.103:44288 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44288
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] OPEN
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] BOUND: /1**.**.**.130:4555
18/10/29 12:10:30 INFO ipc.NettyServer: [id: 0x74f138ce, /1**.**.***.103:44300 => /1**.**.**.130:4555] CONNECTED: /1**.**.***.103:44300

On the remote server, the log shows:

Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded

Because of this, I tried decreasing the channel's capacity and transactionCapacity, but I am still getting the same error.
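Since "GC overhead limit exceeded" is a JVM heap problem rather than a channel sizing problem, the next thing I plan to try is increasing the agent's heap via conf/flume-env.sh. A minimal sketch, assuming a default install layout (the path and the heap sizes below are guesses for my setup):

# conf/flume-env.sh on the remote (Docker) agent
# Heap sizes are assumptions; -Xmx must fit inside the container's memory limit
export JAVA_OPTS="-Xms512m -Xmx2048m"

Since the agent runs in Docker, the container itself also needs enough memory to hold that heap.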

Flume config at the remote server:

# Naming the components of the current agent.
WsAccLogTail.sources = AccessLog
WsAccLogTail.sinks = AvroSink
WsAccLogTail.channels = MemChannel

# Source Configuration
WsAccLogTail.sources.AccessLog.type = org.apache.flume.source.taildir.TaildirSource
#WsAccLogTail.sources.AccessLog.type = exec
WsAccLogTail.sources.AccessLog.positionFile  = /tmp/flume/taildir_position.json
WsAccLogTail.sources.AccessLog.filegroups = acLog
WsAccLogTail.sources.AccessLog.filegroups.acLog = /tmp/access_server.log
WsAccLogTail.sources.AccessLog.filegroups.acLog.headerKey1 = value1
#WsAccLogTail.sources.AccessLog.batchSize = 1000
#WsAccLogTail.sources.AccessLog.interceptors = itime

# Timestamp Interceptor
#WsAccLogTail.sources.AccessLog.interceptors.itime.type = timestamp

# Sink Configuration (send to the Flume collector agent on the Hadoop edge node)
WsAccLogTail.sinks.AvroSink.type = avro
WsAccLogTail.sinks.AvroSink.hostname = 1**.**.***.130
WsAccLogTail.sinks.AvroSink.port = 4555

# Channel Configuration (a file channel, despite the MemChannel name)
WsAccLogTail.channels.MemChannel.type = file
WsAccLogTail.channels.MemChannel.capacity = 20000
WsAccLogTail.channels.MemChannel.transactionCapacity = 5000
WsAccLogTail.channels.MemChannel.maxFileSize = 52428800

# Bind Source & Sink to the Channel
WsAccLogTail.sources.AccessLog.channels = MemChannel
WsAccLogTail.sinks.AvroSink.channel = MemChannel
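
The collector log above shows a fresh connection roughly every 25 seconds, which is close to the Avro sink's default 20-second request timeout, so the sink may be timing out mid-batch and reconnecting. A sketch of the sink tuning I am considering (batch-size, connect-timeout, and request-timeout are standard Avro sink properties; the values below are guesses to experiment with):

# Optional Avro sink tuning (values are assumptions)
WsAccLogTail.sinks.AvroSink.batch-size = 100
WsAccLogTail.sinks.AvroSink.connect-timeout = 60000
WsAccLogTail.sinks.AvroSink.request-timeout = 60000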

Flume config at the Hadoop server:

# EdgeAccLogAgent

# Naming the components of the current agent.
EdgeAccLog1.sources = AvroSource
EdgeAccLog1.sinks = file1
EdgeAccLog1.channels = MemChannel

# Source Configuration
EdgeAccLog1.sources.AvroSource.type = avro
EdgeAccLog1.sources.AvroSource.bind = 0.0.0.0
EdgeAccLog1.sources.AvroSource.port = 4555

# Sink Configuration (Kafka)
EdgeAccLog1.sinks.file1.type = org.apache.flume.sink.kafka.KafkaSink
EdgeAccLog1.sinks.file1.topic = international-syslog
EdgeAccLog1.sinks.file1.brokerList = 1**.**.***.***:9092,1**.**.***.***:9092
# To avoid data loss, use requiredAcks = -1 (wait for all replicas to acknowledge)
EdgeAccLog1.sinks.file1.requiredAcks = -1
EdgeAccLog1.sinks.file1.batchSize = 500

# Channel Configuration (memory channel; a file channel would be preferred to avoid data loss)
EdgeAccLog1.channels.MemChannel.type = memory
EdgeAccLog1.channels.MemChannel.capacity = 5000
EdgeAccLog1.channels.MemChannel.transactionCapacity = 2000
# maxFileSize applies only to file channels
#EdgeAccLog1.channels.MemChannel.maxFileSize=214643501

# Bind Source & Sink to the Channel
EdgeAccLog1.sources.AvroSource.channels = MemChannel
EdgeAccLog1.sinks.file1.channel = MemChannel
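
For what it's worth, topic, brokerList, and requiredAcks are the old Kafka sink property names; on Flume 1.7+ they still work but are deprecated in favor of the kafka.* forms. The equivalent newer names would be (assuming Flume 1.7 or later, untested on my cluster):

# Newer Kafka sink property names (assumption: Flume >= 1.7)
EdgeAccLog1.sinks.file1.kafka.topic = international-syslog
EdgeAccLog1.sinks.file1.kafka.bootstrap.servers = 1**.**.***.***:9092,1**.**.***.***:9092
EdgeAccLog1.sinks.file1.kafka.producer.acks = -1
EdgeAccLog1.sinks.file1.flumeBatchSize = 500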
