Support Questions
Find answers, ask questions, and share your expertise

kafka flume streaming gives outofmemory error


I am facing an issue with Kafka 0.10 and Flume 1.8. We are trying to ingest near-real-time (NRT) data into Hive via streaming, but we often get a socket exception followed by java.lang.OutOfMemoryError.

So I tried the following:

1. Increased the Java heap to 8 GB via flume-env.sh, but it didn't help.

2. Used multiple Hive sinks and channels (both memory and file), but that didn't help either.
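For the heap change, flume-env.sh contained something along these lines (illustrative; the exact flags and heap-dump path here are examples, not our production values):

```shell
# flume-env.sh -- heap settings for the Flume agent JVM
# -Xmx sets the max heap; the heap-dump flags capture state on OOM for analysis
export JAVA_OPTS="-Xms4g -Xmx8g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/flume_heap.hprof"
```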

We have a 4-node cluster with 16 GB of memory on each node. Is there anything else I can tune in the Flume configuration, or would an alternative to Kafka-Flume help?

Below are my Flume properties:

flume1.sources = kafka-source-1
flume1.channels = hive-channel-1 hive-channel-2 hive-channel-3 hive-channel-4 hive-channel-5 hive-channel-6 hive-channel-7 hive-channel-8 hive-channel-9
flume1.sinks = hive-sink-1 hive-sink-2 hive-sink-3 hive-sink-4 hive-sink-5 hive-sink-6 hive-sink-7 hive-sink-8 hive-sink-9
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = base1.rolta.com:2181
flume1.sources.kafka-source-1.topic = iot-streaming
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.batchDurationMillis = 10000
flume1.sources.kafka-source-1.channels = hive-channel-1
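(As an aside: Flume 1.7+ deprecated the ZooKeeper-based Kafka source properties in favor of bootstrap servers. A sketch of the newer form, assuming the broker runs on the same host at the default port 9092 and using an example consumer group name:

flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.kafka.bootstrap.servers = base1.rolta.com:9092
flume1.sources.kafka-source-1.kafka.topics = iot-streaming
flume1.sources.kafka-source-1.kafka.consumer.group.id = flume-iot
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.batchDurationMillis = 10000
flume1.sources.kafka-source-1.channels = hive-channel-1
)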

flume1.channels.hive-channel-1.type = file
flume1.channels.hive-channel-1.checkpointDir = /home/labuser/flumedata/chkpoint/001
flume1.channels.hive-channel-1.dataDirs = /home/labuser/flumedata/data/001
flume1.channels.hive-channel-1.transactionCapacity = 100

flume1.channels.hive-channel-2.type = file
flume1.channels.hive-channel-2.checkpointDir = /home/labuser/flumedata/chkpoint/002
flume1.channels.hive-channel-2.dataDirs = /home/labuser/flumedata/data/002
flume1.channels.hive-channel-2.transactionCapacity = 100

flume1.channels.hive-channel-3.type = file
flume1.channels.hive-channel-3.checkpointDir = /home/labuser/flumedata/chkpoint/003
flume1.channels.hive-channel-3.dataDirs = /home/labuser/flumedata/data/003
flume1.channels.hive-channel-3.transactionCapacity = 100

flume1.channels.hive-channel-4.type = file
flume1.channels.hive-channel-4.checkpointDir = /home/labuser/flumedata/chkpoint/004
flume1.channels.hive-channel-4.dataDirs = /home/labuser/flumedata/data/004
flume1.channels.hive-channel-4.transactionCapacity = 100

flume1.channels.hive-channel-5.type = file
flume1.channels.hive-channel-5.checkpointDir = /home/labuser/flumedata/chkpoint/005
flume1.channels.hive-channel-5.dataDirs = /home/labuser/flumedata/data/005
flume1.channels.hive-channel-5.transactionCapacity = 100

flume1.channels.hive-channel-6.type = file
flume1.channels.hive-channel-6.checkpointDir = /home/labuser/flumedata/chkpoint/006
flume1.channels.hive-channel-6.dataDirs = /home/labuser/flumedata/data/006
flume1.channels.hive-channel-6.transactionCapacity = 100

flume1.channels.hive-channel-7.type = file
flume1.channels.hive-channel-7.checkpointDir = /home/labuser/flumedata/chkpoint/007
flume1.channels.hive-channel-7.dataDirs = /home/labuser/flumedata/data/007
flume1.channels.hive-channel-7.transactionCapacity = 100

flume1.channels.hive-channel-8.type = file
flume1.channels.hive-channel-8.checkpointDir = /home/labuser/flumedata/chkpoint/008
flume1.channels.hive-channel-8.dataDirs = /home/labuser/flumedata/data/008
flume1.channels.hive-channel-8.transactionCapacity = 100

flume1.channels.hive-channel-9.type = file
flume1.channels.hive-channel-9.checkpointDir = /home/labuser/flumedata/chkpoint/009
flume1.channels.hive-channel-9.dataDirs = /home/labuser/flumedata/data/009
flume1.channels.hive-channel-9.transactionCapacity = 100

flume1.sinks.hive-sink-1.channel = hive-channel-1
flume1.sinks.hive-sink-1.type = hive
flume1.sinks.hive-sink-1.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-1.hive.database = default
flume1.sinks.hive-sink-1.hive.table = oneviewtest1
flume1.sinks.hive-sink-1.serializer = DELIMITED
flume1.sinks.hive-sink-1.serializer.delimiter = "\\|" 
flume1.sinks.hive-sink-1.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-1.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-1.batchSize = 100
flume1.sinks.hive-sink-1.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-2.channel = hive-channel-2
flume1.sinks.hive-sink-2.type = hive
flume1.sinks.hive-sink-2.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-2.hive.database = default
flume1.sinks.hive-sink-2.hive.table = oneviewtest1
flume1.sinks.hive-sink-2.serializer = DELIMITED
flume1.sinks.hive-sink-2.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-2.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-2.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-2.batchSize = 100
flume1.sinks.hive-sink-2.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-3.channel = hive-channel-3
flume1.sinks.hive-sink-3.type = hive
flume1.sinks.hive-sink-3.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-3.hive.database = default
flume1.sinks.hive-sink-3.hive.table = oneviewtest1
flume1.sinks.hive-sink-3.serializer = DELIMITED
flume1.sinks.hive-sink-3.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-3.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-3.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-3.batchSize = 100
flume1.sinks.hive-sink-3.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-4.channel = hive-channel-4
flume1.sinks.hive-sink-4.type = hive
flume1.sinks.hive-sink-4.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-4.hive.database = default
flume1.sinks.hive-sink-4.hive.table = oneviewtest1
flume1.sinks.hive-sink-4.serializer = DELIMITED
flume1.sinks.hive-sink-4.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-4.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-4.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-4.batchSize = 100
flume1.sinks.hive-sink-4.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-5.channel = hive-channel-5
flume1.sinks.hive-sink-5.type = hive
flume1.sinks.hive-sink-5.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-5.hive.database = default
flume1.sinks.hive-sink-5.hive.table = oneviewtest1
flume1.sinks.hive-sink-5.serializer = DELIMITED
flume1.sinks.hive-sink-5.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-5.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-5.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-5.batchSize = 100
flume1.sinks.hive-sink-5.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-6.channel = hive-channel-6
flume1.sinks.hive-sink-6.type = hive
flume1.sinks.hive-sink-6.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-6.hive.database = default
flume1.sinks.hive-sink-6.hive.table = oneviewtest1
flume1.sinks.hive-sink-6.serializer = DELIMITED
flume1.sinks.hive-sink-6.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-6.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-6.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-6.batchSize = 100
flume1.sinks.hive-sink-6.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-7.channel = hive-channel-7
flume1.sinks.hive-sink-7.type = hive
flume1.sinks.hive-sink-7.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-7.hive.database = default
flume1.sinks.hive-sink-7.hive.table = oneviewtest1
flume1.sinks.hive-sink-7.serializer = DELIMITED
flume1.sinks.hive-sink-7.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-7.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-7.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-7.batchSize = 100
flume1.sinks.hive-sink-7.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-8.channel = hive-channel-8
flume1.sinks.hive-sink-8.type = hive
flume1.sinks.hive-sink-8.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-8.hive.database = default
flume1.sinks.hive-sink-8.hive.table = oneviewtest1
flume1.sinks.hive-sink-8.serializer = DELIMITED
flume1.sinks.hive-sink-8.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-8.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-8.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-8.batchSize = 100
flume1.sinks.hive-sink-8.hive.partition = %y-%m-%d

flume1.sinks.hive-sink-9.channel = hive-channel-9
flume1.sinks.hive-sink-9.type = hive
flume1.sinks.hive-sink-9.hive.metastore = thrift://base1.rolta.com:9083
flume1.sinks.hive-sink-9.hive.database = default
flume1.sinks.hive-sink-9.hive.table = oneviewtest1
flume1.sinks.hive-sink-9.serializer = DELIMITED
flume1.sinks.hive-sink-9.serializer.delimiter = "\\|"
flume1.sinks.hive-sink-9.serializer.fieldnames = db_rqst_id,opc_tag_id,opc_qlty_val,opc_tag_val,opc_tag_val_ts,opc_src_stm_code,tag_data_src
flume1.sinks.hive-sink-9.hive.txnsPerBatchAsk = 2
flume1.sinks.hive-sink-9.batchSize = 100
flume1.sinks.hive-sink-9.hive.partition = %y-%m-%d
1 Reply

Re: kafka flume streaming gives outofmemory error


What's your data flow rate? Could you please post the stack trace?

Note that you should try to tune your batch sizes: if event size × batch size (summed across all your channels) exceeds the available heap, you will get an OOM.
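A rough back-of-envelope check against your posted config (the event size and overhead factor here are assumed figures for illustration; batch size and channel count come from your properties):

```python
# Rough in-flight memory estimate for the posted Flume config.
# event_size_bytes and overhead_factor are assumptions, not measured values.
event_size_bytes = 2 * 1024   # assume ~2 KB per Kafka message
batch_size = 100              # batchSize from the posted config
num_channels = 9              # nine channel/sink pairs in the config
overhead_factor = 10          # rough JVM object/buffering overhead multiplier

in_flight_bytes = event_size_bytes * batch_size * num_channels * overhead_factor
print(f"~{in_flight_bytes / 2**20:.1f} MiB of in-flight event data")
```

With numbers like these the in-flight batches are tiny relative to an 8 GB heap, which suggests the memory is going somewhere else, e.g. a few unexpectedly large events or the Hive transaction batches, rather than the batch buffers themselves.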


On a side note, a single very large event can also cause such errors.
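If oversized events are a possibility, one option is to cap how much the consumer fetches per partition. This sketch assumes the bootstrap-servers style Kafka source (Flume 1.7+), which passes kafka.consumer.* properties through to the underlying Kafka consumer; the 1 MB value is illustrative:

flume1.sources.kafka-source-1.kafka.consumer.max.partition.fetch.bytes = 1048576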
