
Flume Kafka HDFS Sink Empty Lines




Hi,

I am using Flume to move data from a Kafka topic to HDFS, with a KafkaChannel as the channel and an HDFS sink. At first, Flume inserted an empty line after each record in the HDFS file, so when I queried the Hive table pointing to this directory, I saw a NULL row after each record. I added serializer.appendNewline=false to my Flume config file, which eliminated the empty lines between records. However, in every HDFS file Flume creates, the first line is still empty, and the Hive query shows a NULL row first, followed by the records. Is there a property I need to add to my Flume config to eliminate these empty lines? Please suggest. Below is my Flume config.

 

ftest.channels = ctest
ftest.sinks = stest

ftest.channels.ctest.type = org.apache.flume.channel.kafka.KafkaChannel
ftest.channels.ctest.brokerList = broker1-host:9092,broker2-host:9092,broker3-host:9092
ftest.channels.ctest.topic = ftest_pb
ftest.channels.ctest.groupId = ftest_pb_flume
ftest.channels.ctest.zookeeperConnect = host1:2181,host2:2181,host3:2181
ftest.channels.ctest.parseAsFlumeEvent = false
ftest.channels.ctest.kafka.consumer.session.timeout.ms=120000
ftest.channels.ctest.kafka.consumer.request.timeout.ms=120002
ftest.channels.ctest.kafka.consumer.linger.ms=5000

ftest.sinks.stest.type=hdfs
ftest.sinks.stest.hdfs.path=/data/incoming/ftest_pb
ftest.sinks.stest.hdfs.filePrefix=ft
ftest.sinks.stest.hdfs.useLocalTimeStamp = true
ftest.sinks.stest.hdfs.rollSize=1024000000
ftest.sinks.stest.hdfs.batchSize=10000
ftest.sinks.stest.hdfs.rollCount=0
ftest.sinks.stest.hdfs.minBlockReplicas=1
ftest.sinks.stest.hdfs.txnEventMax=10000
ftest.sinks.stest.hdfs.callTimeout=1000000
ftest.sinks.stest.channel=ctest
ftest.sinks.stest.serializer=text
ftest.sinks.stest.serializer.appendNewline=false
ftest.sinks.stest.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL
ftest.sinks.stest.hdfs.kerberosKeytab = $KERBEROS_KEYTAB
ftest.sinks.stest.hdfs.fileType=DataStream
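To reason about where the blank line could come from, here is a minimal sketch that models the byte output of the HDFS sink with serializer=text: each event body is written unchanged, followed by "\n" only when appendNewline is true. The function names and sample records are hypothetical, and the key assumption (labelled in the code) is that the Kafka producer prepends a "\n" to every message. That assumption is not confirmed by the post; it is simply one hypothesis that reproduces both symptoms described above.

```python
# Hypothetical model of the HDFS sink with serializer=text: each event body
# is written as-is, followed by "\n" only when appendNewline=true.
def write_events(bodies: list[bytes], append_newline: bool) -> bytes:
    out = bytearray()
    for body in bodies:
        out += body           # raw Kafka message bytes (parseAsFlumeEvent=false)
        if append_newline:
            out += b"\n"      # the only byte the serializer adds
    return bytes(out)


def hive_rows(file_bytes: bytes) -> list[str]:
    # Approximate a line-oriented reader: one row per line,
    # and no extra row for a trailing newline at end of file.
    return file_bytes.decode("utf-8").splitlines()


if __name__ == "__main__":
    # ASSUMPTION: the producer prepends "\n" to every record.
    prefixed = [b"\n1001,alice", b"\n1002,bob"]
    # For comparison: messages with no embedded newline at all.
    clean = [b"1001,alice", b"1002,bob"]

    for label, bodies in (("prefixed", prefixed), ("clean", clean)):
        for append in (True, False):
            rows = hive_rows(write_events(bodies, append))
            print(f"{label:8} appendNewline={str(append).lower():5} -> {rows}")
```

Under that assumption, appendNewline=true gives an empty row before every record, and appendNewline=false leaves exactly one empty row at the start of each new file, because the first event's leading "\n" has no record before it. With clean messages and appendNewline=false, the records would instead run together on one line. Inspecting a few raw messages on the ftest_pb topic (for example with kafka-console-consumer) would confirm or rule out this explanation.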