Reply
New Contributor
Posts: 5
Registered: ‎08-08-2017

gzip compression for hdfs sink gives Not an Avro data file

[ Edited ]

Hello,

 

below is my configuration which works perfectly fine for non compressed data

 

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = DataStream
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 5000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0 
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

 

I am trying to add compression using gzip to this as follows

 

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = CompressedStream
agent.sinks.test.hdfs.codeC = gzip
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 10000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

 

All the above data is being stored in s3 and when i try to retrieve the data from hive i am getting the below weerror.

Exception in thread "main" java.io.IOException: Not an Avro data file

 

Can you please let me know why is my configuration not working ?

Announcements