New Contributor
Posts: 1
Registered: 03-31-2016
Accepted Solution

Flume adding line feed after 2048 characters in a row

I have a Flume 1.5 agent running on an Ubuntu workstation that collects logs from various devices and reformats them into a comma-delimited file with very long rows. Once the logs have been collected and reformatted, the files are placed into a spool directory. The Flume agent on the workstation sends each file from there to a Flume agent on a Hadoop server, which accepts the file and writes it to an HDFS directory.

 

Everything works fine, except that when Flume writes the file to the HDFS directory, a line feed is inserted after every 2048 characters in each row.

 

Is there a setting to tell Flume not to insert these line feeds?

Below are my config files:

#On Ubuntu Workstation
#list sources, sinks and channels in the agent
agent.sources = axon_source
agent.channels = memorychannel
agent.sinks = AvroOut

#define flow
agent.sources.axon_source.channels = memorychannel
agent.sinks.AvroOut.channel = memorychannel
agent.channels.memorychannel.type = memory
agent.channels.memorychannel.capacity = 100000

#source
agent.sources.axon_source.type = spooldir
agent.sources.axon_source.spoolDir = /home/ubuntu/workspace/logdump
agent.sources.axon_source.decodeErrorPolicy = ignore

#avro out
agent.sinks.AvroOut.type = avro
agent.sinks.AvroOut.hostname = 172.31.12.221
agent.sinks.AvroOut.port = 41415
agent.sinks.AvroOut.maxIoWorkers = 2


------------------------------------------------------------


#On Hadoop Server
agent.sources = AvroIn
agent.sources.AvroIn.type = avro
agent.sources.AvroIn.bind = 172.31.131.1
agent.sources.AvroIn.port = 41415
agent.sources.AvroIn.channels = MemChan1

agent.channels = MemChan1
agent.channels.MemChan1.type = memory
agent.channels.MemChan1.capacity = 100000

agent.sinks = HDFSSink
agent.sinks.HDFSSink.type = hdfs
agent.sinks.HDFSSink.channel = MemChan1
agent.sinks.HDFSSink.hdfs.path = /Logs/%Y%m/
agent.sinks.HDFSSink.hdfs.filePrefix = axoncapture
agent.sinks.HDFSSink.hdfs.fileSuffix = .log
agent.sinks.HDFSSink.hdfs.minBlockReplicas = 1
agent.sinks.HDFSSink.hdfs.rollCount = 0
agent.sinks.HDFSSink.hdfs.rollSize = 314572800
agent.sinks.HDFSSink.hdfs.writeFormat = Text
agent.sinks.HDFSSink.hdfs.fileType = DataStream
agent.sinks.HDFSSink.hdfs.useLocalTimeStamp = True

Thanks... Corey

Cloudera Employee
Posts: 277
Registered: 01-09-2014

Re: Flume adding line feed after 2048 characters in a row

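The default maxLineLength for the LINE deserializer is 2048:
http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#line

A line longer than that is truncated, and the remaining characters are carried over into a subsequent event. Because the HDFS sink writes each event followed by a newline, every 2048-character chunk of a long row ends up on its own line.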
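To see why the breaks land exactly every 2048 characters, here is a small Java simulation. It is a hypothetical re-implementation of the documented behaviour (an over-long line is cut at maxLineLength and the rest goes into the next event; the HDFS sink's text serializer appends a newline to each event by default), not Flume's actual classes. `LineSplitDemo`, `toEvents` and `writeAsText` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the spooldir LINE deserializer + HDFS TEXT output.
class LineSplitDemo {

    // Split input into "events": stop at '\n' or once maxLineLength chars
    // have been read, whichever comes first (the documented LINE behaviour).
    public static List<String> toEvents(String input, int maxLineLength) {
        List<String> events = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int readChars = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            readChars++;
            if (c == '\n') {            // normal end of line: emit the event
                events.add(current.toString());
                current.setLength(0);
                readChars = 0;
                continue;
            }
            current.append(c);
            if (readChars >= maxLineLength) {  // too long: cut here, rest goes
                events.add(current.toString()); // into the next event
                current.setLength(0);
                readChars = 0;
            }
        }
        if (readChars > 0) {            // last line without a trailing '\n'
            events.add(current.toString());
        }
        return events;
    }

    // Assumed sink behaviour: each event body is written followed by '\n'.
    public static String writeAsText(List<String> events) {
        StringBuilder out = new StringBuilder();
        for (String e : events) {
            out.append(e).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String row = "x".repeat(5000) + "\n";  // one 5000-char row (Java 11+)
        String withDefault = writeAsText(toEvents(row, 2048));
        String withRaised = writeAsText(toEvents(row, 10000));
        System.out.println(withDefault.split("\n").length); // 3 (2048 + 2048 + 904)
        System.out.println(withRaised.split("\n").length);  // 1
    }
}
```

With the default limit, one 5000-character row comes out as three lines; raising the limit above the row length keeps it as a single line.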

You can set the following on the spooling directory source (on the Ubuntu workstation) to accommodate your large events:
agent.sources.axon_source.deserializer.maxLineLength=10000
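Since 10000 is only an example value, it can help to measure the longest row actually being spooled and pick a limit comfortably above it. The sketch below is a hypothetical helper (`LongestLine` is not part of Flume) that scans a directory and reports the longest line. It assumes the files are mostly ASCII and reads them as ISO-8859-1 so that malformed bytes never abort the scan; for non-ASCII data this counts bytes, which over-estimates the character count and is therefore safe for sizing maxLineLength.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: find the longest row in the spool directory.
class LongestLine {

    // Length of the longest line in one file, excluding the line terminator.
    // ISO-8859-1 maps every byte to one char, so decoding never fails.
    static int longestLine(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.mapToInt(String::length).max().orElse(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the spoolDir from the workstation config above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/home/ubuntu/workspace/logdump");
        int longest = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                // Skip subdirectories such as Flume's tracker directory.
                if (Files.isRegularFile(f)) {
                    longest = Math.max(longest, longestLine(f));
                }
            }
        }
        System.out.println("Longest row: " + longest + " characters");
    }
}
```

Leaving some headroom above the reported value avoids the same splitting if future rows grow slightly longer.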
