Flume adding line feed after 2048 characters in a row
Labels: Apache Flume, Apache Hadoop, HDFS
Created on 03-31-2016 11:58 AM - edited 09-16-2022 03:11 AM
I have a Flume 1.5 agent running on an Ubuntu workstation that collects logs from various devices and reformats them into a comma-delimited file with very long rows. After collection and reformatting, the log files are placed in a spool directory, from which the Flume agent sends them to a Hadoop server. A second Flume agent on that server accepts the files and places them in an HDFS directory.
Everything works fine, except that when Flume writes the file to the HDFS directory there is a line feed after every 2048 characters in each row.
Is there a setting to tell Flume not to insert line feeds?
Below are my config files:
#On Ubuntu Workstation
#list sources, sinks and channels in the agent
agent.sources = axon_source
agent.channels = memorychannel
agent.sinks = AvroOut
#define flow
agent.sources.axon_source.channels = memorychannel
agent.sinks.AvroOut.channel = memorychannel
agent.channels.memorychannel.type = memory
agent.channels.memorychannel.capacity = 100000
#source
agent.sources.axon_source.type = spooldir
agent.sources.axon_source.spoolDir = /home/ubuntu/workspace/logdump
agent.sources.axon_source.decodeErrorPolicy = ignore
#avro out
agent.sinks.AvroOut.type = avro
agent.sinks.AvroOut.hostname = 172.31.12.221
agent.sinks.AvroOut.port = 41415
agent.sinks.AvroOut.maxIoWorkers = 2
------------------------------------------------------------
#On Hadoop Server
agent.sources = AvroIn
agent.sources.AvroIn.type = avro
agent.sources.AvroIn.bind = 172.31.131.1
agent.sources.AvroIn.port = 41415
agent.sources.AvroIn.channels = MemChan1
agent.channels = MemChan1
agent.channels.MemChan1.type = memory
agent.channels.MemChan1.capacity = 100000
agent.sinks = HDFSSink
agent.sinks.HDFSSink.type = hdfs
agent.sinks.HDFSSink.channel = MemChan1
agent.sinks.HDFSSink.hdfs.path = /Logs/%Y%m/
agent.sinks.HDFSSink.hdfs.filePrefix = axoncapture
agent.sinks.HDFSSink.hdfs.fileSuffix = .log
agent.sinks.HDFSSink.hdfs.minBlockReplicas = 1
agent.sinks.HDFSSink.hdfs.rollCount = 0
agent.sinks.HDFSSink.hdfs.rollSize = 314572800
agent.sinks.HDFSSink.hdfs.writeFormat = Text
agent.sinks.HDFSSink.hdfs.fileType = DataStream
agent.sinks.HDFSSink.hdfs.useLocalTimeStamp = True
Thanks...Corey
Created 03-31-2016 12:41 PM
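The default maxLineLength for the LINE deserializer is 2048. A longer row is split across several events, and the HDFS sink writes each event on its own line, which is why a line feed shows up after every 2048 characters. See: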
http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#line
You can set the following on the spooldir source to accommodate your large events:
agent.sources.axon_source.deserializer.maxLineLength=10000
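The 10000 above is only an example value; whatever you choose has to be at least as long as the longest row you expect. As a rough way to check that, here is a minimal Python sketch that reports the longest line found in the spool directory. It is not part of Flume: the function name longest_line_length is made up for illustration, and it assumes the files are plain text readable as UTF-8, with undecodable bytes skipped (similar in spirit to decodeErrorPolicy = ignore in the config above).

```python
import os


def longest_line_length(directory):
    """Return the length in characters of the longest line found in the
    regular files directly under `directory` (0 if there are none).

    Hypothetical helper for sizing deserializer.maxLineLength; it is not
    a Flume API.
    """
    longest = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        # Assumption: files are UTF-8 text; undecodable bytes are dropped.
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                # Do not count the trailing line terminator.
                longest = max(longest, len(line.rstrip("\r\n")))
    return longest


if __name__ == "__main__":
    # Spool directory taken from the question's config.
    print(longest_line_length("/home/ubuntu/workspace/logdump"))
```

If the number printed is larger than the maxLineLength you configured, rows of that length would still be split, so pick a value with some headroom above it.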