Only 10 records populate from local to HDFS when running Flume, but my file has 500 records
Labels:
- Apache Flume
- Apache Hadoop
- HDFS
Created on 03-31-2016 03:22 AM - edited 09-16-2022 03:11 AM
Here is my config file:
-----Local Config
agent.sources = localsource
agent.channels = memoryChannel
agent.sinks = avro_Sink
agent.sources.localsource.type = exec
agent.sources.localsource.shell = /bin/bash -c
agent.sources.localsource.command = tail -F /home/dwh/teja/Flumedata/testfile.csv
# The channel can be defined as follows.
agent.sources.localsource.channels = memoryChannel
# Each sink's type must be defined
agent.sinks.avro_Sink.type = avro
agent.sinks.avro_Sink.hostname = 192.168.44.4
agent.sinks.avro_Sink.port = 8021
agent.sinks.avro_Sink.avro.batchSize = 10000
agent.sinks.avro_Sink.avro.rollCount = 5000
agent.sinks.avro_Sink.avro.rollSize = 500
agent.sinks.avro_Sink.avro.rollInterval = 30
agent.sinks.avro_Sink.channel = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 10000
------Remote config
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set the standard properties.
tier1.sources.source1.type = avro
tier1.sources.source1.bind = 192.168.44.4
tier1.sources.source1.port = 8021
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://192.168.44.4:8020/user/hadoop/flumelogs/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.batchSize = 10000
tier1.sinks.sink1.hdfs.rollCount = 5000
tier1.sinks.sink1.hdfs.rollSize = 500
tier1.sinks.sink1.hdfs.rollInterval = 30
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactioncapacity=10000
Please help, I want to populate the full file from local to HDFS.
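A few details in the configs above are worth flagging: tail -F prints only the last 10 lines of a file by default, which likely explains the 10 records observed; the Avro sink has no roll* or avro.batchSize properties (rolling belongs to the HDFS sink, and the Avro sink's batch key is batch-size); and transactioncapacity must be spelled transactionCapacity, since Flume property names are case-sensitive. A minimal corrected sketch of the local agent, keeping the same host, port, and file path:

agent.sources = localsource
agent.channels = memoryChannel
agent.sinks = avro_Sink

# Exec source: tail -n +1 -F replays the whole file from the start, then follows it
agent.sources.localsource.type = exec
agent.sources.localsource.shell = /bin/bash -c
agent.sources.localsource.command = tail -n +1 -F /home/dwh/teja/Flumedata/testfile.csv
agent.sources.localsource.channels = memoryChannel

# Avro sink: only hostname, port, and batch-size apply here;
# file rolling is configured on the HDFS sink, not on this hop
agent.sinks.avro_Sink.type = avro
agent.sinks.avro_Sink.hostname = 192.168.44.4
agent.sinks.avro_Sink.port = 8021
agent.sinks.avro_Sink.batch-size = 100
agent.sinks.avro_Sink.channel = memoryChannel

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 10000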
Created 03-31-2016 09:58 PM
If you do want to stream the files, use the spooldir source when the files are not being appended to. If they are being appended to while Flume is reading them, use the new taildir source (as of CDH 5.5) [1], as it provides more reliable handling of streaming log files. Note that the spooldir source requires that files are not modified once they are in the spool directory; when ingestion is finished, they are removed or renamed with a .COMPLETED suffix.
-PD
[1] http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source
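For illustration, a minimal sketch of the taildir source described above, reusing the file path from this thread (the position-file location is an assumption, not from the thread):

agent.sources = localsource
agent.sources.localsource.type = TAILDIR
# Records the last read position so ingestion can resume after a restart
agent.sources.localsource.positionFile = /home/dwh/teja/taildir_position.json
agent.sources.localsource.filegroups = f1
agent.sources.localsource.filegroups.f1 = /home/dwh/teja/Flumedata/testfile.csv
agent.sources.localsource.channels = memoryChannel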
Created on 04-01-2016 02:43 AM - edited 04-01-2016 02:48 AM
Hi, if I use spooldir, is this config fine?
# For each one of the sources, the type is defined
agent.sources.localsource.type = spooldir
#agent.sources.localsource.shell = /bin/bash -c
agent.sources.localsource.command = /home/dwh/teja/Flumedata/
agent.sources.localsource.fileHeader = true
Also, I want to include the file name in the HDFS path.
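For what it's worth, the spooldir source takes a spoolDir property rather than command, and with fileHeader = true each event carries the originating file's absolute path in a header named file, which the HDFS sink can substitute into its path. A sketch along those lines, keeping the names from this thread:

agent.sources.localsource.type = spooldir
agent.sources.localsource.spoolDir = /home/dwh/teja/Flumedata
# Puts the absolute path of the ingested file into a header named "file"
agent.sources.localsource.fileHeader = true
agent.sources.localsource.channels = memoryChannel

# On the remote agent, the header survives the Avro hop and can appear in the path
tier1.sinks.sink1.hdfs.path = hdfs://192.168.44.4:8020/user/hadoop/flumelogs/%{file}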
Created 04-03-2016 10:33 PM
My config file is the same as the one I shared above, with the spooldir source.
