<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Only 10 records populating from local to HDFS while running Flume, but I have 500 records in my file - in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39202#M24101</link>
    <description>&lt;P&gt;Here is my config file:&lt;/P&gt;&lt;P&gt;-----Local Config&lt;/P&gt;&lt;P&gt;agent.sources = localsource&lt;BR /&gt;agent.channels = memoryChannel&lt;BR /&gt;agent.sinks = avro_Sink&lt;/P&gt;&lt;P&gt;agent.sources.localsource.type = exec&lt;BR /&gt;agent.sources.localsource.shell = /bin/bash -c&lt;BR /&gt;agent.sources.localsource.command = tail -F /home/dwh/teja/Flumedata/testfile.csv&lt;/P&gt;&lt;P&gt;# The channel can be defined as follows.&lt;BR /&gt;agent.sources.localsource.channels = memoryChannel&lt;/P&gt;&lt;P&gt;# Each sink's type must be defined&lt;BR /&gt;agent.sinks.avro_Sink.type = avro&lt;BR /&gt;agent.sinks.avro_Sink.hostname = 192.168.44.4&lt;BR /&gt;agent.sinks.avro_Sink.port = 8021&lt;BR /&gt;agent.sinks.avro_Sink.avro.batchSize = 10000&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollCount = 5000&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollSize = 500&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollInterval = 30&lt;BR /&gt;agent.sinks.avro_Sink.channel = memoryChannel&lt;/P&gt;&lt;P&gt;# Each channel's type is defined.&lt;BR /&gt;agent.channels.memoryChannel.type = memory&lt;BR /&gt;agent.channels.memoryChannel.capacity = 10000&lt;BR /&gt;agent.channels.memoryChannel.transactionCapacity = 10000&lt;/P&gt;&lt;P&gt;------Remote config&lt;/P&gt;&lt;P&gt;# Sources, channels, and sinks are defined per&lt;BR /&gt;# agent name, in this case 'tier1'.&lt;BR /&gt;tier1.sources = source1&lt;BR /&gt;tier1.channels = channel1&lt;BR /&gt;tier1.sinks = sink1&lt;/P&gt;&lt;P&gt;# For each source, channel, and sink, set&lt;BR /&gt;tier1.sources.source1.type = avro&lt;BR /&gt;tier1.sources.source1.bind = 192.168.44.4&lt;BR /&gt;tier1.sources.source1.port = 8021&lt;BR /&gt;tier1.sources.source1.channels = channel1&lt;BR /&gt;tier1.channels.channel1.type = memory&lt;BR /&gt;tier1.sinks.sink1.type = hdfs&lt;BR /&gt;tier1.sinks.sink1.channel = channel1&lt;BR /&gt;tier1.sinks.sink1.hdfs.path = hdfs://192.168.44.4:8020/user/hadoop/flumelogs/&lt;BR /&gt;tier1.sinks.sink1.hdfs.fileType = DataStream&lt;BR /&gt;tier1.sinks.sink1.hdfs.writeFormat = Text&lt;BR /&gt;tier1.sinks.sink1.hdfs.batchSize = 10000&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollCount = 5000&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollSize = 500&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollInterval = 30&lt;/P&gt;&lt;P&gt;# Specify the capacity of the memory channel.&lt;BR /&gt;tier1.channels.channel1.capacity = 10000&lt;BR /&gt;tier1.channels.channel1.transactionCapacity = 10000&lt;/P&gt;&lt;P&gt;Please help, I want to populate the full file from local to HDFS.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:11:53 GMT</pubDate>
    <dc:creator>Tejaponnaluru</dc:creator>
    <dc:date>2022-09-16T10:11:53Z</dc:date>
    <item>
      <title>Only 10 records populating from local to HDFS while running Flume, but I have 500 records in my file</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39202#M24101</link>
      <description>&lt;P&gt;Here is my config file:&lt;/P&gt;&lt;P&gt;-----Local Config&lt;/P&gt;&lt;P&gt;agent.sources = localsource&lt;BR /&gt;agent.channels = memoryChannel&lt;BR /&gt;agent.sinks = avro_Sink&lt;/P&gt;&lt;P&gt;agent.sources.localsource.type = exec&lt;BR /&gt;agent.sources.localsource.shell = /bin/bash -c&lt;BR /&gt;agent.sources.localsource.command = tail -F /home/dwh/teja/Flumedata/testfile.csv&lt;/P&gt;&lt;P&gt;# The channel can be defined as follows.&lt;BR /&gt;agent.sources.localsource.channels = memoryChannel&lt;/P&gt;&lt;P&gt;# Each sink's type must be defined&lt;BR /&gt;agent.sinks.avro_Sink.type = avro&lt;BR /&gt;agent.sinks.avro_Sink.hostname = 192.168.44.4&lt;BR /&gt;agent.sinks.avro_Sink.port = 8021&lt;BR /&gt;agent.sinks.avro_Sink.avro.batchSize = 10000&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollCount = 5000&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollSize = 500&lt;BR /&gt;agent.sinks.avro_Sink.avro.rollInterval = 30&lt;BR /&gt;agent.sinks.avro_Sink.channel = memoryChannel&lt;/P&gt;&lt;P&gt;# Each channel's type is defined.&lt;BR /&gt;agent.channels.memoryChannel.type = memory&lt;BR /&gt;agent.channels.memoryChannel.capacity = 10000&lt;BR /&gt;agent.channels.memoryChannel.transactionCapacity = 10000&lt;/P&gt;&lt;P&gt;------Remote config&lt;/P&gt;&lt;P&gt;# Sources, channels, and sinks are defined per&lt;BR /&gt;# agent name, in this case 'tier1'.&lt;BR /&gt;tier1.sources = source1&lt;BR /&gt;tier1.channels = channel1&lt;BR /&gt;tier1.sinks = sink1&lt;/P&gt;&lt;P&gt;# For each source, channel, and sink, set&lt;BR /&gt;tier1.sources.source1.type = avro&lt;BR /&gt;tier1.sources.source1.bind = 192.168.44.4&lt;BR /&gt;tier1.sources.source1.port = 8021&lt;BR /&gt;tier1.sources.source1.channels = channel1&lt;BR /&gt;tier1.channels.channel1.type = memory&lt;BR /&gt;tier1.sinks.sink1.type = hdfs&lt;BR /&gt;tier1.sinks.sink1.channel = channel1&lt;BR /&gt;tier1.sinks.sink1.hdfs.path = hdfs://192.168.44.4:8020/user/hadoop/flumelogs/&lt;BR /&gt;tier1.sinks.sink1.hdfs.fileType = DataStream&lt;BR /&gt;tier1.sinks.sink1.hdfs.writeFormat = Text&lt;BR /&gt;tier1.sinks.sink1.hdfs.batchSize = 10000&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollCount = 5000&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollSize = 500&lt;BR /&gt;tier1.sinks.sink1.hdfs.rollInterval = 30&lt;/P&gt;&lt;P&gt;# Specify the capacity of the memory channel.&lt;BR /&gt;tier1.channels.channel1.capacity = 10000&lt;BR /&gt;tier1.channels.channel1.transactionCapacity = 10000&lt;/P&gt;&lt;P&gt;Please help, I want to populate the full file from local to HDFS.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:11:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39202#M24101</guid>
      <dc:creator>Tejaponnaluru</dc:creator>
      <dc:date>2022-09-16T10:11:53Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39225#M24102</link>
      <description>You are using a 'tail -F' command on your (I assume) static CSV file, which tails the last 10 lines by default and will only emit further data if more is written to that CSV file. If the file is in fact no longer being modified and you want to ingest the whole file, then I would recommend using the spooldir source instead: &lt;A href="http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source" target="_blank"&gt;http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;-PD</description>
      <pubDate>Thu, 31 Mar 2016 19:46:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39225#M24102</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2016-03-31T19:46:43Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39247#M24103</link>
      <description>Hi, thanks a lot for your reply. All my logs are in CSV format, so if I want to transfer a full log to HDFS I should use spooldir instead of the exec source - is that what you are saying? If so, can you explain clearly in which scenarios the exec source should be used? Thanks in advance.</description>
      <pubDate>Fri, 01 Apr 2016 04:36:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39247#M24103</guid>
      <dc:creator>Tejaponnaluru</dc:creator>
      <dc:date>2016-04-01T04:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39248#M24104</link>
      <description>The exec source is generally not recommended for production environments, as it does not recover well if the spawned process is killed unexpectedly. Regarding the log files you are transferring: are you trying to stream them, or just transport them into HDFS? You may want to consider a simple hdfs put command driven by a cron job, or mounting the HDFS filesystem via NFS, especially if you want to preserve the files in HDFS as-is. Flume is designed for streaming data, not as a file transport mechanism.&lt;BR /&gt;&lt;BR /&gt;If you do want to stream them, the spooldir source is the right choice when the files are no longer being appended to. If they are being appended to while Flume is reading them, then you would want the new taildir source (as of CDH 5.5) [1], which handles streaming log files more reliably. The spooldir source requires that files are not modified once they are in the spool directory; they are removed or renamed with a .COMPLETED suffix when ingestion is finished.&lt;BR /&gt;&lt;BR /&gt;-PD&lt;BR /&gt;&lt;BR /&gt;[1] &lt;A href="http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source" target="_blank"&gt;http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source&lt;/A&gt;</description>
      <pubDate>Fri, 01 Apr 2016 04:58:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39248#M24104</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2016-04-01T04:58:56Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39250#M24105</link>
      <description>Thanks for your quick reply. To answer your question: I want to transfer my log files every morning from server X to HDFS. As you said, I could use the put command for that, but I have several different log files on server X, so I thought put would not be a good way to transfer them; that is why I chose Flume. Another reason is that I want to filter data from each log file, and with the put command we can't do that, right?</description>
      <pubDate>Fri, 01 Apr 2016 05:33:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39250#M24105</guid>
      <dc:creator>Tejaponnaluru</dc:creator>
      <dc:date>2016-04-01T05:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39262#M24106</link>
      <description>&lt;P&gt;Hi, if I switch to spooldir, is this config fine?&lt;BR /&gt;&lt;BR /&gt;# For each one of the sources, the type is defined&lt;BR /&gt;agent.sources.localsource.type = spooldir&lt;BR /&gt;agent.sources.localsource.spoolDir = /home/dwh/teja/Flumedata/&lt;BR /&gt;agent.sources.localsource.fileHeader = true&lt;/P&gt;&lt;P&gt;Or do I need to include the file name in the path as well?&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:48:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39262#M24106</guid>
      <dc:creator>Tejaponnaluru</dc:creator>
      <dc:date>2016-04-01T09:48:43Z</dc:date>
    </item>
    <item>
      <title>Re: Only 10 records populating from local to HDFS while running Flume but I have 500 records in my</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39305#M24107</link>
      <description>Hi, as you suggested I am using the spooldir source and it is working fine. But there is one problem: Flume is generating many files with only a few records each, while I want just one or two files. As I said before, I have a 500-record log file that I want to land as a single file. This is just a test case; in the real scenario I have lakhs of records in one log file. Please help.&lt;BR /&gt;My config file is the same one I shared above, with the spooldir source.</description>
      <pubDate>Mon, 04 Apr 2016 05:33:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Only-10-records-populating-from-local-to-hdfs-while-running/m-p/39305#M24107</guid>
      <dc:creator>Tejaponnaluru</dc:creator>
      <dc:date>2016-04-04T05:33:45Z</dc:date>
    </item>
  </channel>
</rss>

