<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Apache Flume and parquet in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8656#M1503</link>
    <description>&lt;P&gt;You're welcome!&lt;/P&gt;</description>
    <pubDate>Thu, 10 Apr 2014 18:22:10 GMT</pubDate>
    <dc:creator>mpercy</dc:creator>
    <dc:date>2014-04-10T18:22:10Z</dc:date>
    <item>
      <title>Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8508#M1500</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is it possible to configure Apache Flume to save my logs in HDFS as Parquet?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks very much!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Miguel Angel.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:56:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8508#M1500</guid>
      <dc:creator>masfworld</dc:creator>
      <dc:date>2022-09-16T08:56:46Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8538#M1501</link>
      <description>&lt;P&gt;There's been some debate about this. Personally, I believe that Flume is an inherently stream-based, row-oriented system, and Parquet is an inherently batch-optimized, column-oriented format. So I'm not sure whether it's a great fit in terms of direct output. &lt;SPAN style="line-height: 1.2;"&gt;On the other hand, some folks argue that it can make sense in some cases, and that is true.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While I don't know of a way to get Parquet &lt;EM&gt;directly&lt;/EM&gt; out of Flume today, I explored one way to get data from Flume into Impala, and ultimately store it as Parquet for fast, columnar querying, in this presentation I gave at Hadoop Summit 2013:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/mpercy/flume-rtq-hadoop-summit-2013/blob/master/flume-low-latency-analytics-hadoop-summit-2013.pdf?raw=true"&gt;https://github.com/mpercy/flume-rtq-hadoop-summit-2013/blob/master/flume-low-latency-analytics-hadoop-summit-2013.pdf?raw=true&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The basic idea is that you store the data in Avro format from Flume, then use Impala to convert the data to Parquet on a schedule. This has some pretty nice properties, like low-latency access to the data. Now that views are available in recent versions of Impala, that approach should be even easier to use.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Mike&lt;/P&gt;</description>
      <pubDate>Wed, 09 Apr 2014 06:50:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8538#M1501</guid>
      <dc:creator>mpercy</dc:creator>
      <dc:date>2014-04-09T06:50:37Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8548#M1502</link>
      <description>&lt;P&gt;Yes. Using Impala or Hive to convert the stream from Flume to Parquet is a good option, although it would be nice to have native support.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Miguel Angel.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Apr 2014 15:15:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8548#M1502</guid>
      <dc:creator>masfworld</dc:creator>
      <dc:date>2014-04-09T15:15:59Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8656#M1503</link>
      <description>&lt;P&gt;You're welcome!&lt;/P&gt;</description>
      <pubDate>Thu, 10 Apr 2014 18:22:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/8656#M1503</guid>
      <dc:creator>mpercy</dc:creator>
      <dc:date>2014-04-10T18:22:10Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/12952#M1504</link>
      <description>&lt;P&gt;Hi Mike,&lt;/P&gt;&lt;P&gt;How do you convert the Avro data to Parquet, and what do you use to schedule this process?&lt;/P&gt;&lt;P&gt;Is the code hosted somewhere? Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 29 May 2014 09:53:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/12952#M1504</guid>
      <dc:creator>mohit.mehrotra</dc:creator>
      <dc:date>2014-05-29T09:53:50Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Flume and parquet</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/12954#M1505</link>
      <description>Impala can do the conversion via SQL statements. I'd recommend asking the Impala guys for advice there, as my information is a bit dated on this front now that views and improved metastore features have been added.&lt;BR /&gt;&lt;BR /&gt;Mike</description>
      <pubDate>Thu, 29 May 2014 09:58:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Flume-and-parquet/m-p/12954#M1505</guid>
      <dc:creator>mpercy</dc:creator>
      <dc:date>2014-05-29T09:58:30Z</dc:date>
    </item>
  </channel>
</rss>

