<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Parquet external schema in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43909#M57075</link>
    <description>&lt;P&gt;Thanks, Harsh, for confirming that there is no external schema file concept in Parquet and for sharing the link for the CREATE TABLE ... LIKE PARQUET ... syntax.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This seems to be specific to Impala, however. Is there a generic approach that works across a stack of tools including Spark, Pig, Hive and Impala (with Spark and Pig not using HCatalog)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks,&lt;/P&gt;&lt;P&gt;Martin&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 14 Aug 2016 13:07:08 GMT</pubDate>
    <dc:creator>MartinEK</dc:creator>
    <dc:date>2016-08-14T13:07:08Z</dc:date>
    <item>
      <title>Does Parquet support a notion of defining and managing schemas externally?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43906#M57073</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Does Parquet support a notion of defining and managing schemas externally, in a similar way to Avro, whose avsc schema files can be referenced in CREATE TABLE statements?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Martin&lt;/P&gt;</description>
      <pubDate>Mon, 15 Aug 2016 12:55:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43906#M57073</guid>
      <dc:creator>MartinEK</dc:creator>
      <dc:date>2016-08-15T12:55:31Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet external schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43907#M57074</link>
      <description>Impala lets you create a Parquet table from an example data file, but there's no separate schema file concept in the Parquet storage implementation today.&lt;BR /&gt;&lt;BR /&gt;The CREATE TABLE ... LIKE PARQUET feature is described further at &lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_ddl" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_ddl&lt;/A&gt;. If you then want to evolve the schema, you can read on at &lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_schema_evolution" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_schema_evolution&lt;/A&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 14 Aug 2016 10:24:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43907#M57074</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-08-14T10:24:17Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet external schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43909#M57075</link>
      <description>&lt;P&gt;Thanks, Harsh, for confirming that there is no external schema file concept in Parquet and for sharing the link for the CREATE TABLE ... LIKE PARQUET ... syntax.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This seems to be specific to Impala, however. Is there a generic approach that works across a stack of tools including Spark, Pig, Hive and Impala (with Spark and Pig not using HCatalog)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks,&lt;/P&gt;&lt;P&gt;Martin&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Aug 2016 13:07:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43909#M57075</guid>
      <dc:creator>MartinEK</dc:creator>
      <dc:date>2016-08-14T13:07:08Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet external schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43914#M57076</link>
      <description>&lt;P&gt;Parquet support across the stack is documented at &lt;A href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_parquet.html" target="_blank"&gt;http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_parquet.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Impala's support for Parquet is currently ahead of Hive's, although &lt;A href="https://issues.apache.org/jira/browse/HIVE-8950" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-8950&lt;/A&gt; will help Hive catch up in the future. In Hive you still need to specify the columns manually. Alternatively, you can create the table in Impala and then use it in Hive.&lt;BR /&gt;&lt;BR /&gt;The Parquet loader for Pig can read the schema from the file itself [1] [2], as can Spark's Parquet support [3]. None of these ecosystem tools uses an external schema file, as was possible with Avro storage.&lt;BR /&gt;&lt;BR /&gt;[1] - &lt;A href="https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/main/java/parquet/pig/ParquetLoader.java#L90-L95" target="_blank"&gt;https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/main/java/parquet/pig/ParquetLoader.java#L90-L95&lt;/A&gt;&lt;BR /&gt;[2] - &lt;A href="https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/test/java/parquet/pig/TestParquetLoader.java#L94-L97" target="_blank"&gt;https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/test/java/parquet/pig/TestParquetLoader.java#L94-L97&lt;/A&gt;&lt;BR /&gt;[3] - &lt;A href="http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files" target="_blank"&gt;http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Aug 2016 18:41:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Does-Parquet-support-a-notion-of-defining-and-managing/m-p/43914#M57076</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-08-14T18:41:19Z</dc:date>
    </item>
  </channel>
</rss>

