<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Create Hive table to read parquet files from parquet/avro schema in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98553#M11933</link>
    <description>&lt;P&gt;Thanks for this - it works for Parquet, but how does one do this for a table built from CSV? Let's say a CSV schema changes; I want to be able to use Avro schema evolution to create the table.&lt;/P&gt;&lt;P&gt;I tried the same CREATE statement, but using STORED AS TEXTFILE and ROW FORMAT DELIMITED etc.; I end up getting null values.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Dec 2016 12:32:24 GMT</pubDate>
    <dc:creator>jastang</dc:creator>
    <dc:date>2016-12-02T12:32:24Z</dc:date>
    <item>
      <title>Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98544#M11924</link>
      <description>&lt;P&gt;Hello experts!&lt;/P&gt;&lt;P&gt;We are looking for a way to create an external Hive table that reads data from Parquet files according to a Parquet/Avro schema.&lt;/P&gt;&lt;P&gt;
In other words, how can we generate a Hive table from a Parquet/Avro schema?&lt;/P&gt;&lt;P&gt;
thanks &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 21:02:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98544#M11924</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2015-12-10T21:02:11Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98545#M11925</link>
      <description>&lt;P&gt;
	&lt;STRONG&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;Avro &lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;CREATE EXTERNAL TABLE table_avro
	ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
	STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
	OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
	LOCATION '/user/table/table_avro'
	TBLPROPERTIES ('avro.schema.url'='hdfs://user/schemas/table_avro.avsc');
&lt;/PRE&gt;
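&lt;P&gt;For Parquet there is no schema-file table property analogous to avro.schema.url, so the columns have to be listed explicitly. As a sketch (the table and column names here are just placeholders):&lt;/P&gt;
&lt;PRE&gt;CREATE EXTERNAL TABLE table_parquet (
	id bigint,
	name string)
	STORED AS PARQUET
	LOCATION '/user/table/table_parquet';
&lt;/PRE&gt;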
&lt;P&gt;
	&lt;STRONG&gt;From Hive 0.14 and Up (Easier DDL)&lt;/STRONG&gt;
	&lt;/P&gt;&lt;PRE&gt;CREATE TABLE kst (
    string1 string,
    string2 string,
    int1 int,
    boolean1 boolean,
    long1 bigint,
    float1 float,
    double1 double,
    inner_record1 struct&lt;int_in_inner_record1:int,string_in_inner_record1:string&gt;,
    enum1 string,
    array1 array&lt;string&gt;,
    map1 map&lt;string,int&gt;,
    union1 uniontype&lt;float,boolean,string&gt;,
    fixed1 binary,
    null1 void,
    unionnullint int,
    bytes1 binary)
  PARTITIONED BY (ds string)
  STORED AS AVRO;
	&lt;/PRE&gt;&lt;P&gt;
See the &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual"&gt;Apache Hive Language Manual&lt;/A&gt; for more examples of &lt;A href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe"&gt;Avro&lt;/A&gt; and &lt;A href="https://cwiki.apache.org/confluence/display/Hive/Parquet"&gt;Parquet&lt;/A&gt; DDL.&lt;/P&gt;&lt;P&gt;However, to get the full performance benefits of Hive with cost-based optimization and vectorization, you should consider storing your Hive tables in the &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC"&gt;ORC&lt;/A&gt; format.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 23:29:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98545#M11925</guid>
      <dc:creator>amcbarnett</dc:creator>
      <dc:date>2015-12-10T23:29:42Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98546#M11926</link>
      <description>&lt;P&gt;Thanks for your answer. Actually, this is what I'm trying to do: I already have Parquet files, and I want to dynamically create an external Hive table that reads from the Parquet files (not Avro ones), according to either an Avro or a Parquet schema (the Parquet was generated from Avro).&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 01:01:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98546#M11926</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2015-12-11T01:01:29Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98547#M11927</link>
      <description>&lt;P&gt;The work to generically create a table by reading its schema from ORC, Parquet, and Avro files is tracked in &lt;A target="_blank" href="https://issues.apache.org/jira/browse/HIVE-10593"&gt;HIVE-10593&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 01:43:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98547#M11927</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2015-12-11T01:43:31Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98548#M11928</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt; can you accept the best answer to close this thread?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 09:51:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98548#M11928</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-02T09:51:29Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98549#M11929</link>
      <description>&lt;P&gt;Actually, there is no answer to my question yet; I'll publish the answer soon and accept it.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 23:04:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98549#M11929</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2016-02-02T23:04:38Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98550#M11930</link>
      <description>&lt;P&gt;The solution is to dynamically create a table from the Avro schema, and then create a new Parquet-format table from the Avro one.&lt;/P&gt;&lt;P&gt;Here is the HiveQL that does it; hope this helps:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc'); 
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
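-- Note: CREATE TABLE ... LIKE copies avro_test's schema at creation time only;
-- parquet_test will not track later changes to the Avro schema.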
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 02 Feb 2016 23:08:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98550#M11930</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2016-02-02T23:08:11Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98551#M11931</link>
      <description>&lt;P&gt;I am also facing the same problem as you. I tried your solution, but my Parquet table is not refreshed if I modify the Avro schema. The avro_test table picks up any schema changes, but parquet_test does not. I dropped and created it again, but the changes are still not reflected. Any idea?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Feb 2016 13:59:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98551#M11931</guid>
      <dc:creator>adhumal</dc:creator>
      <dc:date>2016-02-17T13:59:22Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98552#M11932</link>
      <description>&lt;P&gt;As &lt;A rel="user" href="https://community.cloudera.com/users/152/jp.html" nodeid="152"&gt;@Jean-Philippe Player&lt;/A&gt; mentions, creating a table by reading the schema from a Parquet file is not yet supported by Hive. Source: &lt;A href="http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html" target="_blank"&gt;http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html&lt;/A&gt;. You can, however, do it in Impala:&lt;/P&gt;&lt;PRE&gt;# Using Impala:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';
&lt;/PRE&gt;&lt;P&gt;With some Spark/Scala code you can generate the CREATE TABLE statement from a Parquet file:&lt;/P&gt;&lt;PRE&gt;spark.read.parquet("/user/etl/destination/datafile1.dat").registerTempTable("mytable")
val df = sqlContext.sql("describe mytable")
// "colname (space) data-type"
val columns = df.map(row =&amp;gt; row(0) + " " + row(1)).collect()

// Print the Hive create table statement:
println("CREATE EXTERNAL TABLE mytable")
println(s"  (${columns.mkString(", ")})")
println("STORED AS PARQUET ")
println("LOCATION '/user/etl/destination';")
&lt;/PRE&gt;</description>
      <pubDate>Wed, 07 Sep 2016 17:14:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98552#M11932</guid>
      <dc:creator>MrBee</dc:creator>
      <dc:date>2016-09-07T17:14:17Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98553#M11933</link>
      <description>&lt;P&gt;Thanks for this - it works for Parquet, but how does one do this for a table built from CSV? Let's say a CSV schema changes; I want to be able to use Avro schema evolution to create the table.&lt;/P&gt;&lt;P&gt;I tried the same CREATE statement, but using STORED AS TEXTFILE and ROW FORMAT DELIMITED etc.; I end up getting null values.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 12:32:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/98553#M11933</guid>
      <dc:creator>jastang</dc:creator>
      <dc:date>2016-12-02T12:32:24Z</dc:date>
    </item>
    <item>
      <title>Re: Create Hive table to read parquet files from parquet/avro schema</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/290134#M11934</link>
      <description>&lt;P&gt;With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; val sqlContext = new org.apache.spark.sql.SQLContext(sc)&lt;BR /&gt;warning: there was one deprecation warning; re-run with -deprecation for details&lt;BR /&gt;sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64&lt;BR /&gt;&lt;BR /&gt;scala&amp;gt; import sqlContext.implicits._&lt;BR /&gt;import sqlContext.implicits._&lt;BR /&gt;&lt;BR /&gt;scala&amp;gt; sqlContext.sql("describe mytable")&lt;BR /&gt;res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]&lt;BR /&gt;&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;I'm working with Spark 2.3.2.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Feb 2020 06:49:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Hive-table-to-read-parquet-files-from-parquet-avro/m-p/290134#M11934</guid>
      <dc:creator>obrobecker</dc:creator>
      <dc:date>2020-02-20T06:49:10Z</dc:date>
    </item>
  </channel>
</rss>

