<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive table format and compression in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</link>
    <description>&lt;P&gt;
	If you create a Hive table over an &lt;EM&gt;existing&lt;/EM&gt; data set in HDFS, you need to tell Hive about the format of the files as they sit on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format such as Parquet or ORC with a CREATE TABLE ... AS SELECT (CTAS) statement. &lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE sourcetable (col bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
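-- Optional sanity check: confirm Hive can read the raw text files
SELECT * FROM sourcetable LIMIT 5;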
&lt;/PRE&gt;&lt;P&gt;Once the data is mapped, you can convert it to other formats such as Parquet:&lt;/P&gt;&lt;PRE&gt;SET parquet.compression=SNAPPY; -- this is already the default
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
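-- Optional check: inspect the new table's storage format and compression codec
DESCRIBE FORMATTED testsnappy_pq;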
&lt;/PRE&gt;&lt;P&gt;For the Hive-optimized ORC format, the syntax is slightly different:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM sourcetable;&lt;/PRE&gt;</description>
    <pubDate>Fri, 22 Apr 2016 14:58:44 GMT</pubDate>
    <dc:creator>jpp</dc:creator>
    <dc:date>2016-04-22T14:58:44Z</dc:date>
    <item>
      <title>Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111356#M74204</link>
      <description>&lt;P&gt;I am getting the error below while creating a Hive Parquet table with Snappy compression in Beeline. &lt;/P&gt;&lt;P&gt;Error: Error while compiling statement: FAILED: ParseException line 19:15 cannot recognize input near 'parquet' '.' 'compress' in table properties list (state=42000,code=40000)&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE testsnappy  ( column bigint )&lt;/P&gt;&lt;P&gt; row format delimited &lt;/P&gt;&lt;P&gt; fields terminated by ',' &lt;/P&gt;&lt;P&gt; STORED as PARQUET &lt;/P&gt;&lt;P&gt; LOCATION 'path' &lt;/P&gt;&lt;P&gt; TBLPROPERTIES ("parquet.compress"="SNAPPY") " ;&lt;/P&gt;&lt;P&gt;Also, is there a way to set the compression format for already-created tables?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 13:14:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111356#M74204</guid>
      <dc:creator>Aswanth11</dc:creator>
      <dc:date>2016-04-22T13:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</link>
      <description>&lt;P&gt;
	If you create a Hive table over an &lt;EM&gt;existing&lt;/EM&gt; data set in HDFS, you need to tell Hive about the format of the files as they sit on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format such as Parquet or ORC with a CREATE TABLE ... AS SELECT (CTAS) statement. &lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE sourcetable (col bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
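-- Optional sanity check: confirm Hive can read the raw text files
SELECT * FROM sourcetable LIMIT 5;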
&lt;/PRE&gt;&lt;P&gt;Once the data is mapped, you can convert it to other formats such as Parquet:&lt;/P&gt;&lt;PRE&gt;SET parquet.compression=SNAPPY; -- this is already the default
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
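-- Optional check: inspect the new table's storage format and compression codec
DESCRIBE FORMATTED testsnappy_pq;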
&lt;/PRE&gt;&lt;P&gt;For the Hive-optimized ORC format, the syntax is slightly different:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM sourcetable;&lt;/PRE&gt;</description>
      <pubDate>Fri, 22 Apr 2016 14:58:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2016-04-22T14:58:44Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111358#M74206</link>
      <description>&lt;P&gt;Just a little comment: while in old versions of HDP Snappy provided performance benefits over ZLIB for ORC files, this is not true anymore. ZLIB now gives roughly three times better compression AND is as fast as or faster than Snappy for most tables.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 15:17:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111358#M74206</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-22T15:17:10Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111359#M74207</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Everything I need to do is on Hive external tables directly.&lt;/P&gt;&lt;P&gt;1. My DDL statement above was not working when I tried to create a Parquet external table with Snappy compression.&lt;/P&gt;&lt;P&gt;2. Is there a way to change the compression of an existing Hive external table from Snappy to ZLIB?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Apr 2016 16:26:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111359#M74207</guid>
      <dc:creator>Aswanth11</dc:creator>
      <dc:date>2016-04-25T16:26:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111360#M74208</link>
      <description>&lt;P&gt;2. Your only option is a CTAS, i.e. create a new table "as" the old one, compressed with ZLIB, then rename them. You can do that with external tables as well. However, this holds only for newer Hive versions with ORC and Tez; for Parquet, Snappy may still be the better choice. &lt;/P&gt;</description>
      <pubDate>Mon, 25 Apr 2016 16:40:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111360#M74208</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-25T16:40:39Z</dc:date>
    </item>
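    <!-- A minimal HiveQL sketch (table names are hypothetical) of the CTAS plus rename flow described in the reply above:
         CREATE TABLE mytable_zlib STORED AS ORC TBLPROPERTIES("orc.compress"="ZLIB") AS SELECT * FROM mytable;
         ALTER TABLE mytable RENAME TO mytable_old;
         ALTER TABLE mytable_zlib RENAME TO mytable;
    -->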
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111361#M74209</link>
      <description>&lt;P&gt;The CREATE EXTERNAL TABLE statement must match the format on disk. If the files are in a self-describing format like Parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert the data to a new format, including a different compression algorithm, you will need to create a new table.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 05:56:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111361#M74209</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2016-04-27T05:56:12Z</dc:date>
    </item>
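    <!-- A minimal sketch of the corrected DDL for reading self-describing Parquet files (location is hypothetical); note that both ROW FORMAT DELIMITED and TBLPROPERTIES are dropped:
         CREATE EXTERNAL TABLE testsnappy (col bigint)
         STORED AS PARQUET
         LOCATION 'hdfs:///data/testsnappy';
    -->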
  </channel>
</rss>

