<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Create Compressed avro Hive table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116972#M55054</link>
    <description>&lt;P&gt;Although there is no extension, looking at the metadata of the avro file I see indeed that it is compressed.&lt;/P&gt;&lt;P&gt;This brings 2 questions to mind:&lt;/P&gt;&lt;P&gt;- If I load data in many sessions, some with compression and some without, I would end up with a set of files in the hdfs directory, some compressed and some not. Is that correct?&lt;/P&gt;&lt;P&gt;- Is there a way to set the compression parameters globally in hive, so that I do not have to give them explicitly in each session?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
    <pubDate>Wed, 22 Feb 2017 14:33:06 GMT</pubDate>
    <dc:creator>guillaume_roger</dc:creator>
    <dc:date>2017-02-22T14:33:06Z</dc:date>
    <item>
      <title>Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116969#M55051</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I want to create a compressed avro-backed hive table and load data into it.&lt;/P&gt;&lt;P&gt;The flow is as follows:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE IF NOT EXISTS events (...) STORED AS AVRO LOCATION '...';
INSERT OVERWRITE TABLE events SELECT ... FROM other_table;&lt;/PRE&gt;&lt;P&gt;Then if I DESCRIBE FORMATTED the table, I see &lt;/P&gt;&lt;BLOCKQUOTE&gt;Compressed: no&lt;/BLOCKQUOTE&gt;&lt;P&gt;As far as I understand it, to have compressed data, I should just add before all statements&lt;/P&gt;&lt;PRE&gt;SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;
&lt;/PRE&gt;&lt;P&gt;But this does not change anything.&lt;/P&gt;&lt;P&gt;I tried to add as well:&lt;/P&gt;&lt;PRE&gt;SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
&lt;/PRE&gt;&lt;P&gt;and even a TBLPROPERTIES:&lt;/P&gt;&lt;PRE&gt;TBLPROPERTIES("avro.output.codec"="snappy")&lt;/PRE&gt;&lt;P&gt;To no avail.&lt;/P&gt;&lt;P&gt;Could anybody point me to what I am missing?&lt;/P&gt;&lt;P&gt;I am on hdp 2.5.3, llap not enabled, commands run via beeline in a script file.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Feb 2017 19:27:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116969#M55051</guid>
      <dc:creator>guillaume_roger</dc:creator>
      <dc:date>2017-02-21T19:27:15Z</dc:date>
    </item>
    <item>
      <title>Re: Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116970#M55052</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/13690/guillaumeroger.html"&gt;Guillaume Roger&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Your steps are correct.&lt;/P&gt;&lt;P&gt;Please be advised that the Compressed field in your DESCRIBE FORMATTED output is not a reliable indicator of whether the table contains compressed data. It typically shows No, because the compression settings only apply during the session that loads data and are not stored persistently with the table metadata. The value shown by DESCRIBE FORMATTED may refer to input or intermediate compression rather than output compression.&lt;/P&gt;&lt;P&gt;Look at the actual files as they are stored for the Hive table in question.&lt;/P&gt;&lt;P&gt;***&lt;/P&gt;&lt;P&gt;If this cleared the dilemma, please vote and accept it as the best answer.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2017 05:18:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116970#M55052</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-02-22T05:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116971#M55053</link>
      <description>&lt;P&gt;To check HDFS, run something like this:&lt;/P&gt;&lt;PRE&gt;dfs -lsr hdfs://localhost:9000/user/hive/warehouse/events;&lt;/PRE&gt;
&lt;P&gt;Replace with your host.&lt;/P&gt;&lt;P&gt;The extension will tell you whether it is compressed.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2017 05:24:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116971#M55053</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-02-22T05:24:47Z</dc:date>
    </item>
    <item>
      <title>Re: Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116972#M55054</link>
      <description>&lt;P&gt;Although there is no extension, looking at the metadata of the avro file I see indeed that it is compressed.&lt;/P&gt;&lt;P&gt;This brings 2 questions to mind:&lt;/P&gt;&lt;P&gt;- If I load data in many sessions, some with compression and some without, I would end up with a set of files in the hdfs directory, some compressed and some not. Is that correct?&lt;/P&gt;&lt;P&gt;- Is there a way to set the compression parameters globally in hive, so that I do not have to give them explicitly in each session?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2017 14:33:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116972#M55054</guid>
      <dc:creator>guillaume_roger</dc:creator>
      <dc:date>2017-02-22T14:33:06Z</dc:date>
    </item>
    <item>
      <title>Re: Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116973#M55055</link>
      <description>&lt;P&gt;I can at least confirm that setting the following in hive-site:&lt;/P&gt;&lt;PRE&gt;"hive.exec.compress.output" : "true"
"hive.exec.compress.intermediate" : "true"
"avro.output.codec": "snappy"&lt;/PRE&gt;&lt;P&gt;is enough to enable compression globally.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2017 22:22:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116973#M55055</guid>
      <dc:creator>guillaume_roger</dc:creator>
      <dc:date>2017-02-22T22:22:52Z</dc:date>
    </item>
    <item>
      <title>Re: Create Compressed avro Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116974#M55056</link>
      <description>&lt;P&gt;I applied the settings you mentioned above and created a table with "avro.compress=snappy" as TBLPROPERTIES, but the compression ratio is the same. I am not sure whether compression is applied to this table. Is there any way to validate whether it is compressed?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jun 2017 09:37:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Create-Compressed-avro-Hive-table/m-p/116974#M55056</guid>
      <dc:creator>dileepgurala</dc:creator>
      <dc:date>2017-06-07T09:37:01Z</dc:date>
    </item>
  </channel>
</rss>

