<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Sqoop import to HCatalog/Hive : Compression dilemma in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105249#M38034</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/195/vranganathan.html"&gt;vranganathan&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I already did that; now only one confusion remains: &lt;A target="_blank" href="https://community.hortonworks.com/questions/52037/sqoop-import-to-hcataloghive-compression-not-worki.html#answer-52068"&gt;is the compression taking place as expected&lt;/A&gt;?&lt;/P&gt;</description>
    <pubDate>Mon, 05 Sep 2016 17:03:54 GMT</pubDate>
    <dc:creator>kaliyugantagoni</dc:creator>
    <dc:date>2016-09-05T17:03:54Z</dc:date>
    <item>
      <title>Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105245#M38030</link>
      <description>&lt;P&gt;HDP-2.4.2.0-258 installed using Ambari 2.2.2.0&lt;/P&gt;&lt;P&gt;There are plenty of schemas in SQL Server and Oracle DB that need to be imported into Hadoop; I have chosen the RDBMS-to-HCatalog/Hive approach.&lt;/P&gt;&lt;P&gt;I am quite confused because of the following threads:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;As per the &lt;A target="_blank" href="https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive"&gt;Sqoop 1.4.6 documentation&lt;/A&gt; :&lt;/LI&gt;&lt;/UL&gt;&lt;BLOCKQUOTE&gt;One downside to compressing tables imported into Hive is that many codecs cannot be split for processing by parallel map tasks. The lzop codec, however, does support splitting. When importing tables with this codec, Sqoop will automatically index the files for splitting and configuring a new Hive table with the correct InputFormat. This feature currently requires that all partitions of a table be compressed with the lzop codec.
&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;STRONG&gt;Does that mean that gzip/zlib will cause performance/data integrity issues during Sqoop import AND subsequent processing?&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;The following from the &lt;A target="_blank" href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax"&gt;Hive documentation&lt;/A&gt; confused me :&lt;/LI&gt;&lt;/UL&gt;&lt;BLOCKQUOTE&gt;The parameters are all placed in the TBLPROPERTIES (see &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable"&gt;Create Table&lt;/A&gt;). They are:&lt;/BLOCKQUOTE&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;&lt;P&gt;Key&lt;/P&gt;
&lt;/TH&gt;&lt;TH&gt;
&lt;P&gt;Default&lt;/P&gt;&lt;/TH&gt;&lt;TH&gt;
&lt;P&gt;Notes&lt;/P&gt;&lt;/TH&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;orc.bloom.filter.columns&lt;/TD&gt;&lt;TD&gt;""&lt;/TD&gt;&lt;TD&gt;comma separated list of column names for which bloom filter should be created&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;
&lt;TD&gt;orc.bloom.filter.fpp&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;false positive probability for bloom filter (must &amp;gt;0.0 and &amp;lt;1.0)&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;
&lt;TD&gt;&lt;STRONG&gt;orc.compress&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;ZLIB&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;high level compression (one of NONE, ZLIB, SNAPPY)&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;UL&gt;
&lt;LI&gt;I assume the default compression codec is gzip; I executed the following command (with both the -z and --compress forms):&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;sqoop import --null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --num-mappers 8 --validate --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table Inactivity --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" -z --connect 'jdbc:sqlserver://&amp;lt;IP&amp;gt;;database=VehicleDriverServicesFollowUp' --username  --password --table Inactivity -- --schema QlikView 2&amp;gt;&amp;amp;1| tee -a log&lt;/PRE&gt;&lt;P&gt;but the ORC table reports &lt;STRONG&gt;compression : NO&lt;/STRONG&gt; (am I missing or misinterpreting something, or is some library missing? I didn't get any exception/error):&lt;/P&gt;&lt;PRE&gt;hive&amp;gt;
    &amp;gt;
    &amp;gt; describe formatted inactivity;
OK
# col_name              data_type               comment
period                  int
vin                     string
customerid              int
subscriberdealersisid   string
subscriberdistributorsisid      string
packagename             string
timemodify              string
# Detailed Table Information
Database:               default
Owner:                  hive
CreateTime:             Tue Aug 16 17:34:36 CEST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://l4283t.sss.com:8020/apps/hive/warehouse/inactivity
Table Type:             MANAGED_TABLE
Table Parameters:
        transient_lastDdlTime   1471361676
# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.395 seconds, Fetched: 32 row(s)
hive&amp;gt;&lt;/PRE&gt;&lt;UL&gt;
&lt;LI&gt;As per &lt;A target="_blank" href="https://community.hortonworks.com/questions/4067/snappy-vs-zlib-pros-and-cons-for-each-compression.html"&gt;this existing thread&lt;/A&gt;, for Hive, &lt;STRONG&gt;ORC + Zlib&lt;/STRONG&gt; should be used.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;How do I specify Zlib in the import command? Do I have to pre-create the tables in Hive to use Zlib?&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Aug 2016 22:52:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105245#M38030</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-08-16T22:52:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105246#M38031</link>
      <description>&lt;P&gt;I would pre-create the table with ORC and ZLib.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Aug 2016 23:46:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105246#M38031</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-08-16T23:46:12Z</dc:date>
    </item>
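    <!-- The pre-create approach suggested above can be sketched as follows. This is a minimal, hypothetical example: the table and column names are taken from the "describe formatted inactivity" output later in the thread, and the exact DDL is an assumption, not the poster's actual script. -->
```shell
# Pre-create the ORC table with ZLIB compression before running the import.
# Schema is assumed from the thread's "describe formatted inactivity" output.
hive -e "
CREATE TABLE IF NOT EXISTS default.inactivity (
  period                     INT,
  vin                        STRING,
  customerid                 INT,
  subscriberdealersisid      STRING,
  subscriberdistributorsisid STRING,
  packagename                STRING,
  timemodify                 STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');"
```
    <!-- With the table pre-created, the Sqoop import would drop --create-hcatalog-table and write into the existing table, inheriting its ORC/ZLIB settings. -->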
    <item>
      <title>Re: Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105247#M38032</link>
      <description>&lt;P&gt;I agree that's one way, but it also means that if there are hundreds of tables, one has to either pre-create them manually or execute some script to do so, which means two scripts in total to import one table. Is there another way?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2016 00:50:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105247#M38032</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-08-17T00:50:08Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105248#M38033</link>
      <description>&lt;P&gt;You can add additional options to the --hcatalog-storage-stanza option. The storage stanza is just what gets appended to the generated CREATE TABLE statement, and you can add any syntactically valid options (like TBLPROPERTIES).&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 20:43:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105248#M38033</guid>
      <dc:creator>vranganathan</dc:creator>
      <dc:date>2016-09-02T20:43:40Z</dc:date>
    </item>
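    <!-- The storage-stanza trick described above can be sketched as a one-step import. This is a hedged example, not the poster's verbatim command: HOST and the password file path are placeholders, and the only substantive change from the original command is the TBLPROPERTIES clause riding along in the storage stanza. -->
```shell
# One-step import: because --hcatalog-storage-stanza is appended verbatim to
# the generated CREATE TABLE, TBLPROPERTIES can be passed through it.
sqoop import \
  --connect 'jdbc:sqlserver://HOST;database=VehicleDriverServicesFollowUp' \
  --username USER --password-file /user/me/.pw \
  --table Inactivity \
  --num-mappers 8 \
  --hcatalog-database default \
  --hcatalog-table Inactivity \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile tblproperties ('orc.compress'='ZLIB')"
```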
    <item>
      <title>Re: Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105249#M38034</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/195/vranganathan.html"&gt;vranganathan&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I already did that; now only one confusion remains: &lt;A target="_blank" href="https://community.hortonworks.com/questions/52037/sqoop-import-to-hcataloghive-compression-not-worki.html#answer-52068"&gt;is the compression taking place as expected&lt;/A&gt;?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Sep 2016 17:03:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105249#M38034</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-09-05T17:03:54Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop import to HCatalog/Hive : Compression dilemma</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105250#M38035</link>
      <description>&lt;P&gt;without compression &lt;/P&gt;&lt;PRE&gt;[numFiles=8, numRows=6547431, totalSize=66551787, rawDataSize=3154024078]&lt;/PRE&gt;&lt;P&gt;with zlib&lt;/P&gt;&lt;PRE&gt;[numFiles=8, numRows=6547431, totalSize=44046849, rawDataSize=3154024078]&lt;/PRE&gt;&lt;P&gt;As you can see, the totalSize is less with zlib.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 12:14:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-import-to-HCatalog-Hive-Compression-dilemma/m-p/105250#M38035</guid>
      <dc:creator>vranganathan</dc:creator>
      <dc:date>2016-09-06T12:14:24Z</dc:date>
    </item>
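    <!-- The size comparison above can be quantified directly from the two totalSize values quoted in the thread (66551787 bytes uncompressed vs 44046849 bytes with zlib); the awk one-liner below just does that arithmetic. -->
```shell
# Compute the space saving implied by the two totalSize figures in the thread:
# uncompressed 66551787 bytes vs zlib-compressed 44046849 bytes.
awk 'BEGIN { printf "zlib saves %.1f%% of the uncompressed size\n",
             (1 - 44046849/66551787) * 100 }'
```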
  </channel>
</rss>

