compressed ORC files not making sense

Master Collaborator

I imported the same table twice, once compressed and once uncompressed. Comparing the two, I have four questions, marked 1, 2, 3, and 4 in the output below.

parameter used : --hcatalog-storage-stanza "stored as orcfile"
Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        numFiles                4
        numRows                 0               <<<< 1) why is the number of rows shown as zero?
        rawDataSize             0               <<<< 2) why is rawDataSize shown as zero?
        totalSize               205994912       <<<< 3) why is totalSize smaller than for the compressed table?
        transient_lastDdlTime   1498767240

Compressed:             No         
parameter used: --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                4
        numRows                 9999999
        orc.compress            SNAPPY
        rawDataSize             32315364706
        totalSize               318486342
        transient_lastDdlTime   1498766230
Compressed:             No    <<<< 4) why does the compressed flag show No even though the table is SNAPPY compressed?


Re: compressed ORC files not making sense

Master Collaborator

I imported the table again in ORC Snappy-compressed form, and this time numRows and rawDataSize are also shown as zero:

Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        numFiles                4
        numRows                 0
        orc.compress            SNAPPY
        rawDataSize             0
        totalSize               318486342
        transient_lastDdlTime   1498768617

Re: compressed ORC files not making sense

Explorer

1 & 2) Try running "ANALYZE TABLE pa_lane_txn_orc COMPUTE STATISTICS" to generate the row count and data size statistics.

https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE

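3) ORC files are compressed with ZLIB by default, so the table you imported with plain "stored as orcfile" is not actually uncompressed. ZLIB generally achieves a higher compression ratio than Snappy, which is why its totalSize is smaller. If you don't want compression, you have to set orc.compress to "NONE".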

4) I believe this flag refers to Hive's compressed storage feature, not to ORC's internal compression. For example, text files can be gzip- or bzip2-compressed and still be read by Hive.

https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
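
To put numbers on point 3, here is a minimal sketch that compares the sizes from the table parameter listings posted above. The figures are copied from this thread; the constant and function names are only illustrative, and reading rawDataSize as Hive's estimate of the uncompressed data size is an assumption on my part.

```python
# Sizes in bytes, copied from the table parameters posted in this thread.
TOTAL_SIZE_DEFAULT = 205_994_912   # import with plain "stored as orcfile"
TOTAL_SIZE_SNAPPY = 318_486_342    # import with orc.compress = SNAPPY
RAW_DATA_SIZE = 32_315_364_706     # rawDataSize reported for the SNAPPY import


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting a zero denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# How much larger the SNAPPY files are than the default-codec files.
snappy_vs_default = ratio(TOTAL_SIZE_SNAPPY, TOTAL_SIZE_DEFAULT)

# How much larger the reported raw size is than the SNAPPY files on disk.
raw_vs_snappy = ratio(RAW_DATA_SIZE, TOTAL_SIZE_SNAPPY)

print(f"SNAPPY totalSize is {snappy_vs_default:.2f}x the default totalSize")
print(f"rawDataSize is {raw_vs_snappy:.1f}x the SNAPPY totalSize")
```

The SNAPPY files come out roughly one and a half times larger than the files from the plain "stored as orcfile" import, which fits the default import having been compressed with a stronger codec rather than not being compressed at all.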
