compressed ORC files not making sense

Master Collaborator

I imported the same table twice, once compressed and once uncompressed. Comparing the two, I have four questions, marked 1 to 4 in the output below.

Parameter used: --hcatalog-storage-stanza "stored as orcfile"
Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        numFiles                4
        numRows                 0               <<<< 1) why is the number of rows shown as zero?
        rawDataSize             0               <<<< 2) why is rawDataSize shown as zero?
        totalSize               205994912       <<<< 3) why is totalSize smaller than for the compressed table?
        transient_lastDdlTime   1498767240

Compressed:             No

Parameter used: --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                4
        numRows                 9999999
        orc.compress            SNAPPY
        rawDataSize             32315364706
        totalSize               318486342
        transient_lastDdlTime   1498766230
Compressed:             No    <<<< 4) why does the Compressed flag show No even though the table is SNAPPY compressed?

Re: compressed ORC files not making sense

Master Collaborator

I imported the table again in ORC Snappy-compressed form, and this time numRows and rawDataSize are also shown as zero.

Location:               hdfs://hdfs-ha/apps/hive/warehouse/pa_lane_txn_orc
Table Type:             MANAGED_TABLE
Table Parameters:
        numFiles                4
        numRows                 0
        orc.compress            SNAPPY
        rawDataSize             0
        totalSize               318486342
        transient_lastDdlTime   1498768617

Re: compressed ORC files not making sense

Explorer

1 & 2) Try running "analyze table" to generate the row count and data size statistics.

https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE
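
A minimal sketch of that step in HiveQL, assuming the table is in the current database under the name shown in the Location path of the question. The first statement gathers the basic table statistics; the column-level statement is optional.

```sql
-- Recompute basic statistics (numRows, rawDataSize, totalSize) for the table.
ANALYZE TABLE pa_lane_txn_orc COMPUTE STATISTICS;

-- Optional: also gather column-level statistics for the optimizer.
ANALYZE TABLE pa_lane_txn_orc COMPUTE STATISTICS FOR COLUMNS;

-- Check the result under "Table Parameters".
DESCRIBE FORMATTED pa_lane_txn_orc;
```

Once the statistics have been gathered, numRows and rawDataSize should be populated in the DESCRIBE FORMATTED output.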

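3) ORC files are compressed with zlib by default, so the table imported with plain "stored as orcfile" is not actually uncompressed. zlib generally achieves a higher compression ratio than Snappy, which is why its totalSize is smaller. If you don't want compression, you have to set orc.compress to "NONE".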
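
For illustration, one way to compare the codecs side by side is to create copies of the table with the codec set explicitly. This is only a sketch: the target table names are hypothetical, and it assumes the source table is readable from the current database.

```sql
-- Hypothetical copy with ORC compression disabled.
CREATE TABLE pa_lane_txn_orc_none
STORED AS ORC TBLPROPERTIES ("orc.compress"="NONE")
AS SELECT * FROM pa_lane_txn_orc;

-- Hypothetical copy with zlib stated explicitly (the ORC default).
CREATE TABLE pa_lane_txn_orc_zlib
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB")
AS SELECT * FROM pa_lane_txn_orc;
```

Comparing totalSize from DESCRIBE FORMATTED on each copy shows how much each codec saves. The same property can be passed at import time in the same form as the SNAPPY stanza in the question, with "NONE" in place of "SNAPPY".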

4) I believe this flag refers to Hive's compressed storage feature rather than to the orc.compress table property. Text files, for example, can be gzipped or bzipped and still be read by Hive.

https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
