
I have recently run into multiple issues where ORC files in Hive were not getting compacted.

Several parameters need to be set to enable concatenation on ORC files.

  • SET hive.execution.engine=tez;
  • SET hive.merge.tezfiles=true;
  • SET hive.merge.mapredfiles=true;
  • SET hive.merge.size.per.task=256000000;
  • SET hive.merge.smallfiles.avgsize=256000000;
  • SET hive.merge.mapfiles=true;
  • SET hive.merge.orcfile.stripe.level=true;
  • SET mapreduce.input.fileinputformat.split.minsize=256000000;
  • SET mapreduce.input.fileinputformat.split.maxsize=256000000;
  • SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
  • SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;
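With these set, mark the table as managed, concatenate the partition, and then mark the table as external again:

  • ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
  • ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
  • ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
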
mapreduce.input.fileinputformat.split.minsize.per.node: Specifies the minimum number of bytes that each input split should contain within a data node. The default value is 0, meaning that there is no minimum size.
mapreduce.input.fileinputformat.split.minsize.per.rack: Specifies the minimum number of bytes that each input split should contain within a single rack. The default value is 0, meaning that there is no minimum size.
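
The settings and statements above can be combined into a single Hive session. The following is a minimal sketch rather than a tested recipe: the table name my_table and the partition value 2018-06-01 are hypothetical placeholders, and the 256000000-byte sizes are simply the values used in this article.

```sql
-- Sketch only: my_table and the partition value are hypothetical placeholders.

-- Run on Tez and enable merging of small output files.
SET hive.execution.engine=tez;
SET hive.merge.tezfiles=true;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.orcfile.stripe.level=true;

-- Target roughly 256 MB per merged file and per input split.
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;

-- Temporarily treat the external table as managed, concatenate, then switch back.
ALTER TABLE my_table SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE my_table PARTITION (file_date_partition='2018-06-01') CONCATENATE;
ALTER TABLE my_table SET TBLPROPERTIES('EXTERNAL'='TRUE');
```

While 'EXTERNAL' is set to 'FALSE', Hive treats the table as managed, so dropping it in that window would also delete the underlying data. It is worth running the three ALTER TABLE statements back to back.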

Make sure not to concatenate ORC files that were generated by Spark: there is a known issue, HIVE-17403, and the operation is therefore disabled in later versions.

  • An example of this is a table or partition that contains two different files (part-m-00000_1417075294718 and part-m-00018_1417075294718). Although these are completely different files, Hive thinks they were generated by separate instances of the same task (because of failure or speculative execution), and it will end up removing one of them.
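
One hedged way to detect this kind of loss is to look at the file names in the partition and compare row counts before and after concatenating. The sketch below assumes a Hive CLI or Beeline session; my_table, the partition value, and the HDFS path are hypothetical placeholders.

```sql
-- Sketch only: table name, partition value, and path are hypothetical.

-- Force COUNT(*) to scan the data instead of answering from metastore statistics.
SET hive.compute.query.using.stats=false;

-- The Location field in the output shows where the partition's files live.
DESCRIBE FORMATTED my_table PARTITION (file_date_partition='2018-06-01');

-- List the files; names such as part-m-00000_<suffix> and part-m-00018_<suffix>
-- sharing one suffix match the pattern described above.
dfs -ls /path/to/my_table/file_date_partition=2018-06-01;

-- Record the row count, run the concatenation, then run this query again.
SELECT COUNT(*) FROM my_table WHERE file_date_partition='2018-06-01';
```

A lower count after concatenation would indicate that files were dropped. This check only detects the problem, so keeping a copy of the partition directory beforehand is the safer option.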
Last update:
‎06-04-2018 04:37 PM