Cloudera Community article
A simple query such as the following can take much longer than expected to return:

select * from t1 where col1 = 'aaa' limit 2;

If you see this, check the number of files in each partition of the table. When each partition holds many small files, on the order of KBs each, most of the query time may be spent just opening the ORC files. Find out whether that many files per partition is expected, or whether they can be merged before the data is loaded.
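The following is a minimal sketch for spotting partitions with many small files. It assumes you run `hdfs dfs -count` against the table's partition directories, for example `hdfs dfs -count '<table location>/*'`, where the location comes from `DESCRIBE FORMATTED t1`. Without `-v` or `-h`, each output line has the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The helper names are hypothetical, and so is the 16 MB cutoff, which is borrowed from Hive's default `hive.merge.smallfiles.avgsize`.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cutoff: flag partitions whose average file size is below 16 MB
# (the same value as Hive's default hive.merge.smallfiles.avgsize).
SMALL_FILE_AVG_BYTES = 16_000_000


@dataclass
class PartitionStats:
    path: str
    file_count: int
    total_bytes: int

    @property
    def avg_file_bytes(self) -> float:
        return self.total_bytes / self.file_count if self.file_count else 0.0


def parse_hdfs_count(lines: Iterable[str]) -> List[PartitionStats]:
    """Parse plain `hdfs dfs -count` lines: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME."""
    stats = []
    for line in lines:
        fields = line.split(None, 3)
        # Skip blank lines and the header row printed by `hdfs dfs -count -v`.
        if len(fields) != 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        stats.append(PartitionStats(fields[3].strip(), int(fields[1]), int(fields[2])))
    return stats


def small_file_partitions(stats: Iterable[PartitionStats], min_files: int = 2,
                          max_avg_bytes: int = SMALL_FILE_AVG_BYTES) -> List[PartitionStats]:
    """Return partitions that hold several files with a small average size."""
    return [s for s in stats
            if s.file_count >= min_files and s.avg_file_bytes < max_avg_bytes]
```

To use it, pipe the command's output into a small script, for example a hypothetical `check_partitions.py`, that calls `small_file_partitions(parse_hdfs_count(sys.stdin))` and prints each flagged partition's path and file count.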

To have Hive merge small files when it writes the table, consider these parameters:

hive.merge.tezfiles=true 
hive.merge.mapredfiles=true
hive.merge.mapfiles=true 
hive.merge.orcfile.stripe.level=true 
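The simplified Python model below shows how these switches interact. It is an assumption for illustration, not Hive's actual code, and it models when Hive appends a merge step after writing output. The flag values are the ones listed above. `hive.merge.smallfiles.avgsize` and `hive.merge.size.per.task` are Hive properties not mentioned in this article; they appear at their documented defaults of 16,000,000 and 256,000,000 bytes. `hive.merge.orcfile.stripe.level` is left out of the model. It does not decide whether a merge runs, only how ORC files are merged: by copying whole stripes instead of decoding and rewriting the data.

```python
import math
from typing import Optional, Sequence

# Flag values as listed in the article; the two size thresholds are shown
# at Hive's documented defaults (bytes).
SETTINGS = {
    "hive.merge.tezfiles": True,
    "hive.merge.mapredfiles": True,
    "hive.merge.mapfiles": True,
    "hive.merge.smallfiles.avgsize": 16_000_000,
    "hive.merge.size.per.task": 256_000_000,
}

# Which switch applies to which kind of job (simplified).
FLAG_FOR_JOB = {
    "tez": "hive.merge.tezfiles",               # Tez DAG
    "mr_map_only": "hive.merge.mapfiles",       # MapReduce, map-only job
    "mr_map_reduce": "hive.merge.mapredfiles",  # MapReduce, map-reduce job
}


def files_after_merge(file_sizes: Sequence[int], job: str,
                      settings: dict = SETTINGS) -> Optional[int]:
    """Approximate output file count after a merge, or None if no merge runs.

    Simplified model: a merge is assumed to run when the job's switch is on,
    there is more than one file, and the average file size is below
    hive.merge.smallfiles.avgsize.
    """
    if not settings[FLAG_FOR_JOB[job]] or len(file_sizes) < 2:
        return None
    total = sum(file_sizes)
    if total / len(file_sizes) >= settings["hive.merge.smallfiles.avgsize"]:
        return None  # files are already large on average
    # Merged files are assumed to come out at roughly hive.merge.size.per.task each.
    return max(1, math.ceil(total / settings["hive.merge.size.per.task"]))
```

In this model, a Tez job that writes 500 files of 4 KB each ends up with a single merged file. With `hive.merge.tezfiles=false`, the same job leaves all 500 files in the partition.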

In such a situation, checking the number of files in each partition should be the first step; verifying anything else is secondary.

Version history: revision 1 of 1, last updated 06-29-2017 12:09 AM.