select * from t1 where col1 = 'aaa' limit 2;

If you see this scenario, first check the number of files in each partition of the table. If each partition holds many small files (on the order of KBs), most of the query time can be spent just opening the ORC files. Check whether that many files per partition is expected, or whether they can be merged before the data is loaded.
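One way to check the file count from within Hive is sketched below. It assumes a hypothetical partition column `dt` on `t1` and a made-up partition value, since the table's partition layout is not given here. Substitute the real partition spec.

```sql
-- Hypothetical partition column `dt`; replace with the table's actual partition spec.
-- The output includes totalNumberFiles, totalFileSize, maxFileSize and minFileSize.
SHOW TABLE EXTENDED LIKE 't1' PARTITION (dt='2020-01-01');

-- Alternatively, the partition parameters list numFiles and totalSize.
-- These values depend on the partition statistics being up to date.
DESCRIBE FORMATTED t1 PARTITION (dt='2020-01-01');
```

A high file count combined with a small total size for the partition points to the small-file problem described above.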

These are the parameters to consider:

hive.merge.tezfiles=true 
hive.merge.mapredfiles=true
hive.merge.mapfiles=true 
hive.merge.orcfile.stripe.level=true 
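A minimal sketch of applying these settings in a Hive session follows. The `SET` statements use the parameters listed above. The `ALTER TABLE ... CONCATENATE` step is an additional option for ORC data that is already loaded, and the partition column `dt` and its value are hypothetical.

```sql
-- Enable merging of small output files for the current session.
-- These settings take effect for jobs that write data after they are set.
SET hive.merge.tezfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.mapfiles=true;
SET hive.merge.orcfile.stripe.level=true;

-- For an ORC partition that already contains many small files,
-- the files can be merged in place. `dt` is a hypothetical partition column.
ALTER TABLE t1 PARTITION (dt='2020-01-01') CONCATENATE;
```

After merging, it is worth re-checking the partition's file count before re-running the query.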

In such a situation, checking the number of files in the partition should be the first step. Verify anything else only after that.
