
Impala and Hive




Does Impala definitely recognize Hive partitions?
When I run a query with a WHERE clause against a partitioned table, does Impala read only the matching partition folders, or does it go through all of the table's folders?


How can I verify whether it scanned only the matching partition folders or went through all of them?


Re: Impala and Hive


Impala definitely does partition pruning, as long as your WHERE clause filters on at least one partition column.


You can confirm this by finding your query in CM (Cloudera Manager), viewing or downloading its profile, and checking the SCAN HDFS entries, which will say something like "partitions=140/8519 files=140 size=1.99GB" (i.e. only 140 out of 8519 partitions were read).
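
If you have downloaded the profile as text, a small script can pull out those numbers for you. This is only a rough sketch: it assumes the profile contains the "partitions=N/M" summary in the form quoted above, and the function names and the table name db.tbl in the sample are made up for illustration.

```python
import re

# Matches the partition summary shown on a SCAN HDFS entry,
# e.g. "partitions=140/8519 files=140 size=1.99GB".
PARTITIONS_RE = re.compile(r"partitions=(\d+)/(\d+)")


def partition_pruning(profile_text):
    """Return a list of (scanned, total) partition counts, one per match."""
    return [(int(s), int(t)) for s, t in PARTITIONS_RE.findall(profile_text)]


def is_pruned(scanned, total):
    """True when fewer partitions were scanned than exist in the table."""
    return scanned < total


# Hypothetical profile excerpt; db.tbl is a placeholder table name.
sample = "00:SCAN HDFS [db.tbl]\n   partitions=140/8519 files=140 size=1.99GB"
for scanned, total in partition_pruning(sample):
    print(f"{scanned}/{total} partitions scanned, pruned={is_pruned(scanned, total)}")
```

If the two numbers are equal for a query that filters on a partition column, pruning most likely did not happen and the WHERE clause is worth a second look.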


However, Impala is not aware of those partitions initially, because your table was created and populated by other means (Hive), so you must run "ALTER TABLE name RECOVER PARTITIONS". Your Impala table definition of course has to specify the same partition columns as in Hive so that the data directories match.


