12-18-2017 06:04 AM - last edited on 12-18-2017 06:06 AM by cjervis
Does Impala actually recognize Hive partitions?
When I run a query with a "where" clause, does Impala read only the matching partition folders (assuming the table is partitioned, of course), or does it read every folder under the table?
How can I check whether it scanned only the relevant partition folders or all of them?
12-19-2017 09:56 AM
Impala definitely does partition pruning, as long as your where clause filters on at least one partition column.
You can confirm this by finding your query in Cloudera Manager (CM), viewing or downloading its profile, and checking the SCAN HDFS sections. Each one will say something like "partitions=140/8519 files=140 size=1.99GB", meaning only 140 of the 8519 partitions were read.
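If you want to check the SCAN HDFS summary line in a script rather than by eye, a minimal sketch follows. It assumes you have already copied the line out of a downloaded query profile and that it has exactly the "partitions=N/M files=F size=S" shape quoted above; the names `parse_scan_line`, `was_pruned` and `ScanStats` are hypothetical helpers, not part of Impala or CM.

```python
import re
from typing import NamedTuple


class ScanStats(NamedTuple):
    """Counts reported by one SCAN HDFS entry in a query profile."""
    partitions_read: int
    partitions_total: int
    files: int
    size: str


# Assumed format, taken from the example in the answer:
#   "partitions=140/8519 files=140 size=1.99GB"
# search() is used so any text before the summary (e.g. "00:SCAN HDFS") is skipped.
_SCAN_RE = re.compile(r"partitions=(\d+)/(\d+)\s+files=(\d+)\s+size=(\S+)")


def parse_scan_line(line: str) -> ScanStats:
    """Extract partition and file counts from one SCAN HDFS summary line."""
    m = _SCAN_RE.search(line)
    if m is None:
        raise ValueError(f"no SCAN HDFS summary found in: {line!r}")
    read, total, files, size = m.groups()
    return ScanStats(int(read), int(total), int(files), size)


def was_pruned(stats: ScanStats) -> bool:
    """True if Impala read fewer partitions than the table contains."""
    return stats.partitions_read < stats.partitions_total


stats = parse_scan_line("partitions=140/8519 files=140 size=1.99GB")
print(stats.partitions_read, stats.partitions_total, was_pruned(stats))
# prints: 140 8519 True
```

If `partitions_read` equals `partitions_total`, the where clause did not filter on a partition column and every partition folder was scanned.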
However, because your table was created and populated by other means (Hive), Impala is not initially aware of those partitions, so you must run "ALTER TABLE name RECOVER PARTITIONS". Of course, your Impala table definition also has to declare the same partition columns as in Hive so that the data directories match.
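"The data directories match" refers to Hive-style partition folders, where each level of the path is named `column=value` in the order the partition columns are declared. A minimal sketch of that naming rule, assuming a hypothetical table partitioned by `year` and then `month`; `partition_subdir` and `matches_partition_columns` are illustrative helpers only, and they ignore Hive's lowercasing and escaping of special characters.

```python
from typing import Sequence, Tuple


def partition_subdir(spec: Sequence[Tuple[str, object]]) -> str:
    """Build the Hive-style relative directory for one partition,
    e.g. [("year", 2017), ("month", 12)] -> "year=2017/month=12"."""
    return "/".join(f"{col}={val}" for col, val in spec)


def matches_partition_columns(subdir: str, columns: Sequence[str]) -> bool:
    """Check that a relative partition directory uses exactly the given
    partition columns, in the same order, as key=value path segments."""
    segments = subdir.strip("/").split("/")
    if len(segments) != len(columns):
        return False
    for segment, expected in zip(segments, columns):
        key, sep, _value = segment.partition("=")
        if sep != "=" or key != expected:
            return False
    return True


print(partition_subdir([("year", 2017), ("month", 12)]))
# prints: year=2017/month=12
```

A table declared with `PARTITIONED BY` columns in a different order, or folders without the `column=` prefix, would fail this check, which is the kind of mismatch that keeps the partitions from being picked up.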