Created 03-15-2016 04:45 AM
For reasons I won't go into we have a need to provide information about the partitions in a table.
The following statement provides that info:
show partitions database.table;
However that doesn't make the returned dataset queryable.
What I want to do is put a view over the top of that information so it can be filtered/aggregated. Essentially I want to do this:
CREATE VIEW database.<table>_partitions
as
show partitions database.table;
but I can't do that of course.
Is there a way to query the hive metastore in order to get this information?
Created 03-20-2016 10:29 PM
Hi Jamiet,
You should be able to achieve this using impala-shell and then store output as a table, i.e. from command line run something like:
impala-shell --delimited -q "show partitions database.table;" --output_file partitions.out
Then upload the output file to HDFS and create a table over it.
The only downside is that it is not dynamically updated, you'd have to define some schedule for it (maybe using oozie) at a frequency which is acceptable for your requirement.
Full reference for impala-shell is here.
Hope this helps,
Martin
Created 03-21-2016 01:47 PM
Thanks Martin. While I appreciate the response its not a great solution to be honest. We don't use impala shell, we use impyla, but regardless of that outputting to a file, moving to HDFS, creating a table over it is, then scheduling the update is laborious compared to a good old fashioned RDBMS with its queryable catalog. Moreover if this occurs on a schedule then there may be no guarantee the data therein is correct when I query it.
Not complaining as such, but I do miss features such as this.
Created 03-25-2016 02:59 AM