Reply
Highlighted
Explorer
Posts: 18
Registered: ‎02-22-2016

Query the hive metastore from impala

For reasons I won't go into we have a need to provide information about the partitions in a table.

The following statement provides that info:

show partitions database.table;

However that doesn't make the returned dataset queryable.

What I want to do is put a view over the top of that information so it can be filtered/aggregated. Essentially I want to do this:

CREATE VIEW database.<table>_partitions 
as
show partitions database.table;

but I can't do that of course.

Is there a way to query the hive metastore in order to get this information?

Contributor
Posts: 58
Registered: ‎10-19-2014

Re: Query the hive metastore from impala

Hi Jamiet,

You should be able to achieve this using impala-shell and then store output as a table, i.e. from command line run something like:

 

impala-shell --delimited -q "show partitions database.table;" --output_file partitions.out

Then upload the output file to HDFS and create a table over it.

 

The only downside is that it is not dynamically updated, you'd have to define some schedule for it (maybe using oozie) at a frequency which is acceptable for your requirement.

 

Full reference for impala-shell is here.

 

Hope this helps,

Martin

 

Explorer
Posts: 18
Registered: ‎02-22-2016

Re: Query the hive metastore from impala

Thanks Martin. While I appreciate the response its not a great solution to be honest. We don't use impala shell, we use impyla, but regardless of that outputting to a file, moving to HDFS, creating a table over it is, then scheduling the update is laborious compared to a good old fashioned RDBMS with its queryable catalog. Moreover if this occurs on a schedule then there may be no guarantee the data therein is correct when I query it.

 

Not complaining as such, but I do miss features such as this.

 

Contributor
Posts: 58
Registered: ‎10-19-2014

Re: Query the hive metastore from impala

Surely if you are accessing impala from python, you should be able to parse the output of "show partitions" etc. programmatically in order to achieve what you want to do?
Announcements