
Can I configure Cloudera Impala to use the Hortonworks Hive metastore?


Hello All,

     I have an HDP cluster with Hive services running, and I'm planning to set up a separate CDH cluster with the Impala service. Can I configure Impala to use the Hive metastore in the HDP cluster?




Hi @pauljoshiva,

In theory it should be possible; however, CDH and HDP releases are not tested together and ship with different Hive Metastore (HMS) versions. The unified release will be CDP.

I can see two possible approaches. Please note that I have not tried either of them, and there might be skeletons in the closet:

  1. Using the CDH HMS binaries to connect to the central HMS backend database. The main problem could be the HMS schema, which can differ between releases, especially between major releases. For example, HDP 3.x ships with Hive 3 and HDP 2.6.x ships with Hive 2, while CDH 6.x is packaged with a patched Hive 2, although some Hive 3 fixes may be available in CDH 6 as well. Metastore schema compatibility between releases can be verified with the Metastore Schema tool, which could quickly rule out this option. Also, DBTokenStore should be enabled for both HMS instances.
  2. Pointing Impala at the HDP HMS. There might be API differences between the HMS binaries that could cause unexpected Impala behavior. This can be mitigated by picking versions that are as close as possible; however, because the CDH Hive release is patched with newer fixes, there could still be differences.
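
For the second approach, clients usually locate the metastore through the `hive.metastore.uris` property in `hive-site.xml`. Below is a minimal sketch that generates such a fragment. The host name `hdp-hms.example.com` is a hypothetical placeholder, port 9083 is assumed to be the HMS Thrift port in use, and on a managed CDH cluster this setting would normally be changed through Cloudera Manager rather than by editing the file by hand. I have not tested this against a mixed HDP/CDH setup.

```python
import xml.etree.ElementTree as ET


def build_hive_site(metastore_uris):
    """Return a hive-site.xml document that points clients at the given HMS endpoints."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    # Multiple metastore endpoints are given as a comma-separated list.
    ET.SubElement(prop, "value").text = ",".join(metastore_uris)
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical HDP metastore host; 9083 is assumed to be the HMS Thrift port.
HDP_HMS_URIS = ["thrift://hdp-hms.example.com:9083"]

if __name__ == "__main__":
    print(build_hive_site(HDP_HMS_URIS))
```

Generating the fragment only shows where the setting lives; whether Impala behaves correctly against that metastore still depends on the HMS API differences mentioned above.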

Additionally, I would recommend backing up any databases that could be affected and that contain important metadata.


@tmater Thanks for your reply.

So I have HDP 3.1.0 with Hive 3.0.0 installed. Which CDH version would be compatible with this?


The newest CDH release at the moment, 6.3.2, has a patched Hive 2.1.1. With a major release difference, I believe there will be both HMS schema differences and HMS API differences.
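
As a rough illustration of the major-release gap, here is a small sketch that compares the two Hive versions mentioned in this thread. The helper names are made up for the example, and a matching major version would not by itself guarantee compatibility, since the CDH Hive build is patched.

```python
def parse_version(version):
    """Split a dotted version such as '2.1.1' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def same_major(version_a, version_b):
    """Return True when both versions share the same major release number."""
    return parse_version(version_a)[0] == parse_version(version_b)[0]


HDP_HIVE = "3.0.0"  # Hive version reported for HDP 3.1.0 in this thread
CDH_HIVE = "2.1.1"  # patched Hive version in CDH 6.3.2

if __name__ == "__main__":
    if same_major(HDP_HIVE, CDH_HIVE):
        print("Same major Hive release; patches may still cause differences.")
    else:
        print("Different major Hive releases; expect schema and API differences.")
```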

Depending on the use case, during the POC period the data and metadata could be migrated to the CDH cluster so that you can work on performance there. Later, once Impala has proven itself, a workflow could be built in which each cluster handles the tasks best suited to its components.
