Multiple Hive Metastore instances and compactor thread

New Contributor

Hi,

I want to ask a few questions about running multiple Hive Metastore instances and how to correctly configure the compaction initiator and cleaner threads on them.

We run multiple Hive Metastore instances because we have many concurrent connections and, according to the official documentation (https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/release-guide/topics/cdpdc-hive.html), a Java heap larger than 16-24 GB is not recommended because of the impact of Java garbage collection on active processing by the service. A heap of 16-24 GB is the recommended size for handling 41-80 concurrent connections (we have around 50 on each Metastore).

So my question is: what is the correct configuration for the compactor initiator and cleaner threads (hive.compactor.initiator.on) when there are multiple Hive Metastore instances? Can it be enabled on every instance, or should it be enabled on only one?

Our environment is running CDH 7.1.7-1 with Hive version 3.1.3000.7.1.7.1000-141.

Thanks in advance,

Kind regards.

1 ACCEPTED SOLUTION

Expert Contributor

Hello Gabriel,

Only one metastore should have the compactor enabled.

Personally, I would use the last of the metastores listed in hive-site.xml for Hive on Tez, since it is normally the least used.

Best.

-JMP
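
To make the recommendation above concrete, here is a minimal sketch of how the setting might differ between instances. The only property taken from this thread is hive.compactor.initiator.on; which instance gets `true` is your choice, and in a Cloudera Manager deployment the value would likely be set per Hive Metastore role in Cloudera Manager rather than by editing the file by hand (that is an assumption about your setup, not something confirmed in this thread).

```xml
<!-- hive-site.xml on the ONE metastore instance chosen to run the
     compaction initiator and cleaner threads -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>

<!-- hive-site.xml on every OTHER metastore instance -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>false</value>
</property>
```

With this split, all instances keep serving client connections, while only the chosen instance schedules and cleans up compactions.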

New Contributor

Hi Jose Manuel,

Thank you for your answer. I was confused because I had read on the Hive wiki that, as of Hive 1.3.0, it may be possible to enable it on more than one instance, but I wanted to be sure.

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hi...

"Before Hive 1.3.0 it's critical that this is enabled on exactly one metastore service instance. As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances."

Kind regards.

Expert Contributor

Gabriel,

We'd still recommend enabling it on only one instance, to avoid race conditions and similar issues.

Thank you,

-JMP