PutHiveStreaming metastore load balancing

We have multiple instances of the Hive metastore server, and we give the metastore URIs directly in the PutHiveStreaming processor, in the form thrift://host1:port1,thrift://host2:port2. Does this mean the second instance takes over only when the first is down (failover, or high availability), or is the load shared between the two instances all the time?


If the Hive Metastore URI property accepts multiple URIs, then listing them provides high availability. It is akin to setting the same comma-separated value in the "hive.metastore.uris" property of a hive-site.xml file.
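
For illustration, the hive-site.xml entry referred to above might look like the sketch below. The host1/port1 and host2/port2 values are the placeholders from the question, not real endpoints, and this fragment only shows the shape of the setting; it does not by itself say how a client chooses among the listed URIs.

```xml
<!-- Sketch of a hive-site.xml fragment; hosts and ports are placeholders -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Comma-separated list of metastore Thrift URIs -->
    <value>thrift://host1:port1,thrift://host2:port2</value>
  </property>
</configuration>
```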

Thanks for that reply, Matt. Here is the scenario: we have 20 NiFi flows, each using a PutHiveStreaming processor and each expected to handle around 25k flow files per minute (after a merge step ahead of PutHiveStreaming). At this maximum threshold, the PutHiveStreaming processor queues up a lot of flow files and ingestion into the target tables slows down. Is this because of peak load on the Hive metastore? If so, how can I minimize the metastore load caused by PutHiveStreaming? We currently have two metastore hosts.
