Support Questions

Find answers, ask questions, and share your expertise

How to enforce a user to query a hive table using a partition clause

avatar
New Contributor

I have a huge analyst base querying the hive tables for their reporting needs.

How to enforce a user to query a hive table using a partition clause

3 REPLIES 3

avatar

you can set hive.mapred.mode = strict; Quoting from doc: https://blog.cloudera.com/improving-query-performance-using-partitioning-in-apache-hive/

If your partitioned table is very large, you could block any full table scan queries by putting Hive into strict mode using the set hive.mapred.mode=strict command. In this mode, when users submit a query that would result in a full table scan (i.e. queries without any partitioned columns) an error is issued.

 

avatar
New Contributor

Is there a way to enable 

hive.mapred.mode=strict

 on a particular queue?

avatar

This property is hive specific so it can be either enabled on hiveserver2 server level or in the client session level but is not possible to set on queue level (coz scheduler dont understand hive specific properties). If you set this property in hiveserver2 level - then all hive client will be mandated to use where clause when user run query on partitioned table.

 

Other property which you can use is hive.metastore.limit.partition.request

 https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_hive_tuning.html

Quoting from link

Set the hive.metastore.limit.partition.request parameter to 1000 to limit the maximum number of partitions accessed from a single table in a query. See the Apache wiki for information about setting this parameter. If this parameter is set, queries that access more than 1000 partitions fail with the following error:

MetaException: Number of partitions scanned (=%d) on table '%s' exceeds limit (=%d)

Setting this parameter protects against bad workloads and identifies queries that need to be optimized. To resolve the failed queries:

  • Apply the appropriate partition filters.
  • Override the limit on a per-query basis.
  • Increase the cluster-wide limit beyond 1000, if needed, but note that this adds memory pressure to HiveServer2 and the Hive metastore.