Created on 02-19-2020 08:46 PM - last edited on 02-19-2020 10:38 PM by VidyaSargur
I have a huge analyst base querying the hive tables for their reporting needs.
How to enforce a user to query a hive table using a partition clause
Created 02-19-2020 09:56 PM
you can set hive.mapred.mode = strict; Quoting from doc: https://blog.cloudera.com/improving-query-performance-using-partitioning-in-apache-hive/
If your partitioned table is very large, you could block any full table scan queries by putting Hive into strict mode using the set hive.mapred.mode=strict command. In this mode, when users submit a query that would result in a full table scan (i.e. queries without any partitioned columns) an error is issued.
Created 02-27-2020 09:41 PM
Is there a way to enable
hive.mapred.mode=strict
on a particular queue?
Created 02-27-2020 09:52 PM
This property is hive specific so it can be either enabled on hiveserver2 server level or in the client session level but is not possible to set on queue level (coz scheduler dont understand hive specific properties). If you set this property in hiveserver2 level - then all hive client will be mandated to use where clause when user run query on partitioned table.
Other property which you can use is hive.metastore.limit.partition.request
https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_hive_tuning.html
Quoting from link
Set the hive.metastore.limit.partition.request parameter to 1000 to limit the maximum number of partitions accessed from a single table in a query. See the Apache wiki for information about setting this parameter. If this parameter is set, queries that access more than 1000 partitions fail with the following error:
MetaException: Number of partitions scanned (=%d) on table '%s' exceeds limit (=%d)
Setting this parameter protects against bad workloads and identifies queries that need to be optimized. To resolve the failed queries: