- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to enforce a user to query a hive table using a partition clause
- Labels:
-
Apache Hive
Created on
‎02-19-2020
08:46 PM
- last edited on
‎02-19-2020
10:38 PM
by
VidyaSargur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a huge analyst base querying the hive tables for their reporting needs.
How to enforce a user to query a hive table using a partition clause
Created ‎02-19-2020 09:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
you can set hive.mapred.mode = strict; Quoting from doc: https://blog.cloudera.com/improving-query-performance-using-partitioning-in-apache-hive/
If your partitioned table is very large, you could block any full table scan queries by putting Hive into strict mode using the set hive.mapred.mode=strict command. In this mode, when users submit a query that would result in a full table scan (i.e. queries without any partitioned columns) an error is issued.
Created ‎02-27-2020 09:41 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is there a way to enable
hive.mapred.mode=strict
on a particular queue?
Created ‎02-27-2020 09:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This property is hive specific so it can be either enabled on hiveserver2 server level or in the client session level but is not possible to set on queue level (coz scheduler dont understand hive specific properties). If you set this property in hiveserver2 level - then all hive client will be mandated to use where clause when user run query on partitioned table.
Other property which you can use is hive.metastore.limit.partition.request
https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_hive_tuning.html
Quoting from link
Set the hive.metastore.limit.partition.request parameter to 1000 to limit the maximum number of partitions accessed from a single table in a query. See the Apache wiki for information about setting this parameter. If this parameter is set, queries that access more than 1000 partitions fail with the following error:
MetaException: Number of partitions scanned (=%d) on table '%s' exceeds limit (=%d)
Setting this parameter protects against bad workloads and identifies queries that need to be optimized. To resolve the failed queries:
- Apply the appropriate partition filters.
- Override the limit on a per-query basis.
- Increase the cluster-wide limit beyond 1000, if needed, but note that this adds memory pressure to HiveServer2 and the Hive metastore.
