Member since
02-07-2019
2719
Posts
238
Kudos Received
31
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 1127 | 08-21-2025 10:43 PM | |
| 1878 | 04-15-2025 10:34 PM | |
| 4850 | 10-28-2024 12:37 AM | |
| 1947 | 09-04-2024 07:38 AM | |
| 3776 | 06-10-2024 10:24 PM |
10-21-2025
01:16 AM
@Meepoljd, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
... View more
09-08-2025
08:32 AM
@Jack_sparrow, Welcome to our community! To help you get the best possible answer, I have tagged in our Spark experts @haridjh and @vafs, who may be able to assist you further. Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
... View more
09-07-2025
11:04 PM
@lbourgeois, Did the response assist in resolving your query? If it did, please mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
... View more
09-07-2025
10:49 PM
@AEAT, Did the response assist in resolving your query? If it did, please mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
... View more
09-07-2025
10:48 PM
@ishashrestha Did the response assist in resolving your query? If it did, please mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
... View more
09-02-2025
02:50 AM
In the RDBMS world, query history has existed for many years. When I worked with MS SQL Server, I could easily check who executed which query on the server. In MS SQL, this feature is called Query Store. In Apache Hive and Apache Impala, the equivalent is called Query History.
This feature is primarily used to simplify performance troubleshooting. With SQL queries, you can compare execution plans, check statistics, and analyze many other aspects. It gives you the ability to understand how your data warehouse is being utilized, using data from the data warehouse itself. You can even create your own reports and dashboards on top of this data. It does not stop here! You also gain insights into your query plans, compare past and present performance, and create trends based on historical data.
One of the main motivations for introducing this feature in Apache Hive and Apache Impala was customer escalations in the past. There were many cases where customers complained about performance degradation after a data warehouse upgrade. Sometimes these issues were real, but in many cases, it was only a perception change without actual performance regression. The truth is always in the data. From now on, when customers use Hive or Impala, they can capture query usage and gain insight into cluster and query performance. Often, schema changes or outdated statistics affect performance. With Query History enabled, you can clearly see these changes.
The query history is stored in an Iceberg table. When enabled, this feature introduces some storage requirements and has a slight impact on performance. Every executed query is logged along with its statistics and performance metrics in a large Iceberg table.
Let's See How to Use It
These configs apply to Apache Hive.
Apache Impala uses a different configuration and slightly different strategies, but in this article, we will focus on Apache Hive Query History.
This is by default query history is disabled in Hive. This can be enable with following config:
hive.query.history.enabled
To check the current setting, run this command:
SET hive.query.history.enabled;
Result:
+-----------------------------------+ | set | +-----------------------------------+ | hive.query.history.enabled=false | +-----------------------------------+
To enable it, simply set:
SET hive.query.history.enabled = true;
If you also want to store the detailed query plan, enable:
hive.query.history.explain.plan.enabled
Batch size:
The implementation is designed to be efficient. It does not insert rows one by one. Instead, it uses a buffer and then flushes data periodically (by default, every hour) into the sys.query_history table. You can control this behavior using the following settings:
hive.query.history.batch.size
This defines the maximum number of executed queries to keep in memory before persisting them to the sys table. A small value (5-10) gives near real-time behavior, but it can create many small files. In that case, administrators need to compact the sys table for optimal performance.
For demo purposes, you can set the batch size to 0, which forces synchronous writes. This is not recommended for production.
The default value is 200.
Memory limit
There is another config where you can play with persist part.
hive.query.history.max.memory.bytes
This controls the maximum buffer size in bytes before flushing to the sys table.
Setting it to 0 disables the limit, which can be useful for demos but should not be used in production.
Flush Interval
hive.query.history.flush.interval.seconds
This defines the maximum time interval to flush records from memory to the Iceberg table, even if the batch size is not reached. The default value is 3600 seconds (1 hour), which balances file size and timely availability of data.
Setting it to 0 disables the interval-based flush and forces batch-size-based flushing only.
From an implementation perspective, this service runs inside HiveServer2, because all queries go through HS2. Queries are stored only after completion, regardless of success or failure. Records first go to the buffer and then to the sys table based on the settings. By default, buffering is enabled.
If you are using the hive docker image, you will need to add the following environment configuration to enable the Query History service inside the standalone container. (quick start)
--env SERVICE_OPTS="-Dhive.query.history.enabled=true"
What Data Is Stored?
Well… A lot! The table is wide and includes both basic information and performance/runtime metrics. Examples:
Any many many more!!!
The sys.query_history table is partitioned by cluster_id!
For the full list, check this Apache Hive upstream JIRA attachment.
Hive does not provide a built-in UI or dashboard for Query History. Some RDBMS tools do, but Hive only exposes raw data. You can build any visualization or reporting layer on top of it. Currently, there are no built-in recommendations or AI-driven insights, but this would be a valuable community contribution in the future. But!
The author (abstractdog — Laszlo Bodor) of this feature created some useful SQL that could help for the daily job.
Useful SQL Examples for sys table
Basic information about a specific query
select sql, tez_dag_id, plan, exec_summary from sys.query_history where query_id = 'YOUR_QUERYID';
Top five longest-running queries
select end_user, query_id, total_time_ms from sys.query_history order by total_time_ms desc limit 5;
Number of concurrent queries
SELECT a.query_id, COUNT(b.query_id) AS concurrent_queries FROM sys.query_history a LEFT JOIN sys.query_history b ON a.query_id != b.query_id AND a.start_time <= b.end_time AND b.start_time <= a.end_time GROUP BY a.query_id ORDER BY concurrent_queries DESC LIMIT 5;
These are just starting points. With Query History, you can analyze your cluster using SQL you already know, and you can dive deep into performance trends and usage patterns. When performance problems occur, you will have the data to trace when and how they started.
Query history in Apache Hive was originally published in Engineering@Cloudera on Medium, where people are continuing the conversation by highlighting and responding to this story.
... View more
Labels:
08-26-2025
12:11 AM
@yoonli, Welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts, @MattWho and @vafs, who may be able to assist you further. Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
... View more
08-21-2025
10:45 PM
@Pratibha123, Did the response assist in resolving your query? If it did, please mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
... View more
08-21-2025
10:43 PM
@tableau, Did the response assist in resolving your query? If it did, please mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
... View more
08-21-2025
10:31 PM
@Amry, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @MattWho and @SAMSAL, who may be able to assist you further. Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
... View more