Created on 12-02-2019 06:57 PM - edited 09-16-2022 08:53 AM
I am using HDP 3.1 and created a materialized view with the Druid storage handler, but when I try to rebuild this MV, it takes a long time to complete. All fact and dimension tables are ACID tables.
Example DDL is below:
CREATE MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
'druid.segment.granularity'='DAY',
'druid.query.granularity'='MINUTE')
AS select .....
1. How can I verify whether this MV is rebuilt incrementally?
2. How can I check the writeID on this MV?
Created 12-03-2019 03:50 AM
Performance
Since all your fact and dimension tables are ACID tables, you have to understand what compaction is and why it is needed in Hive ACID.
In ACID tables, every transaction (INSERT, UPDATE, DELETE) on a bucket creates new files stored inside delta directories, so each table partition contains one base directory and several delta directories, as shown below:
[hive@texas.com] hdfs dfs -ls -R /warehouse/tablespace/managed/hive/clients.db/t
drwxr-xr-x - hive hive 0 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022
-rw-r--r-- 1 hive hive 602 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000
-rw-r--r-- 1 hive hive 611 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000
-rw-r--r-- 1 hive hive 610 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000/bucket_00000
As you might expect, there are as many delta directories in a partition as transactions made (actually, transactions are grouped and committed in batches). This can rapidly become a problem in the case of streaming ingestion, as Hive will have to read more and more files on every request.
There are two types of compaction:
Minor compactions merge the delta files of one bucket into a single delta file;
Major compactions merge all the delta files of one bucket into the existing base file of that bucket.
Compaction might not be triggering, which could be why the rebuild is so slow.
A good practice is to run compaction manually:
-- Major compaction query example
ALTER TABLE Clients_db.tbl1 PARTITION (col1 = 'value1') COMPACT 'major';
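If you suspect compaction is not being triggered at all, you can check compaction status and the automatic-compaction settings; SHOW COMPACTIONS is standard Hive, and running SET with only a property name prints its current value:
-- Show queued, running and completed compactions and their state
SHOW COMPACTIONS;
-- Automatic compaction depends on these settings; verify them on your cluster
SET hive.compactor.initiator.on;   -- should be true on the metastore so compactions get scheduled
SET hive.compactor.worker.threads; -- must be greater than 0 for compactions to actually run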
This is how you can ensure Hive performance if you encounter the same kind of trouble with compactions.
To enable materialized view rewriting, you should set this global property to true:
SET hive.materializedview.rewriting=true;
SET hive.materializedview.rewriting.time.window=10min; -- defines the required freshness of the MV data; combine with a rebuild operation run periodically, e.g., every 10 minutes
Then you can select which view is enabled for rewriting:
ALTER MATERIALIZED VIEW mv ENABLE|DISABLE REWRITE;
Once the MV has been created, it is populated with the data present in the tables. But after data has been modified in the source tables, the change is not automatically reflected in the MV. Hence, the MV needs to be rebuilt using the command:
ALTER MATERIALIZED VIEW mv REBUILD;
The optimizer will always attempt an incremental rebuild of the MV first; an incremental rebuild can only be performed when new data has been inserted. In the case of DELETE or UPDATE operations on the source tables, a full rebuild will be applied.
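As an illustrative sketch of that distinction (the source table, columns, and values below are hypothetical; only the MV name is from the original post):
-- INSERT-only changes since the last rebuild: the next REBUILD can be incremental
INSERT INTO dwh.fct_txn VALUES (42, 10.0, '2019-12-13');
ALTER MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv REBUILD;
-- An UPDATE (or DELETE) on a source table forces the next REBUILD to be full
UPDATE dwh.fct_txn SET amount = 0.0 WHERE txn_id = 42;
ALTER MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv REBUILD;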
To get useful information, you could run the below against your MV:
DESCRIBE FORMATTED mv1;
In the Table parameters, the transient_lastDdlTime should give you an indication of the last incremental rebuild.
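To answer your first question more directly: if your Hive build supports it, you can run EXPLAIN on the rebuild statement and inspect the plan; an incremental rebuild appears as an insert/merge of only the new data into the MV, while a full rebuild recomputes the whole view:
-- Inspect the plan the optimizer chose for the rebuild (MV name from the original post)
EXPLAIN ALTER MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv REBUILD;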
HTH
Created 12-13-2019 01:39 AM
Thank you very much for the detailed solution on compacting the fact table partitions. I have tested rebuilding the MV after compaction and its behavior has changed from before, but I still have the problem of the MV rebuild taking a long time.
Created 12-13-2019 08:31 AM
If you rebuild the materialized view (MV), it takes into account all the changes made on the base tables.
Do you execute a Hive query to set the location of the Druid broker, using the address and port the broker listens on?
SET hive.druid.broker.address.default=DNS/IP:PORT;
Materialized views (MVs) are more than traditional views; they bring automatic query rewriting and incremental maintenance.
Incremental rebuilds are the least memory-intensive approach. If the materialized view uses non-transactional tables, you cannot verify whether its contents are outdated; however, you can still use automatic rewriting. You can combine a rebuild operation run periodically, e.g., every 5 or 10 minutes, and define the required freshness of the materialized view data using the hive.materializedview.rewriting.time.window configuration parameter, for instance:
SET hive.materializedview.rewriting.time.window=10min;
What you could do to test: set the above parameter, update the underlying table, wait 10 to 11 minutes, then query the MV to see if it has been updated. That's more efficient than rebuilding the MV manually.
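A minimal sketch of that test, assuming hypothetical table and column names (only the configuration parameters are from this thread):
SET hive.materializedview.rewriting=true;
SET hive.materializedview.rewriting.time.window=10min;
-- Insert new data into one of the MV's source tables (hypothetical table)
INSERT INTO dwh.fct_txn VALUES (43, 25.5, '2019-12-13');
-- Wait 10 to 11 minutes, then run a query the MV can answer; if rewriting kicks in,
-- EXPLAIN shows the MV in the plan instead of the base tables
EXPLAIN SELECT txn_date, SUM(amount) FROM dwh.fct_txn GROUP BY txn_date;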
Hope that helps, please revert.
Created 12-15-2019 09:01 PM
Thank you for your prompt response.
Do you execute a Hive query to set the location of the Druid broker? No, because it is already set via Ambari.
Should the setting SET hive.materializedview.rewriting.time.window=10min; go in Advanced hive-interactive-site or Advanced hive-site?
Created 12-17-2019 08:57 AM
Any updates? Or do you need more help?
If you found this answer addressed your initial question, please take a moment to login and click "accept" on the answer.
Happy hadooping !!
Created 12-17-2019 08:30 PM
Yes, I still need more help from you.
Should the setting SET hive.materializedview.rewriting.time.window=10min; go in Advanced hive-interactive-site or Advanced hive-site?
Created 12-19-2019 01:42 PM
In order to enable query rewriting using materialized views, this global property is needed. You should set that global parameter in Custom hive-site.
Thereafter, restart all the stale components. Can you also share the values of hive.materializedview.rewriting.incremental in Advanced hive-interactive-site and Advanced hive-site?
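For reference, running SET with just the property name in beeline prints its current value, so you can read it straight from your session:
-- Prints the current value of the property in the active session
SET hive.materializedview.rewriting.incremental;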
Note that rewriting is enabled by default when an MV is created (unless DISABLE REWRITE is specified in the CREATE statement); for an existing MV you can toggle it with:
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;
To test, update the source tables and wait for the 10-minute window to pass, then check whether the new values are reflected in the MV.
HTH
Created 01-05-2020 01:37 AM
Hello there, just trying to follow up on this thread: did the response help you resolve the issue, or do you still need further help?
It would be a good idea to update so you get the help needed.
Created 01-05-2020 07:18 PM
These configurations don't seem to help, because it took me more than 1 hour to finish the MV rebuild for one day's data refresh. Are there other configurations I need to customize?
Created 01-06-2020 01:31 AM
That's expected and by design: an MV rebuild will take time depending on the changes in the underlying tables the MV is built on, even an incremental one, as it has to check each table to determine whether there was a change or not. Something else you should look at is tuning the memory; remember that behind the scenes it's MapReduce working, and you shouldn't expect it to run in seconds given all the different map and reduce steps.
You should have a good understanding of the MV rebuild mechanism below and take that into consideration.
Hive supports incremental view maintenance, i.e., only refresh data that was affected by the changes in the original source tables. Incremental view maintenance will decrease the rebuild step execution time. In addition, it will preserve LLAP cache for existing data in the materialized view.
By default, Hive will attempt to rebuild a materialized view incrementally, falling back to full rebuild if it is not possible. The current implementation only supports incremental rebuild when there were INSERT operations over the source tables, while UPDATE and DELETE operations will force a full rebuild of the materialized view.
To execute incremental maintenance, the following conditions should be met: the materialized view should only use transactional tables (either micromanaged or ACID), and if its definition contains a GROUP BY clause, the materialized view should be stored in an ACID table, since the rebuild needs to execute MERGE operations against it.
Note also that a rebuild operation acquires an exclusive write lock over the materialized view, i.e., for a given materialized view, only one rebuild operation can be executed at a given time.
I would think you should trigger the rebuild during off-peak hours, like when you have no users logged on and running queries in your DWH.
@jitendra03 I think the initial problem of the MV rebuild failure has been resolved; now we are talking about a performance problem. Since performance is a new issue unrelated to the question asked in the original subject, I kindly ask that you start a new question for performance-specific help; I would be happy to help there. Asking multiple questions in the same thread makes it harder to follow for other users of this community.
Hope that gives you a better picture of the behind-the-scenes tasks.
Happy hadooping
Created 12-10-2019 11:48 AM
Hello,
I was working on materialized views handled by Druid, and I get the following error. I am using version 3.1.
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.) (state=08S01,code=1)
Aborting command set because "force" is false and command failed: "CREATE MATERIALIZED VIEW ${db_name}.mv_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM ${db_name}.src_1
Any updates on this type of error?
The HS2 logs say "error committing transaction to druid metadata storage".
Created 12-10-2019 02:10 PM
Can you share your hivemetastore.log? Out of curiosity, can you share the failure or success of the below CREATE MATERIALIZED VIEW statement?
CREATE MATERIALIZED VIEW test_druid_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM db_name.table
HTH
Created 12-10-2019 03:54 PM
Hi @Shelton,
Here's the error and the logs from HS2. I wanted to mention that I could create an MV when there was no data in the source table; this error occurs when I try to rebuild the MV after loading data into the source table, or when creating an MV from a source table that already has data in it.
CREATE MATERIALIZED VIEW default.druid_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM default.src_1;
Error:
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
INFO : Completed executing command(queryId=hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0); Time taken: 25.969 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.) (state=08S01,code=1)
Logs:
2019-12-10T15:40:54,601 INFO [HiveServer2-Background-Pool: Thread-14305]: reexec.ReOptimizePlugin (:()) - ReOptimization: retryPossible: false
2019-12-10T15:40:54,602 ERROR [HiveServer2-Background-Pool: Thread-14305]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
2019-12-10T15:40:54,602 INFO [HiveServer2-Background-Pool: Thread-14305]: ql.Driver (:()) - Completed executing command(queryId=hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0); Time taken: 25.969 seconds
2019-12-10T15:40:54,602 INFO [HiveServer2-Background-Pool: Thread-14305]: lockmgr.DbTxnManager (:()) - Stopped heartbeat for query: hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0
2019-12-10T15:40:54,632 ERROR [HiveServer2-Background-Pool: Thread-14305]: operation.Operation (:()) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:348) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1108) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.createView(DDLTask.java:5065) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
... 13 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Transaction failed do to exception being thrown from within the callback. See cause for the original exception.
at org.apache.hadoop.hive.druid.DruidStorageHandler.commitInsertTable(DruidStorageHandler.java:930) ~[hive-druid-handler-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.druid.DruidStorageHandler.commitCreateTable(DruidStorageHandler.java:274) ~[hive-druid-handler-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:917) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:900) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at com.sun.proxy.$Proxy59.createTable(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3001) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at com.sun.proxy.$Proxy59.createTable(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1092) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1108) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.createView(DDLTask.java:5065) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
... 13 more
Thank you
Created 12-10-2019 06:59 PM
Do you connect via the HIVESERVER2 JDBC URL or the HIVESERVER2 INTERACTIVE JDBC URL?
Created on 12-11-2019 09:52 AM - edited 12-11-2019 09:52 AM
Hi @Shelton,
I connected via the Metadata storage connector URL from Druid, which I got from Ambari.
Thank you
Created 12-11-2019 11:38 AM
Hive Interactive (LLAP) needs to be installed in order to interact with Druid, and this is memory intensive.
There are a couple of configs to check, like LLAP sizing and setup; there is a very detailed document from Cloudera by @sergey with information about the Hive LLAP installation.
In particular, I recommend you read the Druid and Hive integration documentation thoroughly; it includes a walkthrough and benchmarking, so a deep dive there would give you a very good starting point.
You will need RAM 🙂
Created 12-11-2019 11:46 AM
I did try on LLAP and I got the following error. The thing is, for anything that's backed/handled by Druid, I get the same error.
Error: java.io.IOException: org.apache.hive.druid.io.druid.java.util.common.RE: Failure getting results for query[TimeseriesQuery{dataSource='default.druid_test', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], context={queryId=hive_20191211113348_a1f4d6da-4559-46bc-8ba6-d4a6dc250cea, skipEmptyBuckets=false}}] from locations[[Ljava.lang.String;@4d855cea] because of [java.util.concurrent.ExecutionException: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Channel disconnected] (state=,code=0)
Created 12-13-2019 01:48 AM
Could you share hiveserver2Interactive.log? Is your Druid Historical server installed?
Created 12-17-2019 09:03 AM
Hi,
It's always a good idea to start a new thread and maybe tag me, because the responses in this thread are possible solutions to the question asked by @anvanna, which could be different from the issue you are facing.
Please start a new thread, tag me, and remember to attach any logs, screenshots, and version details.
Happy hadooping