Support Questions
Find answers, ask questions, and share your expertise

Hive Druid MV incremental rebuild not working

Explorer

I am using HDP 3.1 and created a materialized view with the Druid storage handler, but when I try to rebuild this MV it takes a long time to complete. All fact and dimension tables are ACID tables.

Example DDL below:

CREATE MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
'druid.segment.granularity'='DAY',
'druid.query.granularity'='MINUTE')
AS select .....

1. How can I verify whether this MV is rebuilt incrementally?

2. How can I check the writeId on this MV?

 

19 REPLIES

Mentor

@anvanna 

Performance

Since all your fact and dimension tables are ACID tables, you need to understand what compactions are and why they are needed in Hive ACID.

On ACID tables, every transaction (INSERT, UPDATE, DELETE) on a bucket creates new files stored inside delta directories, so each table partition contains one base directory and several delta directories. See below:

[hive@texas.com] hdfs dfs -ls -R /warehouse/tablespace/managed/hive/clients.db/t
drwxr-xr-x - hive hive 0 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022
-rw-r--r-- 1 hive hive 602 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000
-rw-r--r-- 1 hive hive 611 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000
-rw-r--r-- 1 hive hive 610 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000/bucket_00000

As you can expect, there are as many delta directories in a partition as transactions made (in practice, transactions are grouped and committed in batches). This can rapidly become a problem with streaming ingestion, as Hive has to read more and more files on every query. Two kinds of compactions address this:

  • Minor compactions merge the delta files of one bucket into a single delta file;
  • Major compactions merge all the delta files of one bucket into the existing base file of that bucket.

Compaction might not be triggering, which could be why the rebuild is so slow. A good practice is to run compactions regularly:

-- Major compaction query example
ALTER TABLE Clients_db.tbl1 PARTITION (col1 = 'value1') COMPACT 'major';
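
To confirm that a compaction was actually triggered and completed, you can inspect the compaction queue (SHOW COMPACTIONS is standard HiveQL); your table should appear and move through states such as initiated, working, and ready for cleaning:

SHOW COMPACTIONS;
-- look for Clients_db.tbl1 with type MAJOR and a state indicating completion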

This is how you can maintain Hive performance if you run into compaction-related slowness.

 

To enable materialized view rewriting, you should set this global property to true:

 

SET hive.materializedview.rewriting=true;

SET hive.materializedview.rewriting.time.window=10min; -- defines the required freshness of the MV data; pair it with a rebuild operation run periodically, e.g., every 10 minutes

 

Then you can select which views are enabled for rewriting:

ALTER MATERIALIZED VIEW mv ENABLE|DISABLE REWRITE;

Once the MV has been created, it is populated with the data present in the tables. But after data is modified in the source tables, the change is not automatically reflected in the MV. Hence, the MV needs to be rebuilt using the command:

ALTER MATERIALIZED VIEW mv REBUILD;

This ensures the optimizer will always attempt an incremental rebuild of the MV. Note that an incremental rebuild can only be performed when new data has been inserted; in the case of DELETE or UPDATE operations on the source tables, a full rebuild is applied.
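
One way to verify which mode the optimizer picks, and to answer your first question, is to run EXPLAIN on the rebuild statement, assuming your Hive 3 build supports explaining it: an incremental rebuild compiles to an INSERT/MERGE plan on top of the MV, while a full rebuild re-runs the complete view query.

EXPLAIN
ALTER MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv REBUILD;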


To get useful information, you can run the following against your MV:


DESCRIBE FORMATTED mv1;

In the Table parameters, the transient_lastDdlTime should give you an indication of the last incremental rebuild.

 

HTH

 

Explorer

@Shelton 

Thank you very much for the detailed explanation of compacting fact table partitions. I have tested rebuilding the MV after compactions and it changed the behavior compared to before, but I still have the problem of the MV rebuild taking a long time.

(attachments: mv_rebuild.JPG, mv_rebuild1.JPG)

Mentor

@anvanna 

If you rebuild the materialized view (MV), it takes into account all the changes made to the base tables.

 

Do you execute a Hive query to set the location of the Druid broker, using the address and port of the broker's listening port?

 

SET hive.druid.broker.address.default=DNS/IP:PORT;

Materialized views (MV)

Materialized views are more than traditional views; they bring:

  • Storage for intermediate results: as their name suggests, materialized view results are stored, allowing computation costs to be shared across queries;
  • Incremental rebuilding: updating a materialized view only computes the data that was inserted into its source tables since the last update;
  • Query rewriting: when appropriate, the cost-based optimizer uses existing materialized views to plan queries instead of the source tables, without the user being aware of it!

Incremental rebuilds are the least memory-intensive option. Note that if the materialized view uses non-transactional tables, you cannot verify whether its contents are outdated; however, you can still use automatic rewriting. You can combine a rebuild operation run periodically, e.g. every 5 or 10 minutes, with the required freshness of the materialized view data, defined using the hive.materializedview.rewriting.time.window configuration parameter, for instance:

 

SET hive.materializedview.rewriting.time.window=10min;

 

What you could do to test is set the above parameter, update the underlying table, wait 10 to 11 minutes, and then query the MV to see if it has been updated. That is more efficient than rebuilding the MV manually.
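
A minimal sketch of that test, assuming a hypothetical fact table dwh.fct_txn with columns txn_id and amount (adjust to your actual schema):

SET hive.materializedview.rewriting.time.window=10min;
-- insert new data into a base table of the MV
INSERT INTO dwh.fct_txn (txn_id, amount) VALUES (1001, 42.50);
-- wait 10 to 11 minutes, then check whether the change is reflected
SELECT COUNT(*) FROM dwh.druid_fct_txn_sun_mv;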


Hope that helps, please revert.

Explorer

@Shelton 

Thank you for your prompt response.

Do you execute a Hive query to set the location of the Druid broker? No, because it is already set via Ambari.

(attachment: hive_druid_window.JPG)

 

Should the setting SET hive.materializedview.rewriting.time.window=10min; go in Advanced hive-interactive-site or Advanced hive-site?

Mentor

@anvanna 

 

Any updates? Or do you need more help?
If you found this answer addressed your initial question, please take a moment to log in and click "accept" on the answer.

 

Happy hadooping !!

Explorer

@Shelton 

Yes, I still need more help from you.

Should the setting SET hive.materializedview.rewriting.time.window=10min; go in Advanced hive-interactive-site or Advanced hive-site?

Mentor

@anvanna 

 

In order to enable query rewriting using materialized views, this global property is needed. You should set that global parameter in Custom hive-site:

(attachment: hive.PNG)

Thereafter, restart all the stale components. Can you also share the value of hive.materializedview.rewriting.incremental in Advanced hive-interactive-site and Advanced hive-site?
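
You can also read the effective value from a beeline session; running SET with just the property name prints its current value:

SET hive.materializedview.rewriting.incremental;
-- prints e.g. hive.materializedview.rewriting.incremental=true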

Next time, also make sure rewriting is enabled on the MV (you can toggle it after creation):
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;

To test, update the source tables and wait past the 10-minute window to see if the values get reflected in the MV.
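
To check whether rewriting actually kicks in, you can EXPLAIN a query that matches the MV definition (the query below is purely illustrative) and confirm that the plan scans the materialized view rather than the base tables:

EXPLAIN
SELECT col1, COUNT(*) FROM db_name.base_table GROUP BY col1;
-- if rewriting applies, the TableScan in the plan reads the MV, not base_table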

 

HTH


Mentor

@anvanna 

Hello there, just following up on this thread: did the response help you resolve the issue, or do you still need further help?

It would be a good idea to post an update so you get the help you need.

Explorer

@Shelton 

These configurations do not seem to help; it took me more than one hour to finish the MV rebuild for a single day's data refresh. Are there other configurations that need further tuning?

 

(attachment: mv_rebuild_log.JPG)

Mentor

@anvanna  @jitendra03 

That is by design: an MV rebuild takes time depending on the changes in the underlying tables the MV is built on, even an incremental rebuild, because it has to check each table to determine whether there was a change. You should also look at tuning memory; remember that behind the scenes this is MapReduce-style processing, and you cannot expect it to run in seconds given all the different map and reduce steps.

 

You should have a good understanding of the MV rebuild mechanism below and take that into consideration.

Hive supports incremental view maintenance, i.e., only refreshing the data that was affected by changes in the original source tables. Incremental view maintenance decreases the rebuild execution time. In addition, it preserves the LLAP cache for existing data in the materialized view.

 

By default, Hive will attempt to rebuild a materialized view incrementally, falling back to full rebuild if it is not possible. The current implementation only supports incremental rebuild when there were INSERT operations over the source tables, while UPDATE and DELETE operations will force a full rebuild of the materialized view.

To execute incremental maintenance, the following conditions should be met:

  • The materialized view should only use transactional tables, either micromanaged or ACID.
  • If the materialized view definition contains a Group By clause, the materialized view should be stored in an ACID table, since it needs to support MERGE operation. For materialized view definitions consisting of Scan-Project-Filter-Join, this restriction does not exist.
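
As an illustration of these conditions, here is a sketch of a definition that stays eligible for incremental rebuilds (table and column names are hypothetical; the source must be a full ACID table since the MV uses GROUP BY):

-- hypothetical ACID source table
CREATE TABLE db_name.sales (sold_date DATE, amount DECIMAL(10,2))
STORED AS ORC TBLPROPERTIES ('transactional'='true');

-- an MV with GROUP BY must itself be stored as an ACID table (the default
-- for managed tables on HDP 3.x) so that it can support the MERGE step
CREATE MATERIALIZED VIEW db_name.sales_daily_mv AS
SELECT sold_date, SUM(amount) AS total_amount
FROM db_name.sales
GROUP BY sold_date;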

A rebuild operation acquires an exclusive write lock over the materialized view, i.e., for a given materialized view, only one rebuild operation can be executed at a given time.
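
If a rebuild appears to hang, you can check for that exclusive lock from beeline (SHOW LOCKS is standard HiveQL):

SHOW LOCKS;
-- look for an EXCLUSIVE lock entry on the materialized view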

 

I would think you should trigger the rebuild during off-peak hours, like when you have no users logged on and running queries in your DWH.

@jitendra03  I think the initial problem of the MV rebuild has been addressed, and we are now talking about a performance problem. Since performance is a new issue unrelated to the question asked in the original subject, I kindly ask that you start a new question for performance-specific help; I would be happy to help there. Asking multiple questions in the same thread makes a thread harder to follow for other users of this community.


Hope that gives you a better picture of the behind-the-scenes tasks.

 

Happy hadooping

Cloudera Employee

Hello,

 

I was working on materialized views handled by Druid, and I get the following error. I am using version 3.1.

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.) (state=08S01,code=1)
Aborting command set because "force" is false and command failed: "CREATE MATERIALIZED VIEW ${db_name}.mv_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM ${db_name}.src_1

 

Any updates on this type of error?

The HS2 logs say "error committing transaction to druid metadata storage".

Mentor

@jitendra03 

 

Can you share your hivemetastore.log? Out of curiosity, can you share the failure or success of the below CREATE MATERIALIZED VIEW statement?

 

CREATE MATERIALIZED VIEW test_druid_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM db_name.table;

 

HTH

 

Cloudera Employee

Hi @Shelton 

 

Here's the error and the logs from HS2. I wanted to mention that I could create an MV when there was no data in the source table; this error occurs when I try to rebuild the MV after loading data into the source table, or when creating an MV from a source table that already has data in it.

CREATE MATERIALIZED VIEW default.druid_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT *
FROM default.src_1;


Error:

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
INFO : Completed executing command(queryId=hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0); Time taken: 25.969 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.) (state=08S01,code=1)


Logs:

2019-12-10T15:40:54,601 INFO [HiveServer2-Background-Pool: Thread-14305]: reexec.ReOptimizePlugin (:()) - ReOptimization: retryPossible: false
2019-12-10T15:40:54,602 ERROR [HiveServer2-Background-Pool: Thread-14305]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
2019-12-10T15:40:54,602 INFO [HiveServer2-Background-Pool: Thread-14305]: ql.Driver (:()) - Completed executing command(queryId=hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0); Time taken: 25.969 seconds
2019-12-10T15:40:54,602 INFO [HiveServer2-Background-Pool: Thread-14305]: lockmgr.DbTxnManager (:()) - Stopped heartbeat for query: hive_20191210154028_98660f0d-1022-4a0e-bc0c-bab09617d3d0
2019-12-10T15:40:54,632 ERROR [HiveServer2-Background-Pool: Thread-14305]: operation.Operation (:()) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:348) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1108) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.createView(DDLTask.java:5065) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
... 13 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Transaction failed do to exception being thrown from within the callback. See cause for the original exception.
at org.apache.hadoop.hive.druid.DruidStorageHandler.commitInsertTable(DruidStorageHandler.java:930) ~[hive-druid-handler-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.druid.DruidStorageHandler.commitCreateTable(DruidStorageHandler.java:274) ~[hive-druid-handler-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:917) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:900) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at com.sun.proxy.$Proxy59.createTable(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3001) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at com.sun.proxy.$Proxy59.createTable(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1092) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1108) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.createView(DDLTask.java:5065) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
... 13 more

Thank you

Explorer

Do you connect via the HIVESERVER2 JDBC URL or the HIVESERVER2 INTERACTIVE JDBC URL?

Cloudera Employee

Hi @Shelton

I connected via the Metadata storage connector URL from Druid, which I got from Ambari.

 

Thank you

Mentor

@jitendra03 

Hive Interactive (LLAP) needs to be installed in order to interact with Druid, and this is memory intensive.

There are a couple of configs to check, like LLAP sizing and setup; there is a very detailed document from Cloudera by @sergey with information about the Hive LLAP installation.

In particular, I recommend you read the Druid and Hive integration documentation thoroughly; it walks through the setup and benchmarking, so a deep dive would give you a very good starting point.
You will need RAM 🙂


Cloudera Employee

@Shelton 

 

I did try on LLAP and I got the following error. The thing is, for anything that's backed/handled by Druid, I get the same error.

 

Error: java.io.IOException: org.apache.hive.druid.io.druid.java.util.common.RE: Failure getting results for query[TimeseriesQuery{dataSource='default.druid_test', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], context={queryId=hive_20191211113348_a1f4d6da-4559-46bc-8ba6-d4a6dc250cea, skipEmptyBuckets=false}}] from locations[[Ljava.lang.String;@4d855cea] because of [java.util.concurrent.ExecutionException: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Channel disconnected] (state=,code=0)

Explorer

@jitendra03 

Could you share hiveserver2Interactive.log? Is your Druid Historical server installed?

Mentor

@jitendra03 

Hi,

it's always a good idea to start a new thread and maybe tag me, because the responses in this thread are possible solutions to the question asked by @anvanna, which could be different from the issue you are facing.

Please start a new thread, tag me, and remember to attach any logs, screenshots, and version details.

Happy hadooping

 
