Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

hive druid mv incremental rebuild not work

hive druid mv incremental rebuild not work

Explorer

i have using HDP 3.1 and create materialize view with druid handler, but when i try to rebuild this mv, it takes long time to complete. all fact and dimension tables are ACID tables.

example DDL as below:

 

 

CREATE MATERIALIZED VIEW dwh.druid_fct_txn_sun_mv
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
'druid.segment.granularity'='DAY',
'druid.query.granularity'='MINUTE')
AS select .....

 

 

1. how to verify if this MV is incremental rebuild?

2. how to check writeID on this MV?

 

19 REPLIES 19
Highlighted

Re: hive druid mv incremental rebuild not work

Mentor

@anvanna 

Performance

As all fact and dimension tables are ACID tables. You have to understand what compaction is and why they are needed in Hive ACID.

ACID Tables, allow every transaction (INSERT, UPDATE, DELETE) on a bucket create a new files stored inside delta directories so each table partition contains one base directory and several delta directories see below:

[hive@texas.com] hdfs dfs -ls -R /warehouse/tablespace/managed/hive/clients.db/t
drwxr-xr-x - hive hive 0 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022
-rw-r--r-- 1 hive hive 602 2019-11-28 17:03 /warehouse/tablespace/managed/hive/clients.db/t/base_0000022/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000
-rw-r--r-- 1 hive hive 611 2019-11-28 17:06 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000023_0000023_0000/bucket_00000
drwxr-xr-x - hive hive 0 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000
-rw-r--r-- 1 hive hive 610 2019-11-28 17:07 /warehouse/tablespace/managed/hive/clients.db/t/delta_0000024_0000024_0000/bucket_00000

As you can expect, there are as many delta directories in a partition as transactions made (actually transactions are grouped and committed in batches). This can rapidly become a problem in case of streaming ingestion as Hive will have to read more and more files on every request.

Minor compactions to merge the delta files of one bucket into one delta file;
Major compactions to merge all the delta files of one bucket to the existing base file of this bucket

Compaction might not be triggering that's why the rebuild is so slow?

A good practice is to run compaction

-- Major compaction query example
ALTER TABLE Clients_db.tbl1 PARTITION (col1 = 'value1') COMPACT 'major';

This is how you can ensure the performance of Hive if you encounter the same troubles with compactions

 

To enable Materialized view rewriting you should set this global property to TRUE

 

SET hive.materializedview.rewriting=true;

SET hive.materializedview.rewriting.time.window=10min;  [ Rebuild operation run periodically, e.g., every 10 minutes] 

 

Then you can select which view is enabled for rewriting

ALTER MATERIALIZED VIEW mv ENABLE|DISABLE REWRITE.

Once the MV has been created, it is populated with the data present in the tables. But after data has been modified in the source table, it is not automatically reflected in the MV. Hence, The MV needs to be rebuilt using the command:

ALTER MATERIALIZED VIEW mv REBUILD;

This ensures the optimizer will always attempt an incremental rebuild the MV an incremental rebuild can only be made when new data has been inserted. But In the case of DELETE or UPDATE in the table, a full rebuild will be applied.


To get useful information you could run the below against your MV 


DESCRIBE
FORMATTED mv1;

In the Table parameters, the transient_lastDdlTime should give you an indication of the last incremental rebuild.

 

HTH

 

Highlighted

Re: hive druid mv incremental rebuild not work

Explorer

@Shelton 

Thank you very much for a detail solution on compaction Fact Table partition. i have tested and the rebuild mv after compactions and it has change the ways it works before, but i still encountered problem with taking long time on rebuild mv.

 

 

mv_rebuild.JPGmv_rebuild1.JPG

Highlighted

Re: hive druid mv incremental rebuild not work

Mentor

@anvanna 

If you are rebuilding the materialized View [MV]  it takes into account all the changes made on the base tables.

 

Do you execute a Hive query to set the location of the Druid broker using an address and port of the broker text listening port?

 

SET hive.druid.broker.address.default=DNS/IP:PORT;

Materialized views (MV)

Materialized views (MV) are more than traditional views, they bring:

  • Storage for intermediate results: as their name suggests, materialized views results are stored, allowing mutualizing computing costs;
  • Incremental rebuilding: updating a materialized view only computes the data that was inserted in its sources tables since the last update;
  • Query rewriting: when appropriate, the cost-based optimizer uses existing materialized views to plan queries instead of the sources tables, without the user being aware of it!

Incremental rebuilds are the least Memory intensive but a good approach is to rebuild if the materialized view uses non-transactional tables you cannot verify whether its contents are outdated, however you can use the automatic rewriting. You can combine a rebuild operation run periodically, e.g. every 5 or 10 minutes, and define the required freshness of the materialized view data using the hive.materializedview.rewriting.time.window configuration parameter, for instance:

 

SET hive.materializedview.rewriting.time.window=10min;

 

What you could do to test is set the above parameter and then update the underlying table then wait for 10 to 11 minutes then query the MV to see if it has been updated, that's more efficient than rebuilding the  MV


Hope that helps, please revert.

Highlighted

Re: hive druid mv incremental rebuild not work

Explorer

@Shelton 

Thank you for your prompted respond

Do you execute a Hive query to set the location of the Druid broker using an address and port of the broker text listening port? No because it is already set via ambari.

hive_druid_window.JPG

 

Do this setting SET hive.materializedview.rewriting.time.window=10min; on Advanced hive-interactive-site or Advance hive-site ?

Highlighted

Re: hive druid mv incremental rebuild not work

Mentor

@anvanna 

 

Any updates? or do you need more help? 
If you found this answer addressed your initial question, please take a moment to login and click "accept" on the answer. 

 

Happy hadooping !!

Highlighted

Re: hive druid mv incremental rebuild not work

Explorer

@Shelton 

yes still need more help from you

Do this setting SET hive.materializedview.rewriting.time.window=10min; on Advanced hive-interactive-site or Advance hive-site ?

Highlighted

Re: hive druid mv incremental rebuild not work

Mentor

@anvanna 

 

In order to enable query rewriting using Materialized views, this global property is needed. You should set that global parameter in Custom hive-site

hive.PNG

 Thereafter restart all the stale components Can you also share the values of 

hive.materializedview.rewriting.incremental  in Advanced hive-interactive-site and Advanced hive-site

Next time at MV creation  enable the rewrite at creation time
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;

To test update the source tables and wait after the 10 minutes to see if the values get reflected  in the MV

 

HTH

 

 

 

 

 

 

 

 

Highlighted

Re: hive druid mv incremental rebuild not work

Mentor

@anvanna 

Hello there just trying to follow up on this thread, did the response help you resolve the issue or do you still need further help.

 

It would be a good idea to update so you get the help needed.

Highlighted

Re: hive druid mv incremental rebuild not work

Explorer

@Shelton 

it seem not help by these configuration because it took me more than 1H  to finish mv rebuild for one day data refresh. other configuration need to customize more?

 

mv_rebuild_log.JPG

Don't have an account?
Coming from Hortonworks? Activate your account here