
Cluster Mysql metadata Database size is increasing, what is the best solution?

Explorer

Hi,

We are using MySQL as our cluster database; it hosts ambaridb, rangerdb, hivedb, etc. Its size is increasing day by day, and we don't have enough space left on the disk:

-rw-rw----. 1 mysql mysql 37G May 1 18:55 ibdata1

Is there an easy way to reduce the size of this MySQL DB? Or is there a way to move it to another disk, or to increase the size of the mount point?

Please suggest.


7 REPLIES

Master Mentor

@Madhura Mhatre

Can you please run the following query inside your MySQL DB so that we can find out which database (ambaridb / rangerdb / hivedb) is actually using the most disk space? Please share the output.

mysql> SELECT table_schema "DB Name", ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" FROM information_schema.tables GROUP BY table_schema; 


Then we can check whether some unwanted data can be cleared. For example, in the case of ambaridb it may be old alerts and operational history that are consuming most of the DB space. For that case Ambari provides a utility to purge the historical data:

# ambari-server db-purge-history --cluster-name YOUR_CLUSTER_NAME   --from-date 2018-08-01


https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/administering-ambari/content/amb_purge_amba...


Also, to reclaim disk space used by the MySQL DB you may sometimes need to run "OPTIMIZE TABLE <tablename>" on the affected tables:

https://www.percona.com/blog/2013/09/25/how-to-reclaim-space-in-innodb-when-innodb_file_per_table-is...
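For example (hivedb.TBLS is just an illustrative table name; run it on whichever tables are large):

OPTIMIZE TABLE hivedb.TBLS;
SHOW VARIABLES LIKE 'innodb_file_per_table';   -- if OFF, the shared ibdata1 file will not shrink

Note that when all InnoDB tables live in the shared ibdata1 tablespace (which a 37G ibdata1 suggests), OPTIMIZE TABLE only frees space inside the file for reuse; the file itself never shrinks. The Percona article above describes the dump-and-reimport procedure needed to actually reclaim the space on disk.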


Explorer

@Jay Kumar SenSharma Thank you for your response.

Here is the requested output:

+--------------------+---------------+
| DB Name            | DB Size in MB |
+--------------------+---------------+
| ambaridb           |         651.8 |
| hivedb             |       30274.0 |
| information_schema |           0.1 |
| mysql              |           0.6 |
| ooziedb            |        7041.9 |
| performance_schema |           0.0 |
| rangerdb           |          29.4 |
+--------------------+---------------+

I have a few follow-up questions:

1. If we delete data from the tables, will the ibdata1 file size shrink?

2. Can we increase the disk size for the mount point where the ibdata1 file resides? What steps would we need to take?

I appreciate your help with this!

Explorer

@Jay Kumar SenSharma ... Can you please advise based on the requested information?

Explorer

@Jay Kumar SenSharma - Can you please advise based on the requested information?

Master Mentor

@Madhura Mhatre

From the output you can clearly see that it is the Hive database that has grown. @Jay Kumar SenSharma's solution would work if it were the Ambari database that was huge, since its history can be purged, but that is not the case here. You cannot purge the Hive database without losing data, but you can create a more compact copy of each large table in ORC format (see below). The end result is a manageable table size on disk.


Option 1

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table if not exists t1 (a int, b int) partitioned by (c int);       -- your original table
create table t1orc (a int, b int) partitioned by (c int) stored as ORC;    -- your compressed table
insert into table t1orc partition(c) select a, b, c from t1;

Note that a true CTAS statement has these restrictions, which is why the example above pre-creates the partitioned ORC table and loads it with INSERT ... SELECT:

The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.
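
For a table that is not partitioned, a plain CTAS does the conversion in one statement (t1 and t1orc are the same illustrative names as above):

create table t1orc stored as ORC as select * from t1;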

Option 2

The other solution is to move the database to a different location or increase the size of the mount point. Be aware that you should keep the same path, because that record is maintained in the Hive metastore.

If you do relocate the data, you will need to update the location following the documented procedure for moving the MySQL data directory.
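
A minimal sketch of relocating the MySQL data directory on a RHEL/CentOS-style system, assuming the current datadir is /var/lib/mysql, the new mount is /data/mysql, and the service is mysqld (all of these are assumptions; adjust for your setup):

# stop the services that write to the DB (Ambari, Hive metastore, Oozie), then MySQL itself
systemctl stop mysqld

# copy the data, preserving ownership and permissions
rsync -av /var/lib/mysql/ /data/mysql/

# point MySQL at the new location: in /etc/my.cnf set
#   datadir=/data/mysql
#   socket=/data/mysql/mysql.sock

# on SELinux systems, restore the proper context on the new path
semanage fcontext -a -t mysqld_db_t "/data/mysql(/.*)?"
restorecon -R /data/mysql

systemctl start mysqld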

Hope that helps

Explorer

Hi @Geoffrey Shelton Okot,

Thank you for your response. Yes, the ibdata file is currently on a disk whose size we believe we can increase. If we only increase the size of that disk, do we need to update the location anywhere? Since we are not changing the location of the ibdata file, I assume we don't. Is that correct?

Please suggest.

Master Mentor

@Madhura Mhatre

If you increase the size of the same mount where ibdata1 resides, then you don't need to update any metadata, because the pointers in the metastore will remain intact.

Make sure you shut down all the databases on the mount point before increasing the size.
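
For example, if the mount is backed by LVM, growing it would look roughly like this (volume group, logical volume name, and size are assumptions):

# stop MySQL (and the services that depend on it) first, as advised above
systemctl stop mysqld

# extend the logical volume by 50G, then grow the filesystem
lvextend -L +50G /dev/vg_data/lv_mysql
xfs_growfs /var/lib/mysql                  # for XFS; use resize2fs /dev/vg_data/lv_mysql for ext4

systemctl start mysqld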

Happy hadooping