Created 05-12-2019 10:19 PM
Hi,
We are using MySQL for our cluster database; it hosts ambaridb, rangerdb, hivedb, etc. The size of its ibdata1 file is increasing day by day, and we don't have enough space left on the disk:
-rw-rw----. 1 mysql mysql 37G May 1 18:55 ibdata1
Is there an easy way to reduce the size of this MySQL DB? Or is there a way to move it to some other disk, or to increase the size of this mount point?
Please suggest.
Created 05-12-2019 10:51 PM
Can you please run the following query inside your MySQL DB so that we can find out which database (ambaridb/rangerdb/hivedb) is actually using the most space? Please share the output.
mysql> SELECT table_schema "DB Name", ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" FROM information_schema.tables GROUP BY table_schema;
Then, accordingly, we can check whether there is a possibility of clearing some unwanted data. For example, in the case of ambaridb, it may be old, unwanted alerts and operational data that are consuming most of the DB space. For such cases, Ambari provides a utility to purge the unwanted historical data:
# ambari-server db-purge-history --cluster-name YOUR_CLUSTER_NAME --from-date 2018-08-01
Also, to reclaim disk space used by the MySQL DB, you might sometimes need to run "OPTIMIZE TABLE <tablename>" on the DB.
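For illustration, a minimal sketch (hivedb.SOME_LARGE_TABLE is a placeholder; pick your own largest tables from information_schema.tables):

-- Rebuild the table to defragment it and release unused space.
-- Note: with a shared InnoDB tablespace (innodb_file_per_table=OFF),
-- freed pages return to ibdata1's internal free list, but the ibdata1
-- file itself will not shrink on disk.
mysql> OPTIMIZE TABLE hivedb.SOME_LARGE_TABLE;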
Created 05-12-2019 10:59 PM
@Jay Kumar SenSharma Thank you for your response.
Here is the requested output:
+--------------------+---------------+
| DB Name            | DB Size in MB |
+--------------------+---------------+
| ambaridb           |         651.8 |
| hivedb             |       30274.0 |
| information_schema |           0.1 |
| mysql              |           0.6 |
| ooziedb            |        7041.9 |
| performance_schema |           0.0 |
| rangerdb           |          29.4 |
+--------------------+---------------+
I have a few follow-up questions:
1. If we delete data from the tables, will the size of the ibdata1 file shrink?
2. Can we increase the disk size for the mount point where the ibdata1 file resides? What steps would we need to take?
I appreciate your help with this!
Created 05-13-2019 01:47 PM
@Jay Kumar SenSharma ... Could you please advise based on the requested information?
Created 05-14-2019 05:53 PM
@Jay Kumar SenSharma - Could you please advise based on the requested information?
Created 05-14-2019 06:49 PM
From the output, you can clearly see that it's the Hive database that has grown. Having said that, @Jay Kumar SenSharma's solution would work if it were the Ambari database that was huge, since its history can be purged, but that is not the case here. You cannot purge the Hive database, as you would lose data, but you can create a more compact table with CTAS; see below. The end result is a manageable table size on disk.
Option 1
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- your original table
create table if not exists t1 (a int, b int) partitioned by (c int);

-- your compressed table
create table t1orc (a int, b int) partitioned by (c int) stored as ORC;

insert into table t1orc partition(c) select a, b, c from t1;
CTAS has these restrictions:
- The target table cannot be a partitioned table.
- The target table cannot be an external table.
- The target table cannot be a list bucketing table.
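Given those restrictions, a true single-statement CTAS only works for a non-partitioned target. A minimal sketch (t2 and t2orc are placeholder table names):

-- CTAS: create an ORC copy of a non-partitioned table in one step
create table t2orc stored as ORC as select * from t2;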
Option 2
The other solution is to change the location and increase the size of the mount point. Be aware that you must maintain the same path, since the metadata keeps that record in the Hive metastore.
So you will need to update the location as documented here.
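For illustration, a minimal sketch of moving the MySQL data onto a larger disk while keeping the original path (the device /dev/sdb1 and the paths below are assumptions, not taken from this thread):

# stop everything that uses the database files
systemctl stop mysqld
# stage a copy on the new, larger disk
mkdir -p /mnt/newdisk
mount /dev/sdb1 /mnt/newdisk
rsync -a /var/lib/mysql/ /mnt/newdisk/
# keep the old copy until the move is verified
mv /var/lib/mysql /var/lib/mysql.bak
mkdir /var/lib/mysql
# remount the new disk at the original path (add it to /etc/fstab to persist),
# so no configuration or metadata needs to change
umount /mnt/newdisk
mount /dev/sdb1 /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
systemctl start mysqld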
Hope that helps
Created 05-14-2019 07:02 PM
Thank you for your response. Yes, we currently have that ibdata file on a disk whose size we believe we can increase. If we increase the size of that same disk, do we need to update the location anywhere? Since we are not changing the location of the ibdata file, I assume not. Is that correct?
Please suggest.
Created 05-14-2019 07:40 PM
If you increase the size of the same ibdata disk mount, then you don't need to update any metadata, because the pointers in the metastore will remain intact.
Make sure you shut down all the databases on the mount point before increasing the size.
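For example, if that mount is backed by LVM with an ext4 filesystem (the volume group and logical volume names below are placeholders), growing it could look like:

# stop the databases using the mount first
systemctl stop mysqld
# grow the logical volume by 50G (assumed VG/LV names)
lvextend -L +50G /dev/vg_data/lv_mysql
# grow the ext4 filesystem to fill the volume (use xfs_growfs for XFS)
resize2fs /dev/vg_data/lv_mysql
systemctl start mysqld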
Happy hadooping