Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar
Guru

Goal: Demonstrate how to change the database location in HDFS and Metastore

There are circumstances wherein we can consider moving the database location. By default, the location for default and custom databases is defined within the value of hive.metastore.warehouse.dir, which is /apps/hive/warehouse.

Here are the illustrated steps to change a custom database location, for instance "dummy.db", along with the contents of the database.

  • Verify the details of the database we would like to move to a new location
[hive@xlautomation-2 ~]$ beeline -u "jdbc:hive2://xlautomation-2.h.c:10000/default;principal=hive/xlautomation-2.h.c@H.C"
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create database dummy;
No rows affected (0.394 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name  | comment  |                           location                           | owner_name  | owner_type  | parameters  |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy    |          | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db  | hive        | USER        |             |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.561 seconds)

NOTE: The example provides the database location i.e. /apps/hive/warehouse/dummy.db which needs to be updated.

  • Verified the same using dummy table to test whether the location update was indeed successful
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create table dummy.test123 (col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile;
No rows affected (0.691 seconds)

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into dummy.test123 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
INFO  : Session is already open
INFO  : Dag name: insert into dummy.tes...3),(4,4),(5,5),(6,6)(Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1499274604190_0034)
INFO  : Loading data to table dummy.test123 from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db/test123/.hive-staging_hive_2017-08-03_16-20-11_965_647196527379814552-1/-ext-10000
INFO  : Table dummy.test123 stats: [numFiles=1, numRows=6, totalSize=24, rawDataSize=18]
No rows affected (2.47 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1  | test123.col2  |
+---------------+---------------+--+
| 1             | 1             |
| 2             | 2             |
| 3             | 3             |
| 4             | 4             |
| 5             | 5             |
| 6             | 6             |
+---------------+---------------+--+
6 rows selected (0.451 seconds)
  • Create a new storage DIR of our choice (we used newdummy.db) and replicate the permission at the directory level.
[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -chmod 777 /apps/hive/warehouse/newdummy.db
  • Verify if the DB (dir) level permissions are the same
[hive@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse | egrep dummy.db
drwxrwxrwx   - hive  hdfs          0 2017-08-03 16:19 /apps/hive/warehouse/dummy.db
drwxrwxrwx   - hive  hdfs          0 2017-08-03 16:27 /apps/hive/warehouse/newdummy.db
  • Copy all the underlying contents from /apps/hive/warehouse/dummy.db/ into the new directory
[hive@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Caution: The usage of "cp" with "p" to preserve the permission is prone to the following error

cp: Access time for hdfs is not configured.  Please set dfs.namenode.accesstime.precision configuration parameter.

This is because the value of dfs.namenode.accesstime.precision is set to 0 by default, in hortonworks HDP distribution. Since this is a client level configuration, it can be configured in hdfs-site.xml on a non-ambari managed cluster in client i.e., from 0 to 3600000. We can verify this at the client level by running the following command.

[hive@xlautomation-2 ~]$ hdfs getconf -confKey dfs.namenode.accesstime.precision 
3600000
  • Once the change is made, copy the contents of database folder /dummy.db/* to the new location i.e., /newdummy.db/ as HDFS user. We are overwriting (-f) any existing files within new directory and (-p) preserving the permissions
[hdfs@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/
  • Check the permissions once the copy is completed
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/dummy.db/
Found 1 items
drwxrwxrwx   - hive hdfs          0 2017-08-03 16:20 /apps/hive/warehouse/dummy.db/test123
[hdfs@xlautomation-2 ~]$ 
[hdfs@xlautomation-2 ~]$ 
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/newdummy.db/
Found 1 items
drwxrwxrwx   - hive hdfs          0 2017-08-03 16:20 /apps/hive/warehouse/newdummy.db/test123
  • With the privileged user access to metastore db (hive in our case) we may need to update three tables i.e., DBS, SDS and FUNC_RU as they log the locations for database, table and function in that order. In our example, since we do not have any functions, we will just update SDS and DBS tables
mysql> update SDS set location= replace(location,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')  where location like '%dummy.db%';
Query OK, 3 rows affected (0.53 sec)
Rows matched: 3  Changed: 3  Warnings: 0


mysql> update DBS set db_location_uri= replace(db_location_uri,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')  where db_location_uri like '%dummy.db%';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1  Changed: 1  Warnings: 0

NOTE: If you want to try and run this before committing the changes in metastore, use begin; before and end; after your UPDATE statements. This update statement will replace all the occurrences of specified string within DBS and SDS tables.

  • Check if the changes made to the tables were permanent, the location should be updated to */newdummy.db
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name  | comment  |                            location                             | owner_name  | owner_type  | parameters  |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy    |          | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db  | hive        | USER        |             |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.444 seconds)
  • Verify the data from the table and also confirm its location
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe formatted dummy.test123;
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
|           col_name            |                                data_type                                |           comment           |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| # col_name                    | data_type                                                               | comment                     |
|                               | NULL                                                                    | NULL                        |
| col1                          | string                                                                  |                             |
| col2                          | string                                                                  |                             |
|                               | NULL                                                                    | NULL                        |
| # Detailed Table Information  | NULL                                                                    | NULL                        |
| Database:                     | dummy                                                                   | NULL                        |
| Owner:                        | hive                                                                    | NULL                        |
| CreateTime:                   | Thu Aug 03 16:19:33 UTC 2017                                            | NULL                        |
| LastAccessTime:               | UNKNOWN                                                                 | NULL                        |
| Protect Mode:                 | None                                                                    | NULL                        |
| Retention:                    | 0                                                                       | NULL                        |
| Location:                     | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db/test123  | NULL                        |
| Table Type:                   | MANAGED_TABLE                                                           | NULL                        |
| Table Parameters:             | NULL                                                                    | NULL                        |
|                               | COLUMN_STATS_ACCURATE                                                   | {\"BASIC_STATS\":\"true\"}  |
|                               | numFiles                                                                | 1                           |
|                               | numRows                                                                 | 6                           |
|                               | rawDataSize                                                             | 18                          |
|                               | totalSize                                                               | 24                          |
|                               | transient_lastDdlTime                                                   | 1501777214                  |
|                               | NULL                                                                    | NULL                        |
| # Storage Information         | NULL                                                                    | NULL                        |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                      | NULL                        |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                                | NULL                        |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat              | NULL                        |
| Compressed:                   | No                                                                      | NULL                        |
| Num Buckets:                  | -1                                                                      | NULL                        |
| Bucket Columns:               | []                                                                      | NULL                        |
| Sort Columns:                 | []                                                                      | NULL                        |
| Storage Desc Params:          | NULL                                                                    | NULL                        |
|                               | field.delim                                                             | ,                           |
|                               | serialization.format                                                    | ,                           |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
33 rows selected (0.362 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1  | test123.col2  |
+---------------+---------------+--+
| 1             | 1             |
| 2             | 2             |
| 3             | 3             |
| 4             | 4             |
| 5             | 5             |
| 6             | 6             |
+---------------+---------------+--+
6 rows selected (0.275 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>

Considerations

  • Remove the old database directory only when you are sure the tables are readable
  • To check if hive or other privileged user has access to modify contents in metastore database, login to mysql and run the following commands (ensure that you are logged on to the node that hosts metastore database)
mysql> show grants for hive;
+--------------------------------------------------------------------------------------------------------------+
| Grants for hive@%                                                                                            |
+--------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*7ACE763ED393514FE0C162B93996ECD195FFC4F5' |
| GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%'                                                               |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)

mysql> select user,host from user;
+------+--------------------+
| user | host               |
+------+--------------------+
| hive | %                  |
| root | 127.0.0.1          |
| root | localhost          |
| root | xlautomation-2.h.c |
+------+--------------------+
4 rows in set (0.00 sec)

  • All the operations mentioned above was performed on a kerberized cluster
  • hive --service metatool -updateLocation did not succeed in updating the location, it is successful when changing the namenode uri to HA short name configuration
  • For any external tables whose locations are different, it should ideally not affect its access.
  • Copy output of "hdfs dfs -ls -R /apps/hive/warehouse/dummy.db" to ensure that you have a copy of the permissions before getting rid of the directory.
29,330 Views