Created on 08-03-2017 08:41 PM
Goal: Demonstrate how to change the database location in HDFS and Metastore
There are circumstances wherein we can consider moving the database location. By default, the location for default and custom databases is defined within the value of hive.metastore.warehouse.dir, which is /apps/hive/warehouse.
Here are the illustrated steps to change a custom database location, for instance "dummy.db", along with the contents of the database.
[hive@xlautomation-2 ~]$ beeline -u "jdbc:hive2://xlautomation-2.h.c:10000/default;principal=hive/xlautomation-2.h.c@H.C" 0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create database dummy; No rows affected (0.394 seconds) 0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy; +----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+ | db_name | comment | location | owner_name | owner_type | parameters | +----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+ | dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db | hive | USER | | +----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+ 1 row selected (0.561 seconds)
NOTE: The example provides the database location i.e. /apps/hive/warehouse/dummy.db which needs to be updated.
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create table dummy.test123 (col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile; No rows affected (0.691 seconds) 0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into dummy.test123 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6); INFO : Session is already open INFO : Dag name: insert into dummy.tes...3),(4,4),(5,5),(6,6)(Stage-1) INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0034) INFO : Loading data to table dummy.test123 from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db/test123/.hive-staging_hive_2017-08-03_16-20-11_965_647196527379814552-1/-ext-10000 INFO : Table dummy.test123 stats: [numFiles=1, numRows=6, totalSize=24, rawDataSize=18] No rows affected (2.47 seconds) 0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123; +---------------+---------------+--+ | test123.col1 | test123.col2 | +---------------+---------------+--+ | 1 | 1 | | 2 | 2 | | 3 | 3 | | 4 | 4 | | 5 | 5 | | 6 | 6 | +---------------+---------------+--+ 6 rows selected (0.451 seconds)
[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db [hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db [hive@xlautomation-2 ~]$ hdfs dfs -chmod 777 /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse | egrep dummy.db drwxrwxrwx - hive hdfs 0 2017-08-03 16:19 /apps/hive/warehouse/dummy.db drwxrwxrwx - hive hdfs 0 2017-08-03 16:27 /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/
Caution: The usage of "cp" with "p" to preserve the permission is prone to the following error
cp: Access time for hdfs is not configured. Please set dfs.namenode.accesstime.precision configuration parameter.
This is because the value of dfs.namenode.accesstime.precision is set to 0 by default, in hortonworks HDP distribution. Since this is a client level configuration, it can be configured in hdfs-site.xml on a non-ambari managed cluster in client i.e., from 0 to 3600000. We can verify this at the client level by running the following command.
[hive@xlautomation-2 ~]$ hdfs getconf -confKey dfs.namenode.accesstime.precision 3600000
[hdfs@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/dummy.db/ Found 1 items drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/dummy.db/test123 [hdfs@xlautomation-2 ~]$ [hdfs@xlautomation-2 ~]$ [hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/newdummy.db/ Found 1 items drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/newdummy.db/test123
mysql> update SDS set location= replace(location,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where location like '%dummy.db%'; Query OK, 3 rows affected (0.53 sec) Rows matched: 3 Changed: 3 Warnings: 0 mysql> update DBS set db_location_uri= replace(db_location_uri,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where db_location_uri like '%dummy.db%'; Query OK, 1 row affected (0.06 sec) Rows matched: 1 Changed: 1 Warnings: 0
NOTE: If you want to try and run this before committing the changes in metastore, use begin; before and end; after your UPDATE statements. This update statement will replace all the occurrences of specified string within DBS and SDS tables.
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy; +----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+ | db_name | comment | location | owner_name | owner_type | parameters | +----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+ | dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db | hive | USER | | +----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+ 1 row selected (0.444 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe formatted dummy.test123; +-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+ | col_name | data_type | comment | +-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+ | # col_name | data_type | comment | | | NULL | NULL | | col1 | string | | | col2 | string | | | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | dummy | NULL | | Owner: | hive | NULL | | CreateTime: | Thu Aug 03 16:19:33 UTC 2017 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Protect Mode: | None | NULL | | Retention: | 0 | NULL | | Location: | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db/test123 | NULL | | Table Type: | MANAGED_TABLE | NULL | | Table Parameters: | NULL | NULL | | | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} | | | numFiles | 1 | | | numRows | 6 | | | rawDataSize | 18 | | | totalSize | 24 | | | transient_lastDdlTime | 1501777214 | | | NULL | NULL | | # Storage Information | NULL | NULL | | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL | | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL | | Compressed: | No | NULL | | Num Buckets: | -1 | NULL | | Bucket Columns: | [] | NULL | | Sort Columns: | [] | NULL | | Storage Desc Params: | NULL | NULL | | | field.delim | , | | | serialization.format | , | +-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+ 33 rows selected (0.362 seconds) 0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123; +---------------+---------------+--+ | test123.col1 | test123.col2 | +---------------+---------------+--+ | 1 | 1 | | 2 | 2 | | 3 | 3 | | 4 | 4 | | 5 | 5 | | 6 | 6 | +---------------+---------------+--+ 6 rows selected (0.275 seconds) 0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Considerations
mysql> show grants for hive; +--------------------------------------------------------------------------------------------------------------+ | Grants for hive@% | +--------------------------------------------------------------------------------------------------------------+ | GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*7ACE763ED393514FE0C162B93996ECD195FFC4F5' | | GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%' | +--------------------------------------------------------------------------------------------------------------+ 2 rows in set (0.02 sec) mysql> select user,host from user; +------+--------------------+ | user | host | +------+--------------------+ | hive | % | | root | 127.0.0.1 | | root | localhost | | root | xlautomation-2.h.c | +------+--------------------+ 4 rows in set (0.00 sec)