Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
avatar

To demonstrate what is happening here, see these steps and output.

  1. Create a database at a custom location:

 

0: jdbc:hive2://hostname.cloudera.co> create database grandtour
location 'hdfs://hostname.cloudera.com:8020/bdr-test/grandtour.db';
<TRUNCATED>
INFO : Executing command(queryId=hive_20210818180811_d96dd7f8-2713-440f-8e9f-
8eebd2954d05): create database grandtour location 'hdfs://hostname.cloudera.com:8020/bdr-test/grandtour.db'
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20210818180811_d96dd7f8-2713-
440f-8e9f-8eebd2954d05); Time taken: 0.031 seconds
INFO : OK
No rows affected (0.083 seconds)

 

  • Create a table and insert a record:

 

0: jdbc:hive2://hostname.cloudera.co> use grandtour;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818180835_f365f042-e2a9-
4f53-a9ed-317d21dcfc07); Time taken: 0.008 seconds
INFO : OK
No rows affected (0.065 seconds)

0: jdbc:hive2://hostname.cloudera.co> create table madagascar (name
string);
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818180858_79bb34bf-b703-
43c9-a720-4756c22cb661); Time taken: 0.053 seconds
INFO : OK
No rows affected (0.117 seconds)

0: jdbc:hive2://hostname.cloudera.co> insert into madagascar
values('james may');
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818181038_83ea0501-5d24-
4c57-bdde-2e214e7abb9c); Time taken: 19.26 seconds
INFO : OK
1 row affected (19.47 seconds)

 

  • Table contents:

 

0: jdbc:hive2://hostname.cloudera.co> select * from madagascar;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818181134_346ee6b8-0c2e-
4eae-9ee9-e88d94aa6b3e); Time taken: 0.001 seconds
INFO : OK
+------------------+
| madagascar.name |
+------------------+
| james may |
+------------------+
1 row selected (0.421 seconds)
0: jdbc:hive2://hostname.cloudera.co>

 

  • HDFS listing in CDH 6.x:

 

[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:08 /bdr-test/grandtour.db

[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test/grandtour.db
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:08 /bdrtest/
grandtour.db/madagascar

[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test/grandtour.db/madagascar
Found 1 items
-rwxrwxrwx 3 hive supergroup 10 2021-08-18 18:10 /bdrtest/
grandtour.db/madagascar/000000_0

[hdfs@c441-node4 ~]$ hdfs dfs -cat /bdr-test/grandtour.db/madagascar/000000_0
james may

 

  • Run a Hive BDR job in CDP cluster. After BDR, on the destination cluster:

 

0: jdbc:hive2://hostname.cloudera.co> show databases;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818185431_6c2ad328-78c4-
454d-bead-3aa7baae907e); Time taken: 0.007 seconds
INFO : OK
+---------------------+
| database_name |
+---------------------+
| default |
| grandtour |
| information_schema |
| sys |
+---------------------+
4 rows selected (0.044 seconds)

 

  • Describe output shows a different location. It should be /bdr-test, but it shows the default location.
    diifLocation.png

Even though describe shows a wrong location, the table is at the correct location on HDFS. The listing looks as follows:

 

 

 

[root@c241-node3 ~]# hdfs dfs -ls /
Found 7 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test
drwxrwxrwx - hbase hbase 0 2021-08-18 18:29 /hbase
drwxrwxrwx - hdfs supergroup 0 2021-08-18 01:14 /ranger
drwxrwxrwx - solr solr 0 2021-08-18 01:14 /solr-infra
drwxrwxrwx - hdfs supergroup 0 2021-08-18 18:25 /tmp
drwxrwxrwx - hdfs supergroup 0 2021-08-18 18:25 /user
drwxrwxrwx - hdfs supergroup 0 2021-08-18 01:14 /warehouse

[root@c241-node3 ~]# hdfs dfs -ls /bdr-test
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test/grandtour.db

[root@c241-node3 ~]# hdfs dfs -ls /bdr-test/grandtour.db
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdrtest/
grandtour.db/madagascar

[root@c241-node3 ~]# hdfs dfs -ls -R /bdr-test/grandtour.db
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdrtest/
grandtour.db/madagascar
-rwxrwxrwx 3 hive supergroup 10 2021-08-18 18:10 /bdrtest/
grandtour.db/madagascar/000000_0

[root@c241-node3 ~]# hdfs dfs -cat /bdr-test/grandtour.db/madagascar/000000_0
james may

 

 

 

The reason the HDFS listing gets created this way is that this is the table location.

create_database call from client is coming with "locationUri:/warehouse/tablespace/managed/hive/grandtour.db"
create_table call from client is coming with "location:/bdr-test/grandtour.db/madagascar"
 
You can verify this in the Hive metastore log. Note the locationURI in these messages.
For database:

 

 

 

2021-08-18 18:40:47,421 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-7-thread-196]: 203: source:172.xx.xx.xx create_database: Database(name:grandtour, description:null, locationUri:/warehouse/tablespace/managed/hive/grandtour.db, parameters:{}, ownerName:hive, ownerType:USER, catalogName:hive, createTime:1629310091)

 

 

 

For table:

 

 

 

2021-08-18 18:40:47,563 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-7-thread-198]: 205: source:172.xx.xx.xx create_table_req: Table(tableName:madagascar, dbName:grandtour, owner:hive, createTime:1629310138, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null)], location:/bdr-test/grandtour.db/madagascar, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{external.table.purge=true, numRows=1, rawDataSize=9, transient_lastDdlTime=1629310257, numFilesErasureCoded=0, totalSize=10, EXTERNAL=TRUE, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER)

 

 

 

Now, how to fix the database location? This issue has been resolved in Cloudera Manager 7.4.4.

611 Views
0 Kudos