Support Questions

Find answers, ask questions, and share your expertise

HIve location

Contributor

Location clause is Mandatory in create table statement in External tables and not required with External partiotioned table.what about managed table and Managed partiotioned tables.Is it mandatory or optional?

8 REPLIES 8

Guru

LOCATION is not needed for managed tables.

Loading data into managed tables

1. You can create a managed table and load data into it during table creation using

LOAD DATA INPATH 'hdfs_file_or_directory_path' [OVERWRITE] INTO TABLE tablename
  [PARTITION (partcol1=val1, partcol2=val2 ...)]

This is similar to using LOCATION but not the same

2. You could also create a table and load data into it during table creation using CTAS (Create Table as Select). This demonstrates it nicely:

http://www.dummies.com/programming/big-data/hadoop/how-to-use-hives-create-table-as-select-ctas/

HDFS location of data in managed tables

Note that data loaded into managed hive tables will be located in: /user/hive/warehouse/<databasename>.db/<tablename> where <> represent the actual names you are using.

You can change this by going to hive-site.xml and changing hive.metastore.warehouse.dir to the path you want. (It is set to /user/hive/warehouse by default). Note: you should make all config changes through the Ambari UI.

Hive DDL ref

For full Hive CREATE TABLE DDL, see:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Tr...

If this is what you are looking for, let me know by accepting the answer. Else, let me know any gaps.

Contributor

LOCATION is never mandatory, and can be used with any combination of managed, external and partitioned tables.

The following statements are all valid:

create database if not exists test;
use test;
-- no LOCATION
create table t1 (i int); 
create EXTERNAL table t2(i int); 
create table t3(i int) PARTITIONED by (b int); 
create EXTERNAL table t4(i int) PARTITIONED by (b int); 
-- with LOCATION
create table t5 (i int) LOCATION '/tmp/tables/t5'; 
create EXTERNAL table t6(i int)  LOCATION '/tmp/tables/t6'; 
create table t7(i int) partitioned by (b int)  LOCATION '/tmp/tables/t7'; 
create EXTERNAL table t8(i int) partitioned by (b int)  LOCATION '/tmp/tables/t8'; 

show tables;
drop table t1; drop table t2; drop table t3; drop table t4;
drop table t5; drop table t6; drop table t7; drop table t8; 

If LOCATION is not specified, hive will use the value of hive.metastore.warehouse.dir in all cases. With the above example:

hdfs dfs -ls /apps/hive/warehouse/test.db
Found 4 items
drwxrwxrwx   - hive hdfs          0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t1
drwxrwxrwx   - hive hdfs          0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t2
drwxrwxrwx   - hive hdfs          0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t3
drwxrwxrwx   - hive hdfs          0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t4

Note how t2 and t4 were both external tables.

hdfs dfs -ls /tmp/tables
Found 4 items
drwxrwxrwx   - hive hdfs          0 2016-10-29 01:00 /tmp/tables/t5
drwxrwxrwx   - hive hdfs          0 2016-10-29 01:00 /tmp/tables/t6
drwxrwxrwx   - hive hdfs          0 2016-10-29 01:00 /tmp/tables/t7
drwxrwxrwx   - hive hdfs          0 2016-10-29 01:00 /tmp/tables/t8

Jean-Philippe is correct - you can place Internal and External tables to any location you wish to.

But IMHO it is very wise to maintain the default convention - Keep your internal (managed) tables in the /apps/hive/warehouse location, and your external tables away from the /apps/hive/warehouse location. You do not want to have to guess if the underlying tables will or will not be deleted when you drop a table. By keeping the Internal and External locations separated... you will know.

Contributor

1)Thanks for Input.The following external table declaration creates an external table that can read all the data files for this comma-delimited data in /data/stocks:

CREATE EXTERNAL TABLE IF NOT EXISTS stocks ( exchange STRING, symbol STRING, ymd STRING, price_open FLOAT, price_high FLOAT, price_low FLOAT, price_close FLOAT, volume INT, price_adj_close FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data/stocks';

2)The following two tables(t2 and t4) will read the data from which location?I mean previosly stocks table is reading from /data/stocks.what about t2 and t4.From where it will read?

create EXTERNAL table t2(i int);

create EXTERNAL table t4(i int) PARTITIONED by (b int);

In that example if you do not specify a location for tables t2 and t4 - those 2 external tables would be created in /apps/hive/warehouse and then the next folder would be the database you were using (perhaps default) and then the table name.

So they would be found in:

/apps/hive/warehouse/default.db/t2 
/apps/hive/warehouse/default/db/t4/b=1 
/apps/hive/warehouse/default/db/t4/b=2  ... 

(where b is the partition column)

Contributor

a)Thanks for input. when i use below sql statement on table t2 It will display the data present in location:-/apps/hive/warehouse/default.db/t2 .Please correct me if i am wrong

select * from t2;

b)when i use below sql statement on table t4 It will display the data present in location:-/apps/hive/warehouse/default.db/t4/b=1 .Please correct me if i am wrong

select * from t4 where b='1';

If answer is Yes for both the questions how to load the data in above paths.

You are correct. The answer is 'Yes' to both of your questions. You can load data into the tables by adding a file to the directory (as log as the file has the correct schema), by using LOAD DATA INPATH, or by using the standard INSERT INTO <tablename> select * FROM <othertablename> syntax. Just like any other Hive table.

Contributor

Thanks to everybody for clarifying my doubts.