Hive table occupies physical memory?

I have a 100 MB .csv file in HDFS. I am loading this file into a Hive table for processing. The table's path in HDFS is hive/apps/warehouse. Does it occupy another 100 MB of physical storage, i.e. 100 MB (HDFS) + 100 MB (Hive)?

Re: Hive table occupies physical memory?

@heta desai

There are two kinds of Hive tables that you may use:

1) Hive Managed Table

After creating your table (let's call it "test_table") you will load the data into it using the following syntax:

LOAD DATA INPATH '/hdfs_path/to/file/mycsv.csv' OVERWRITE INTO TABLE test_table

This will actually move the file to the Hive warehouse directory. So there will only be one copy, which will exist in the warehouse folder.
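If you want to confirm the move yourself, a rough check from the Hive CLI or Beeline could look like the following. It reuses the placeholder path and table name from the example above, so substitute your own; the exact warehouse location depends on your cluster configuration, which is why the sketch reads it from the table metadata instead of assuming it.

```sql
-- Before the load: the file should be listed at its original HDFS path
-- (placeholder path, replace with your own).
dfs -ls /hdfs_path/to/file/;

LOAD DATA INPATH '/hdfs_path/to/file/mycsv.csv' OVERWRITE INTO TABLE test_table;

-- After the load: mycsv.csv should no longer be listed here,
-- because it was moved rather than copied.
dfs -ls /hdfs_path/to/file/;

-- The "Location" field in this output is the table's directory in the
-- warehouse; listing that directory should now show the file.
DESCRIBE FORMATTED test_table;
```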

You can also use the below syntax:

LOAD DATA LOCAL INPATH '/local_filesystem/path/mycsv.csv' OVERWRITE INTO TABLE test_table

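This will load the data from your local filesystem rather than from HDFS. In this case the file is copied into the warehouse directory, so the original stays on your local disk and HDFS still holds only one copy.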

2) Hive External Table

When creating your table you would define it as follows:

CREATE EXTERNAL TABLE test_table (
  column1 type,
  column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
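LOCATION '/hdfs_path/to/file/';

This will leave the file in its place but will create the metadata in the metastore and point to the directory that contains it. Note that LOCATION must be an HDFS directory, not an individual file; Hive reads every file in that directory as table data. So, again, there is only one copy of the file.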

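You can also create a Managed Table from data that exists in another table (external, managed or even a view). Unlike the two cases above, this writes the query result as new files under the new table's directory, so the selected data is stored a second time. This is done using the following syntax: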

CREATE TABLE new_table AS SELECT column1, column2, column3 FROM test_table

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsS...

So what's the difference between Managed and External Tables?

The main difference is that when you use a managed table, Hive manages both the metadata in the metastore and the underlying data (which it moved to the warehouse directory). If you drop the table, the metadata in the metastore is removed and the data on the filesystem is deleted as well.

With External Tables, Hive only manages the metadata in the metastore. If you drop the table, only the metadata in the Hive Metastore is removed. The actual files, however, stay on HDFS.
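To see which kind of table you have and what a drop does to the files, something along these lines should work. It is only a sketch using the placeholder table name and directory from the examples above, so try it on test data first.

```sql
-- The "Table Type" field is reported as MANAGED_TABLE or EXTERNAL_TABLE,
-- and "Location" shows where the table's files live.
DESCRIBE FORMATTED test_table;

DROP TABLE test_table;

-- List the directory that "Location" pointed to (placeholder path here).
-- For an external table the files should still be listed; for a managed
-- table the directory should be gone.
dfs -ls /hdfs_path/to/file/;
```

If HDFS trash is enabled on your cluster, the files of a dropped managed table are normally moved to the trash first rather than removed immediately, unless you drop with PURGE.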

The below link provides more information and examples:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ManagedandExte...
